
Hindawi

Mobile Information Systems


Volume 2022, Article ID 9056415, 10 pages
https://doi.org/10.1155/2022/9056415

Research Article
Combining Self-Supervised Learning and Yolo v4 Network for
Construction Vehicle Detection

Ying Zhang, Xuyang Hou, and Xuhang Hou


School of Electrical and Control Engineering, Shenyang Jianzhu University, Shenyang 110000, China

Correspondence should be addressed to Xuyang Hou; [email protected]

Received 18 May 2022; Revised 9 August 2022; Accepted 27 August 2022; Published 20 September 2022

Academic Editor: Salvatore Carta

Copyright © 2022 Ying Zhang et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Target detection is currently applied in many fields, but applying intelligent traffic target detection on construction sites is very difficult because the environment is complex and there are many kinds of engineering vehicles. This paper proposes a method for detecting construction vehicles that combines self-supervised learning with the Yolo (you only look once) v4 network, referred to as "SSL-Yolo v4" (self-supervised learning-Yolo v4). Building on the combination of a self-supervised learning network and the Yolo v4 network, a self-supervised learning method based on context rotation is introduced. This method addresses the problem that existing deep learning algorithms need a large amount of manually annotated training data. The trained self-supervised learning network is then combined with the Yolo v4 network to improve the prediction ability, robustness, and detection accuracy of the model. The model is optimized with five-fold cross-validation on a self-built dataset, and the effectiveness of the algorithm is verified. The simulation results show that the average detection accuracy of the SSL-Yolo v4 method is 92.91%, the detection speed is improved by 4.83% and by 7–8 fps, and the recall rate is improved by 8–9%. These results show that the method achieves higher precision and speed and improves both target prediction and the robustness of engineering vehicle detection.

1. Introduction

Target detection technology has achieved great success in many fields. In medicine, for example, target detection techniques are used for cell identification and segmentation [1]. In manufacturing, detection networks are used to determine whether a target is defective [2]. In traffic, target detection is used to recognize license plates for integrated traffic control and to identify targets for autonomous driving in bad weather or at night [3–6]. Vehicle detection is one application of computer vision technology. Current research on construction vehicle detection can be roughly divided into two categories: traditional image processing and machine learning methods, and deep learning-based methods [7]. Traditional methods mostly rely on vehicle speed, vehicle color, number, and other information. For example, the vision-based virtual detection line method meets the requirements of vehicle supervision at large sites [8]. Sensor-based vehicle detection is simple to operate and does not need complex procedures, but it adapts poorly to the environment [9]. Combining HOG features with a support vector machine offers a new approach to construction vehicle identification: the extracted image is first preprocessed, and the target area is then extracted according to the shape, color, and other characteristics of the construction vehicle, which effectively narrows the target detection range [10]. Improved CNNs (convolutional neural networks) have been applied to the intelligent monitoring of intruding engineering vehicles, but they suffer from installation difficulties, serious occlusion, and low inspection efficiency over large areas [11–13]. Researchers have also combined deep learning features with edge features and proposed the FCOS algorithm, which tracks well but classifies poorly [14].

At present, popular deep learning detectors are improved versions of the Yolo algorithm. Yolo is a real-time, CNN-based object detection system proposed in 2015, and it has been widely used in medical, industrial, production, and other applications. In recent years, to improve the detection performance of convolutional networks, researchers have continually improved the residual network structure, deepened the network, and made other modifications [15–17]. For example, an improved Yolo v3 detection algorithm fuses context features and uses multiscale training, which greatly improves detection accuracy [18]. Lanaro et al. proposed a method that trains computer vision algorithms on freely acquired multimodal content [19]. Following the idea of self-supervised learning of visual features, it mines a large-scale multimodal (text and image) document corpus, using the hidden semantic structure found in the text corpus and a topic modeling technique (TextTopicNet) to learn from multimodal data [20, 21]. Wu et al. promoted self-supervised learning through knowledge transfer, proposing pseudo-label-based knowledge transfer on unlabeled datasets [22].

In practice, because of real-world constraints, open datasets of construction vehicles are often small, and the accuracy of supervised deep learning is not high enough: the datasets contain few sample types and few samples, so the feature extraction process cannot be trained effectively. Supervised training is also affected by other factors, such as missing or erroneous manual labels, and the labeling process itself is very difficult. Because construction environments are complex, deep learning algorithms still detect small objects poorly. This is mainly because pixel information changes after repeated convolutions, and coefficients emerge as convolution accuracy improves, which affects the detection process. To solve these problems, this design collects its own dataset for training and introduces a context-based self-supervised learning method: the self-supervised network is trained on an auxiliary task and then combined with the downstream deep learning algorithm. While preserving the pixels of the dataset images, three- to fourfold data augmentation improves the robustness of the model.

2. Related Methods

2.1. Self-Supervised Learning. Supervised deep learning requires a large amount of manual work to generate labels, as well as a large number of training samples [23]. Labeling a large number of samples remains a bottleneck for supervised learning, because the amount of training data is crucial in data-driven models [24]. To reduce the burden of data collection, unsupervised or semisupervised learning strategies can be adopted. Unsupervised learning does not require manual intervention during training, and self-supervised learning is a subset of unsupervised learning [25]. In self-supervised learning, auxiliary supervised tasks are constructed from certain properties of the input data, so the network can be trained without manually labeling the data. Examples include dividing a picture into patches of different sizes, restoring a picture, extracting its main features, and predicting the location of picture patches.

2.2. Yolo v4 Algorithm. Yolo (you only look once) is a real-time, CNN-based object detection system proposed in 2015 [26]. The Yolo algorithm treats object detection as a regression problem and predicts bounding box coordinates and class probabilities directly from the full image. Yolo v2 later improved prediction accuracy, the number of recognizable objects, and speed [27]. Yolo v3 changed the size of the model structure to balance detection speed and accuracy, and it extended the detection range through multiple downsampling layers, further improving detection accuracy. Yolo v4 uses the CSPDarknet53 network to extract image features on an S × S grid, with each target detected by the grid cell that contains its center. It uses residual blocks to resample features, stacks max-pooling outputs of different scales, and finally outputs the category and position of each target [28].

$$\mathrm{Mish}(x) = x \cdot \tanh\big(\ln(1 + e^{x})\big). \tag{1}$$

On the input side, Yolo v4 introduces mosaic data augmentation and SAT (self-adversarial training); its backbone is the CSPDarknet53 network, which uses the Mish activation function of Eq. (1). The anchor mechanism of the Yolo v4 output layer is the same as in Yolo v3, and the main improvement is the loss function used during training [29]. The Yolo v3 loss consists of a bounding box loss, a confidence loss, and a category loss, and Yolo v4 innovates in the bounding box loss. Because predicted and ground-truth boxes overlap during detection, the box loss adopts CIOU, which considers three factors: aspect ratio, overlap area, and distance between center points.

$$
\begin{aligned}
L &= L_{\mathrm{CIOU}} + L_{\mathrm{CONF}} + L_{\mathrm{CLASS}},\\
L_{\mathrm{CIOU}} &= 1 - \mathrm{IOU}(A, B) + a \cdot v + \frac{d^{2}}{c^{2}}.
\end{aligned}
\tag{2}
$$

In these equations, IOU is the intersection-over-union ratio between the predicted box and the ground-truth box, a is a weight coefficient, v measures the consistency of the width-to-height ratios, d is the Euclidean distance between the center points of the predicted box and the ground-truth box, and c is the diagonal length of the smallest region enclosing both boxes. $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth box, and w and h are the width and height of the predicted box.

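$$
\begin{aligned}
v &= \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2},\\
a &= \frac{v}{(1 - \mathrm{IOU}) + v},\\
L_{\mathrm{CONF}} &= \sum_{i=0}^{S^{2}}\sum_{j=0}^{B} k\left[-\lg p + \mathrm{BCE}(\hat{n}, n)\right],\\
L_{\mathrm{CLASS}} &= \sum_{i=0}^{S^{2}}\sum_{j=0}^{B} I_{i,j}^{\mathrm{noobj}}\left[-\lg\left(1 - p_{c}\right)\right].
\end{aligned}
\tag{3}
$$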
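To make Eqs. (1)–(3) concrete, here is a minimal Python sketch of the Mish activation and of the box term $L_{\mathrm{CIOU}}$ for a single pair of boxes. It assumes boxes are given as (center x, center y, width, height) and uses the weight $a = v/((1 - \mathrm{IOU}) + v)$ from Eq. (3); the function names are illustrative and are not taken from the authors' MATLAB implementation, and the confidence and class terms are omitted.

```python
import math

def mish(x: float) -> float:
    """Mish activation, Eq. (1): x * tanh(ln(1 + e^x))."""
    # Numerically stable softplus ln(1 + e^x) that does not overflow for large x.
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)

def ciou_loss(pred, gt):
    """CIOU box loss of Eq. (2) for boxes given as (cx, cy, w, h)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    # IOU: intersection area divided by union area.
    inter_w = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    inter_h = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter)
    # d^2: squared Euclidean distance between the two box centers.
    d2 = (px - gx) ** 2 + (py - gy) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(p_x2, g_x2) - min(p_x1, g_x1)) ** 2 + (max(p_y2, g_y2) - min(p_y1, g_y1)) ** 2
    # v: aspect-ratio consistency term; a: its weight, both from Eq. (3).
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    a = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + a * v + d2 / c2
```

For two disjoint 2 × 2 boxes whose centers are 4 apart, `ciou_loss((0, 0, 2, 2), (4, 0, 2, 2))` gives 1.4: the IOU is 0, $d^2 = 16$, and $c^2 = 40$, so the distance term still provides a gradient even though the boxes do not overlap.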

In these equations, the image is divided into an S × S grid (S² grid cells), B is the number of prior boxes in each grid cell, and k is a weight indicating whether the j-th prior box of the i-th grid cell is responsible for the object: its value is 1 if so and 0 otherwise. p is the probability that the current prior box contains an object. The Yolo v4 algorithm requires a fixed input image size. When the input image is larger or smaller than the specified size, it is compressed or stretched, which distorts the image, and small targets in the picture are easily blurred or even lost. To solve this problem, this paper proposes the SSL-Yolo v4 algorithm, which improves the original data augmentation method of Yolo v4 through contrast enhancement to improve the accuracy of network identification, positioning, and detection.

3. Research Methods

3.1. Data Augmentation. Because construction vehicles and their environments are complex, few complete datasets are available. To ensure that the data are authentic and relevant to construction sites, images of various construction vehicles, including cranes and excavators near transmission lines, were collected against different backgrounds such as trees and houses. The resulting vehicle dataset is used to train the network model, and the data are augmented through random rotation, denoising, and other operations. To improve the accuracy of the detector, the construction vehicle dataset used in this design was built independently in the MATLAB environment.

(1) The collected video of engineering vehicles is divided into 500 frames, and the original images are split in the ratio 6 : 3 : 1: 60% of the dataset is randomly rotated, 30% is used for self-supervised detection, and 10% is used for detection.

(2) A random rotation between 0° and 180° is applied to each image, which increases the diversity of the samples. Several images are randomly generated with no position type and saved as JPG pictures with transparency information. The rotation is computed as

$$X_1 = x\cos\theta + y\sin\theta, \qquad Y_1 = -x\sin\theta + y\cos\theta. \tag{4}$$

Here, x and y are the coordinates of a pixel in the original image minus the coordinates of the center point of the original image, $X_1$ and $Y_1$ are the coordinates of the corresponding pixel in the rotated image minus the coordinates of the center point of the rotated image, and θ is the rotation angle. The actual coordinates after rotation are $X_1$ and $Y_1$ plus the coordinates of the center point of the rotated image.

$$H(x, y)' = T\left[B(x, y)\right]. \tag{5}$$

In Eq. (5), T is the random deformation operation described above, B(x, y) is an image taken from the video, and H(x, y)′ is the image obtained after the deformation. In Eq. (6), $f(x, y)_i$ is the new dataset, B(x, y) is the original dataset, and H is the deformed dataset.

$$f(x, y)_{i} = B(x, y) + H_{i}(x, y, \theta). \tag{6}$$

(3) The coordinate transformation turns integer coordinates into non-integer values, and rounding the new coordinates loses some coordinates, which introduces noise. The solution is to work in reverse: rotate back from the target image to the original image and search for the source pixel of each target pixel.

(4) The picture obtained by reverse rotation is linearly interpolated, which ensures that every pixel of the final output is filled and improves picture quality. Figure 1 shows the data processing pipeline: Figure 1(a) is the original image, Figure 1(b) shows the noise after random rotation, Figure 1(c) is the result of the reverse processing, and Figure 1(d) is the result of the final linear interpolation.

3.2. Context-Based Self-Supervised Learning Methods. A context-based self-supervised learning strategy is adopted to generate unlabeled data and feed it into the training network, where it is modeled together with the previously collected labeled data. Context-based self-supervised learning can construct a large number of pretext tasks.

Figure 1: Sample data processing results. (a) Original image. (b) Image after random rotation. (c) Reverse lookup result. (d) Linear interpolation denoising.
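Steps (2)–(4) of Section 3.1 (rotation by Eq. (4), reverse mapping from the target image back to the original image, and linear interpolation, as illustrated in Figure 1) can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the authors' MATLAB code: it assumes a 2-D grayscale array, rotation about the image center, an output of the same size as the input, zero fill for pixels that map outside the source, and bilinear interpolation as the 2-D form of linear interpolation. The function name is hypothetical.

```python
import numpy as np

def rotate_inverse_bilinear(img: np.ndarray, theta_deg: float) -> np.ndarray:
    """Rotate a 2-D image about its center by reverse mapping plus bilinear interpolation."""
    h, w = img.shape
    theta = np.deg2rad(theta_deg)
    c, s = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Output pixel grid relative to the center: (X1, Y1) in Eq. (4).
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    X1, Y1 = xs - cx, ys - cy
    # Reverse rotation: find the source position of every output pixel,
    # so no output pixel is left empty by rounding (the "noise" of step (3)).
    x = c * X1 - s * Y1 + cx
    y = s * X1 + c * Y1 + cy
    eps = 1e-9  # tolerate floating-point error at the image border
    valid = (x >= -eps) & (x <= w - 1 + eps) & (y >= -eps) & (y <= h - 1 + eps)
    x, y = np.clip(x, 0, w - 1), np.clip(y, 0, h - 1)
    # Bilinear interpolation between the four neighbouring source pixels.
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    src = img.astype(float)
    top = src[y0, x0] * (1 - fx) + src[y0, x1] * fx
    bottom = src[y1, x0] * (1 - fx) + src[y1, x1] * fx
    out = top * (1 - fy) + bottom * fy
    out[~valid] = 0.0  # pixels that come from outside the original image
    return out
```

For augmentation the angle would be drawn at random, for example with `np.random.default_rng().uniform(0, 180)`. Mapping backwards from each output pixel, rather than pushing source pixels forwards, is what guarantees that every output pixel receives a value.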

Examples of such tasks include image mosaicking, repair (inpainting), colorization, and rotation. Here, a rotated image is the input and its predicted rotation angle is the output: images with building backgrounds were rotated by 90°, 180°, and 270° and combined with the dataset at the front end of network training, which avoids ambiguity in the rotation angle of the input image. Because this study cannot fully simulate the complexity of building backgrounds, images of building backgrounds are spliced with the rotated images to simulate complex backgrounds. An untrained ResNet50 network is used as the training network; previous experiments demonstrated the validity of ResNet50 training and its classification accuracy. In this study, the number of nodes in the fully connected layer was changed to 4, because 4 different classes must be predicted. Normalization is applied after each convolution and before activation to improve feature extraction. In the residual blocks of the deep convolutional network, the input and output are controlled by setting convolution-related parameters so that they can be added, which avoids vanishing gradients in the deep network. The self-supervised learning process not only increases the number of images but also improves pixel quality. Figure 2 shows the rotation-based self-supervised learning network structure.

For vehicle images without construction backgrounds that are input to the self-supervised network, the image information is used to generate vehicle type labels online, which reduces the complexity of manual labeling while keeping the labels correct. The ResNet50 deep convolutional network applies normalization after each convolution and before activation, which improves feature extraction. This keeps the network trainable under random transformations, but the benefit fades as the network depth increases. The residual structure is therefore introduced so that gradients from deep layers can be fed back to earlier layers. In a residual block, the dimensions of the input and output feature maps can be controlled by setting convolution parameters so that they can be added, which avoids vanishing gradients in deep networks. Figure 3 shows some of the labels generated online by self-supervised learning.

3.3. Building the SSL-Yolo v4 Algorithm Network. Previous studies have used self-supervised learning networks to increase the number of images. In this study, mosaic data augmentation is removed, and cutout and mix-up based on self-supervision are proposed instead. The self-supervised folders, classified by rotation angle, are combined with four different overlapping images and added noise, and are jointly fed into the self-adversarial training network at the front end of the Yolo v4 network; training on these augmented results improves the robustness of the model. The bottom right of Figure 4 shows the Yolo v4 network structure, where blue represents highly convolutional modules such as CSP, and the output has the 3 required output dimensions. The CNN is a self-adversarial training (SAT) network: it computes the loss of the CNN and then backpropagates it to the image to modify the image information. Notably, this operation does not change the network weights; the modified picture is fed directly into the training network [30].

When there are many targets in the picture, the accuracy of the model needs to be improved, but the self-supervised model reaches only a local optimum during training and fails to reach the global optimum. To solve this problem, we combine self-supervised learning with the front end of the Yolo v4 network to improve its data augmentation algorithm, and then use the self-adversarial network to backpropagate information that modifies the original picture. The original Yolo v4 algorithm adopts mosaic data augmentation, which combines 4 pictures into one training picture using the cut-mix method. Cut-mix randomly cuts out regions of different shapes and sizes and replaces them with same-sized patches from pictures of different kinds, in order to predict the occurrence probability of different kinds of targets. This improves localization ability and training efficiency, but because similar background pictures are forcibly spliced regardless of where the target is, the resulting background confusion makes detection more difficult.

3.4. The SSL-Yolo v4 Algorithm Network Training Process. This training uses MATLAB to carry out comparative training and evaluation of several advanced target detection networks. The construction site is a complex environment.

Figure 2: Self-supervised learning network structure based on rotation. (Each image x is rotated by 0°, 90°, 180°, and 270° to produce g(x, y = 0), g(x, y = 1), g(x, y = 2), and g(x, y = 3); a ConvNet model F(·) is trained to maximize the probability of the correct rotation label.)

Figure 3: Some results of online generation of labels in self-supervised learning network.
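The rotation pretext task of Figure 2 and the online label generation of Figure 3 can be illustrated with the NumPy sketch below. It assumes square images (so all four rotations share one shape), encodes 0°, 90°, 180°, and 270° as labels 0–3 to match the four output nodes of the modified fully connected layer, and scores a classifier's four outputs with softmax cross-entropy, the usual way to "maximize the probability" of the correct rotation. The ResNet50 classifier itself is not shown, and all names are hypothetical.

```python
import numpy as np

ROTATION_DEGREES = (0, 90, 180, 270)  # label k <-> rotation by ROTATION_DEGREES[k]

def make_rotation_pretext_batch(images):
    """Turn unlabeled square images into (rotated image, rotation label) pairs."""
    rotated, labels = [], []
    for img in images:
        for label, degrees in enumerate(ROTATION_DEGREES):
            # np.rot90 rotates counterclockwise in the first two axes.
            rotated.append(np.rot90(img, k=degrees // 90))
            labels.append(label)
    return np.stack(rotated), np.array(labels)

def rotation_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over the 4 rotation classes.

    logits: (N, 4) raw scores from a classifier with 4 output nodes.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

Two unlabeled 8 × 8 images yield eight training samples with labels `[0, 1, 2, 3, 0, 1, 2, 3]`, with no manual annotation. A classifier that outputs equal scores for every class gets a loss of ln 4 ≈ 1.386, and the loss approaches 0 as the correct rotation becomes certain.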

Figure 4: SSL-Yolo v4 network structure. (Diagram elements: self-supervised image generation with random superposition selection, added noise, random splicing, and other types of cars, feeding four "maximize prob" branches, with the operations repeated multiple times; a CNN that calculates the loss and outputs modified pictures; and the Yolo v4 input, main network, neck, and output.)
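Section 3.3 replaces mosaic augmentation with cutout and mix-up based on self-supervision (Figure 4). A minimal NumPy sketch of these two generic operations follows. It assumes float images in [0, 1], fills the cut region either with zeros or with uniform noise (one reading of the noise-adding cutout described in Section 3.4), and draws the mix-up weight from a Beta(α, α) distribution as in the standard mix-up formulation; the default α = 0.2, the function names, and the parameters are hypothetical. How the authors combine these operations with the rotation folders and the SAT network is not shown.

```python
import numpy as np

def cutout(img, size, rng, fill_noise=False):
    """Blank out a size x size square (clipped at the borders) around a random center."""
    h, w = img.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    top, left = cy - size // 2, cx - size // 2
    y0, y1 = max(0, top), min(h, top + size)
    x0, x1 = max(0, left), min(w, left + size)
    out = img.astype(float)  # astype returns a copy, so the input is untouched
    region_shape = out[y0:y1, x0:x1].shape
    # Fill the region with uniform noise in [0, 1) or with zeros.
    out[y0:y1, x0:x1] = rng.random(region_shape) if fill_noise else 0.0
    return out

def mixup(img_a, img_b, rng, alpha=0.2):
    """Blend two same-sized images; returns the mixed image and its weight lam."""
    lam = rng.beta(alpha, alpha)
    mixed = lam * img_a.astype(float) + (1.0 - lam) * img_b.astype(float)
    return mixed, lam
```

Usage: `rng = np.random.default_rng(0)`, then `cutout(img, 8, rng, fill_noise=True)` or `mixup(img_a, img_b, rng)`. For detection, a common choice is to keep the boxes of both mixed images and weight their losses by `lam` and `1 - lam`.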



Because construction vehicles resemble one another, are often occluded, and change in scale, along with other complex engineering problems, the detection network for the construction site must offer suitable detection speed and accuracy. Actual video of the construction site is collected, and the label data generated by self-supervised learning are input into the data augmentation network. The pictures produced by the noise-adding cutout operation and the random picture-overlapping mix-up operation first pass through the front-end self-adversarial network of the SSL-Yolo v4 network.

(1) The augmented pictures are preprocessed after pretraining: the images are resized, pixel values are scaled, and the input pictures are processed in batches.

(2) When the input picture size differs from the specified network input size, the input and anchor boxes are adjusted according to the input size of the feature extraction network, and the input dataset is resized to suit the feature extraction network.

(3) The parameters of the SSL-Yolo v4 network are reset, the number of anchor boxes is set to 8, and the anchor box data are passed to the configure yolo v4 function; correctly configuring the network in this way can improve its running speed.

(4) The Yolo v4 target detection network is created and its training parameters are set. Training uses stochastic gradient descent with momentum (SGDM), an initial learning rate of 0.001, a mini-batch size of 16, and at most 100 training epochs. The anchor boxes are estimated from the sizes of the targets in the training data; because the images are resized before training, the training data used to estimate the anchor boxes are resized as well. "CheckpointPath" is set to a temporary location so that partially trained detectors are saved during training; if training is interrupted by a power or system failure, it can resume from the saved checkpoint. For detection, the pretrained yolov4 network is downloaded and the test image is read. The anchor boxes are set, the target categories are introduced, the targets in the image are detected, and the detection results are visualized. The displayed results include the position, size, category, and detection accuracy of each target.

4. Results and Discussion

To evaluate the detection performance of the proposed SSL-Yolo v4 algorithm accurately, detection accuracy (precision), detection speed, and regression rate (recall) are used as metrics. The number of correct detections is denoted TP, the number of false detections (false positives) FP, and the number of missed targets (false negatives) FN. IOU (intersection over union) is a standard measure of how accurately an object is localized in a given dataset. Multiple bounding boxes are predicted together, and the network selects the best-predicted bounding box (that is, the one with the largest IOU) online [31]. The IOU is the area of the intersection of two regions divided by the area of their union.

$$
\begin{aligned}
\mathrm{Precision} &= \frac{TP}{TP + FP},\\
\mathrm{Recall} &= \frac{TP}{TP + FN},\\
\mathrm{IOU} &= \frac{\text{Area of Overlap}}{\text{Area of Union}}.
\end{aligned}
\tag{7}
$$

Previous experiments divided the data into a training set and a test set; the test set is independent of the training data, takes no part in training, and is used to evaluate the final model. During training, however, overfitting can occur: the model fits the training data well but does not predict data outside the training set well. To optimize the model and verify the generalization performance of the network, the experiment uses five-fold cross-validation to obtain 5 models.

Five-fold cross-validation is applied first, and the three different algorithms are then compared. The dataset used in this experiment is self-built: videos of different construction sites were split into 10,000 pictures covering 15 different types of construction vehicle targets, with an average of 1.2 targets per picture. The dataset is divided into five subsets, data 1, data 2, data 3, data 4, and data 5, each containing 2000 images. In the first round, data 1, data 2, data 3, and data 4 form the training set and data 5 is the test set, which gives the precision and regression rate of the first round. In the second round, data 1, data 2, data 3, and data 5 form the training set and data 4 is the test set. Continuing in this way, five rounds of experiments yield five models, whose precision and regression rate values are averaged. Table 1 shows the results of the five-fold cross-validation: the third model achieved the best detection accuracy (0.965) with a high regression rate (0.94), and the average detection accuracy of the five models reached 0.933.

To verify the classification validity of the context-based self-supervised learning model, two public datasets were selected: Pascal VOC and CIFAR-10. The Pascal VOC dataset contains 11,530 images for training and testing, with 27,450 annotated regions of interest. Over eight years, the dataset grew from four categories to 20, including human, animal, airplane, automobile, motorcycle, train, dining table, sofa, and television. The CIFAR-10 dataset is divided into 5 training sets and 1 test set, each containing 10,000 images. Each RGB image is 32 × 32 pixels, and the categories include planes, cars, birds, cats, and deer.
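Steps (3) and (4) of Section 3.4 estimate 8 anchor boxes from the sizes of the targets in the resized training data. A common way to do this, introduced with Yolo v2, is k-means clustering of box widths and heights with 1 − IOU as the distance. The NumPy sketch below follows that approach under the assumption that boxes are given as (width, height) pairs already scaled to the network input size; it is not necessarily the exact procedure used by MATLAB's anchor estimation, and the function names are hypothetical.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IOU between (w, h) boxes and (w, h) anchors, aligned at a common corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=8, iters=100, seed=0):
    """Cluster (w, h) boxes into k anchors with distance 1 - IOU; returns (anchors, mean IOU)."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # The smallest distance 1 - IOU is the same as the largest IOU.
        assign = wh_iou(boxes, anchors).argmax(axis=1)
        # Move each anchor to the mean size of its cluster (keep it if the cluster is empty).
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j) else anchors[j]
                        for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sort by area
    mean_iou = wh_iou(boxes, anchors).max(axis=1).mean()
    return anchors, mean_iou
```

If the boxes are measured on the original frames, they would first be scaled, for example `boxes * [net_w / img_w, net_h / img_h]` (hypothetical names), which mirrors the resizing of the anchor-estimation data described in step (4). The returned mean IOU is a convenient check on how well 8 anchors cover the dataset.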

Table 1: Five-fold cross-validation of experimental results.

Metric               Model 1   Model 2   Model 3   Model 4   Model 5   Average
Detection accuracy   0.892     0.937     0.965     0.949     0.921     0.933
Regression rate      0.91      0.88      0.94      0.95      0.93      0.92
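The five-fold protocol described above (five subsets of 2000 images; round 1 tests on data 5, round 2 on data 4, and so on) and the averages in Table 1 can be reproduced with the short Python sketch below. The random shuffling, the seed, and the function name are assumptions for illustration; the per-model values are copied from Table 1.

```python
import random
from statistics import mean

def five_fold_rounds(n_images=10_000, k=5, seed=0):
    """Yield (test subset number, train indices, test indices) for each round."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # assumption: subsets are drawn at random
    fold = n_images // k
    subsets = [indices[i * fold:(i + 1) * fold] for i in range(k)]  # data 1 ... data 5
    for r in range(k):
        test_id = k - r  # round 1 -> data 5, round 2 -> data 4, ...
        test = subsets[test_id - 1]
        train = [i for s, subset in enumerate(subsets, start=1) if s != test_id for i in subset]
        yield test_id, train, test

# Per-model results from Table 1.
accuracy = [0.892, 0.937, 0.965, 0.949, 0.921]
recall = [0.91, 0.88, 0.94, 0.95, 0.93]
print(round(mean(accuracy), 3), round(mean(recall), 2))  # prints 0.933 0.92
```

Each image appears in exactly one test subset, so every model is evaluated on 2000 images it never saw during training, and the printed averages match the last column of Table 1.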

Table 2: Comparison of IOU and single-image detection speed for different numbers of test images.

Images    Supervised detection     Yolo v4 algorithm detection    Algorithm of this paper
tested    IOU (%)   Speed (fps)    IOU (%)   Speed (fps)          IOU (%)   Speed (fps)
50        85.3      22             85.2      22                   86.9      20
100       85.6      20             88.0      19                   87.4      20
150       85.6      20             88.6      17                   88.1      14
200       88.0      18             89.1      12                   90.7      10
250       88.2      17             90.9      12                   92.3      8
300       87.2      17             90.7      10                   94.0      8
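Eq. (7) and the IOU values in Table 2 rest on counting TP, FP, and FN. The Python sketch below shows one common way to obtain these counts: each prediction (assumed already sorted by confidence) is greedily matched to the unmatched ground-truth box with the highest IOU, and the match is accepted if the IOU reaches a threshold. The 0.5 threshold, the (x1, y1, x2, y2) box format, and the function names are assumptions; the paper does not state its matching rule.

```python
def box_iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2): area of overlap / area of union."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    overlap = overlap_w * overlap_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - overlap
    return overlap / union if union > 0 else 0.0

def count_matches(preds, gts, iou_threshold=0.5):
    """Greedy one-to-one matching; preds must be sorted by descending confidence."""
    matched = set()
    tp = 0
    for p in preds:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j not in matched:
                iou = box_iou(p, g)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    return tp, len(preds) - tp, len(gts) - tp  # TP, FP, FN

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP) and Recall = TP / (TP + FN), as in Eq. (7)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With one well-placed and one spurious prediction against two ground-truth boxes, for example `count_matches([(1, 1, 10, 10), (50, 50, 60, 60)], [(0, 0, 10, 10), (20, 20, 30, 30)])`, the counts are TP = 1, FP = 1, FN = 1, so precision and recall are both 0.5.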

Figure 5: Comparison of detection accuracy and recall rate for different numbers of detected images (50 to 300) for supervised inspection, Yolo v4 algorithm detection, and SSL-Yolo v4 algorithm detection (x-axis: number of detected pictures; y-axes: recall and detection precision). (a) Recall curve. (b) Detection accuracy curve.

dogs, frogs, horses, boats, and trucks fall into ten broad algorithm and Yolo v4 algorithm have a high detection
categories. In this experiment, 50,100,150,200,250, and 300 speed, while the accuracy of IOU has not been greatly
images were randomly selected as different test sets. The self- reduced. When the number of detection images gradually
supervised method is to use the self-supervised learning increases, both the detection speed and the recall rate
method to build the training model, and the supervised increase. However, as shown in Figure 5, compared with
method is to directly use the label data information to build self-supervised learning, the results of supervised learning
the training model. detection are lower, and the SSL-Yolo V 4 algorithm
Table 2 is the IOU of supervised detection, Yolo v4 proposed in this paper has higher detection accuracy and
algorithm detection and SSL-Yolo v4 algorithm are recall rate, and faster detection speed.
proposed in this paper. The three algorithms have dif- Using the same datasets and different training and de-
ferent datasets (including 50,100,150,200,250, and 300 tection methods, different results are obtained. Figure 6
detection images). It can be seen that the present shows the supervised detection results, Figure 7 shows the
8 Mobile Information Systems

Figure 6: Supervised test results.

Figure 7: Yolo v4 network test results.

Figure 8: SSL- Yolo v4 network test results.

detection results after introducing self-supervised learning 5. Conclusions


in the Yolo v4 network, and Figure 8 introduces the de-
tection results of self-supervised learning after improving Due to the complexity of the construction detection en-
the Yolo v4 data enhancement. According to the detection vironment, there are many uncertainties in the target
accuracy under different circumstances, it can be seen that detection process, which will more or less have a certain
the loss of detection boxes in Figure 6 is serious. In Figure 7, after self-supervised learning is introduced, small targets can be detected, but because the helmet covers the face, the target is not completely detected. Figure 8 shows the detection results after improving the data enhancement method and introducing contrast enhancement between different targets; the detection coverage rate and detection accuracy are clearly improved. The proposed algorithm can simulate different external environments and mark vehicle positions more accurately when vehicle features are not obvious. The comparison shows that the proposed SSL-Yolo v4 algorithm achieves higher detection accuracy and more accurate vehicle-type classification when the camera looks down from above and the view is partially blocked.

impact on the results. As an effective means of security, a video surveillance system places high demands on attention, vigilance, and especially the ability to respond to abnormal situations. This paper proposes the SSL-Yolo v4 algorithm, which introduces a self-supervised learning method that turns manual annotation of detection boxes into automatic or semiautomatic annotation, saving manual labor while also achieving data enhancement. In addition, the improved Yolo v4 data enhancement method with added contrast training further enhances the data, improves model robustness, and raises detection accuracy and speed. The SSL-Yolo v4 network was obtained by pretraining and training on 2000 images across three different datasets. The comparison of simulation results shows improvements in detection accuracy, recall, and speed. However, the proposed algorithm still has some shortcomings: when the input image resolution is not high enough, detection accuracy declines and classification errors may even occur. These issues will be addressed in future research.
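
The photometric part of the data enhancement discussed above (simulating different external environments through contrast and brightness changes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SSL-Yolo v4 pipeline: the function names, the grayscale list-of-rows image representation with values in [0, 255], and the jitter ranges are all hypothetical choices.

```python
import random
from typing import List

# Hypothetical representation: a grayscale image as a list of rows, values in [0, 255].
Image = List[List[float]]


def adjust_contrast_brightness(img: Image, alpha: float, beta: float) -> Image:
    """Scale contrast around the mean intensity by `alpha`, then shift by `beta`.

    alpha > 1 increases contrast, alpha < 1 flattens it (e.g. haze or dusk);
    beta > 0 brightens, beta < 0 darkens. Results are clipped to [0, 255].
    """
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [
        [min(255.0, max(0.0, alpha * (p - mean) + mean + beta)) for p in row]
        for row in img
    ]


def random_photometric_jitter(img: Image, rng: random.Random,
                              alpha_range=(0.6, 1.4),
                              beta_range=(-40.0, 40.0)) -> Image:
    """Draw a random contrast/brightness setting to mimic another lighting condition.

    The default ranges are illustrative assumptions, not values from the paper.
    """
    alpha = rng.uniform(*alpha_range)
    beta = rng.uniform(*beta_range)
    return adjust_contrast_brightness(img, alpha, beta)


if __name__ == "__main__":
    patch = [[100.0, 150.0], [200.0, 250.0]]
    # Doubling contrast around the mean (175): 250 would map to 325, so it is clipped.
    print(adjust_contrast_brightness(patch, alpha=2.0, beta=0.0))
    # prints: [[25.0, 125.0], [225.0, 255.0]]
    print(random_photometric_jitter(patch, random.Random(0)))
```

Because these adjustments change only pixel intensities, the annotated detection boxes stay valid unchanged; geometric augmentations such as flips or crops would also require transforming the box coordinates.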
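
Detection accuracy and recall for a detector like this are usually computed by matching predicted boxes to ground-truth boxes under an intersection-over-union (IoU) threshold. The sketch below shows one common scheme; the 0.5 threshold, the greedy confidence-ordered matching, the rule that class labels must agree, and the vehicle class names in the example are assumptions, and the paper's own evaluation protocol may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Tuple[Box, str, float]],
                     gts: List[Tuple[Box, str]],
                     iou_thr: float = 0.5) -> Tuple[float, float]:
    """Greedy matching: higher-confidence predictions claim ground truths first.

    A prediction is a true positive if an unmatched ground-truth box has the
    same class label and IoU >= iou_thr; each ground truth matches at most once.
    """
    matched = [False] * len(gts)
    tp = 0
    for box, label, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_j, best_iou = -1, iou_thr
        for j, (gt_box, gt_label) in enumerate(gts):
            if matched[j] or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
            tp += 1
    precision = tp / len(preds) if preds else 0.0  # tp / (tp + false positives)
    recall = tp / len(gts) if gts else 0.0  # tp / (tp + missed ground truths)
    return precision, recall


if __name__ == "__main__":
    gts = [((0, 0, 10, 10), "excavator"), ((20, 20, 30, 30), "truck")]
    preds = [((1, 1, 10, 10), "excavator", 0.9),  # IoU 0.81 -> true positive
             ((20, 20, 30, 30), "loader", 0.8),   # right place, wrong type
             ((50, 50, 60, 60), "truck", 0.3)]    # no overlap -> false positive
    p, r = precision_recall(preds, gts)
    print(f"precision={p:.3f} recall={r:.3f}")  # prints: precision=0.333 recall=0.500
```

Requiring the class labels to agree means a correctly placed box with the wrong vehicle type (the loader prediction in the example) counts as both a false positive and a missed ground truth, so classification errors of the kind mentioned above lower both metrics.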

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest.

Acknowledgments
The work described in this article was supported by funds from the Basic Scientific Research Projects of the Educational Department of Liaoning Province (grant no. LJKZ0585) and the Foundation project of the Ministry of Housing and Urban-Rural Construction (grant no. 2019-K-168), thirty thousand RMB.
References [17] B. Wang, S. C. Liu, B. Wang, W. Wu, J. Wang, and D. Shen,
“Multi-step ahead short-term predictions of storm surge level
[1] S. Albahli, N. Nida, A. Irtaza, M. H. Yousaf, and using CNN and LSTM network,” Acta Oceanologica Sinica,
M. T. Mahmood, “Melanoma lesion detection and segmen- vol. 40, no. 11, pp. 104–118, 2021.
tation using YOLOv4-DarkNet and active contour,” IEEE [18] I. Ahmed, G. Jeon, A. Chehri, and M. M. Hassan, “Adapting
Access, vol. 8, Article ID 198403, 2020. Gaussian YOLOv3 with transfer learning for overhead view
[2] N. Saeed, N. King, Z. Said, and M. A. Omar, “Automatic human detection in smart cities and societies,” Sustainable
defects detection in CFRP thermograms, using convolutional Cities and Society, vol. 70, Article ID 102908.
neural networks and transfer learning,” Infrared Physics & [19] M. Lanaro, M. P. Mclaughlin, M. J. Simpson et al., “A
Technology, vol. 29, pp. 257–261, 2020. quantitative analysis of cell bridging kinetics on a scaffold
[3] R. Balia, S. Barra, S. Carta, G. Fenu, A. Sebastian Podda, and using computer VisionAlgorithms,” Acta Biomaterialia,
N. Sansoni, “A deep learning solution for integrated traffic vol. 136, no. 136, pp. 429–440, 2021.
control through automatic license plate recognition,” in [20] X. Bing, “Research on image processing technology based on
Proceedings of the International Conference on Computational
computer vision algorithm,” Basic and Clinical Pharmacology
Science and its Applications, Springer, Cham, September 2021.
and Toxicology, vol. 127, p. 82, 2020.
[4] M. Hnewa and H. Radha, “Object detection under rainy
[21] C. Gonzalez Viejo, S. Fuentes, D. Torrico, K. Howell, and
conditions for autonomous vehicles: a review of state-of-the-
F. R. Dunshea, “Assessment of beer quality based on foam-
art and emerging techniques,” IEEE Signal Processing Mag-
ability and chemical composition using computer vision al-
azine, vol. 38, no. 1, pp. 53–67, 2021.
[5] Y. Cai, T. Luan, H. Gao et al., “YOLOv4-5D: an effective and gorithms, near infrared spectroscopy and machine learning
efficient object detector for autonomous driving,” IEEE algorithms,” Journal of the Science of Food and Agriculture,
Transactions on Instrumentation and Measurement, vol. 70, vol. 98, no. 2, pp. 618–627, 2018.
pp. 1–13, 2021. [22] G. Wu, X. T. Zhu, and S. J. Gong, “Tracklet self-supervised
[6] Z. Liu, Y. Cai, H. Wang et al., “Robust target recognition and learning for unsupervised person Re-identification,” Pro-
tracking of self-driving cars with radar and camera infor- ceedings of the AAAI Conference on Artificial Intelligence,
mation fusion under severe weather conditions,” IEEE vol. 34, no. 7, pp. 12362–12369, 2020.
Transactions on Intelligent Transportation Systems, vol. 23, [23] B. Cao, H. Zhang, N. N. Wang, X. Gao, and D. Shen, “Auto-
no. 7, pp. 6640–6653, 2022. GAN: self-supervised collaborative learning for medical im-
[7] Y. S. Gao, W. Z. Chen, and J. Wang, “UAV tansmission line age synthesis,” Proceedings of the AAAI Conference on Arti-
construction vehicle inspection under Android platform,” ficial Intelligence, vol. 34, no. 7, Article ID 10486, 2020.
Computer Systems Applications, vol. 29, no. 2, pp. 257–261, [24] S. L. Wang, W. X. Che, Q. Liu, P. Qin, T. Liu, and W. Y. Wang,
2020. “Multi-task self-supervised learning for disfluency detection,”
[8] Y. G. Li, Z. S. Zhang, and X. G. Wu, “An anti-jamming vehicle Proceedings of the AAAI Conference on Artificial Intelligence,
detection algorithm based on magnetoresistive sensor,” vol. 34, no. 5, pp. 9193–9200, 2020.
Journal of Dongguan University of Technology, vol. 28, no. 5, [25] I. Abdallah, K. Tatsis, and E. Chatzi, “Unsupervised local
pp. 38–44, 2021. cluster-weighted bootstrap aggregating the output from
10 Mobile Information Systems

multiple stochastic simulators,” Reliability Engineering and


System Safety, vol. 199, 2020.
[26] J. Zhao, H. C. Wei, and X. Y. Zhao, “Application of improved
YOLO v4 model for real time video fire detection,” Basic and
Clinical Pharmacology and Toxicology, vol. 128, pp. 737-738,
2021.
[27] I. S. Golyak, D. R. Anfimov, I. L. Fufurin et al., “Optical multi-
band detection of unmanned aerial vehicles with YOLO v4
convolutional neural network,” SPIE FUTURE SENSING
TECHNOLOGIES, vol. 11525, 2020.
[28] S. Q. Wang, Z. Z. Wu, G. W. He, S. Wang, H. Sun, and F. Fan,
“Semi-supervised classification-aware cross-modal deep
adversarial data augmentation,” Future Generation Computer
Systems, vol. 125, pp. 194–205, 2021.
[29] E. Avuçlu, “A new data augmentation method to use in
machine learning algorithms using statistical measurements,”
Measurement, vol. 180, Article ID 109577.
[30] F. Peter, M. Lucas, and T. Russ, “Self-supervised corre-
spondence in VisuomotorPolicyLearning,” IEEE Robotics and
Automation Letters, vol. 05, no. 2, pp. 737-738, 2020.
[31] X. Y. Hou, Y. Zhang, and J. Hou, “Application of YOLO V2 in
construction vehicle detection,” Lecture Notes on Data En-
gineering and Communications Technologies, vol. 171,
no. 4356, pp. 1249–1256, 2021.

You might also like