IET Image Processing - 2022 - Zheng - Fast ship detection based on lightweight YOLOv5 network
DOI: 10.1049/ipr2.12432
1 INTRODUCTION

In recent years, marine ship monitoring has received more and more attention. Fast detection of marine targets in dynamic video is one of the key technologies for intelligent monitoring of sea areas. The commonly used detection methods for sea surface targets can be divided into three categories. The first category detects sea surface targets based on edge and texture features. Zhang et al. [1] used the DCT domain energy features of image sub-blocks to achieve fast extraction of the sea level, established a sea surface hybrid texture model, and achieved fast segmentation of sea surface background and ship targets.

The second category imitates the visual attention selection mechanism of the human eye: it finds the saliency map of the target of interest and detects ship targets according to the established visual attention model. Shi et al. [2] first extracted the low-frequency and high-frequency features of the image in the wavelet domain, then used a modified Gabor filter to extract the directional features and extracted the colour and moment features in HIS space. Finally, they fused the above features to obtain the saliency map and detected ship targets. Shao et al. [3] proposed a convolutional neural network (CNN) based ship target detection algorithm, which uses a CNN to predict the type and location of the target and uses the saliency map to help correct the target localization; the experimental results show that the method has high detection accuracy and speed.

The third category is deep learning-based target detection for surface ships. Zhang et al. [4] proposed an integrated target segmentation method based on an interferer discriminator and a ship target extractor. It first uses a SqueezeNet network as the interferer discriminator to determine what type of interference is contained in the input image, and then uses an improved DeepLabv3+ deep network to segment the ship target. Experimental results show that the method has high segmentation accuracy and good fog resistance. Wang et al. [5] achieved fast end-to-end ship target detection with an improved YOLOv3 (You Only Look Once), reaching 74.8% detection accuracy and a detection speed of 29.8 frames per second in a GPU 1080Ti hardware environment.

However, current algorithms still have two problems: first, the accuracy of small target detection is low; second, most existing algorithms still cannot meet the needs of practical applications. In order to overcome the above problems, this paper compares different network frameworks
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided
the original work is properly cited and is not used for commercial purposes.
© 2022 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology
[6–12], and finally selects the YOLOv5 network as the basic framework. By improving the adaptive anchor frame algorithm, a more appropriate anchor box is selected, and network pruning is optimized to improve the performance of the algorithm.

The contribution of this work can be summarized as follows: (1) In order to improve the detection accuracy, a t-SNE weighted clustering algorithm is applied to the data processing stage to map the data to a high-dimensional space and to perform accurate classification in that space, which yields more accurate prediction boxes; (2) In order to reduce the computational complexity of the algorithm, the BN scaling factor is further used to fine-tune the network channels and make the algorithm lightweight; (3) The improved lightweight model can be deployed on the edge embedded equipment of an offshore target detection platform for real-time monitoring at sea. The test results show that the detection accuracy of the improved algorithm is increased by 2.34%, and the detection speed can reach 20 fps on edge embedded devices.

2 INTRODUCTION TO THE BASIC PRINCIPLE OF YOLO NETWORK

YOLO [13–16] is a single-stage target detection algorithm. As YOLO has developed, its detection accuracy and speed have gradually improved. YOLOv2 proposes a joint training algorithm, which improves the accuracy and speed of prediction. YOLOv3 adopts the multi-scale network architecture of the FPN network [17]. It deepens the network backbone and improves the detection accuracy of the algorithm for multi-scale targets. YOLOv4 uses a large number of tricks to improve the detection accuracy of the algorithm as a whole. YOLOv5 slices the picture and adds CSPNet (Cross Stage Partial Networks) to the backbone network. YOLOv5 significantly slims down the skeleton of the network. With its lightweight model size, the object recognition speed can be as high as 140 fps when running on the server.

The network structure of YOLOv5 is shown in Figure 1. YOLOv5 is divided into four main parts: the network input, the feature extraction backbone, the feature extraction parsing network (neck) and the prediction module. The input uses three methods to enhance features: mosaic data enhancement, adaptive anchor frame calculation and adaptive image scaling. The purpose of mosaic data enhancement is to make the model better at detecting small objects in the image. The input data is sliced by the Focus structure before entering the backbone. The Focus structure mainly expands the original three-channel image to 12 channels. The CSP structure reduces the parameters and size of the model from the perspective of network structure design. The neck uses an FPN + PAN structure to enhance network feature fusion, and uses a larger feature map to compensate for the loss of feature information at the top of the feature pyramid. The prediction module uses the GIOU_Loss function to estimate the recognition loss of the detected target rectangle. YOLOv5 provides four networks of different depth and width: YOLOv5s, YOLOv5m, YOLOv5l and YOLOv5x. YOLOv5s is the smallest network framework of YOLOv5. When it runs on a Titan RTX 2080 GPU, 78 frames can be processed per second, but when it is deployed on the Jetson nano embedded device, the speed is only 4 fps, which is not real-time detection. To overcome this problem, this paper proposes an optimized kernel clustering and quantization compression method for the YOLOv5 network structure. The network structure is reduced by compression pruning, which lowers the model size and runtime memory usage and so addresses the problem that ships cannot be detected in real time.

3 OPTIMIZATION OF YOLOV5 ALGORITHM

3.1 Optimization of network structure

As shown in Figure 2, YOLOv5 has been optimized and improved based on data enhancement to improve detection
17519667, 2022, 6, Downloaded from https://ietresearch.onlinelibrary.wiley.com/doi/10.1049/ipr2.12432, Wiley Online Library on [04/06/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
accuracy and speed. t-SNE is used to extract low-dimensional feature information so that the relevant data can be handled better. The resulting low-dimensional data are passed to weighted kernel function clustering to obtain a more accurate predicted target frame and a better prediction effect. The resulting data set is fed into network training, and the trained model is pruned using the BN scaling factor γ. This reduces the runtime memory consumption and the number of computational operations without affecting the accuracy, which makes it easier to apply the model on mobile devices.

3.2 Optimization of clustering algorithm

YOLOv5 optimizes the preprocessing of the data set. Auto-learning of bounding box anchors aims to obtain preset anchor frames suited to predicting the object bounding boxes in a custom data set. Adaptive anchor frame calculation updates the target frame by updating the predicted frame area at each iteration. The accuracy of target detection is closely related to the setting of the prediction box: the more accurate the prediction box, the higher the detection accuracy. The t-SNE (t-Distributed Stochastic Neighbour Embedding) algorithm [18] is used to reduce the dimension of anchor frame prediction and is then combined with the weighted kernel clustering algorithm to predict the size of the frame. A more accurate predicted target frame is obtained, which gives a better prediction effect.

t-SNE reduces high-dimensional data to a two-dimensional or three-dimensional space. It obtains the joint probability of the high-dimensional data and the low-dimensional mapping points through the symmetry of the conditional probability, and minimizes the KL divergence to reduce the difference between the probability distributions. t-SNE is developed from SNE (stochastic neighbour embedding). SNE maps data points to probability distributions by affine transformation, and uses Euclidean distance to express the similarity between points. Given N high-dimensional data points, first calculate the probability $p_{j|i}$, which is proportional to the similarity between data points $x_i$ and $x_j$. Equation (1) gives this conditional probability in terms of the high-dimensional Euclidean distance. The parameter $\sigma_i$ is the standard deviation of the Gaussian centred on data point $x_i$, here set to $1/\sqrt{2}$. In the low-dimensional space, the similarity between the points $y_i$ and $y_j$ is expressed by Equation (2):

$$p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)} \tag{1}$$

$$q_{j|i} = \frac{\exp\left(-\|y_i - y_j\|^2\right)}{\sum_{k \neq i} \exp\left(-\|y_i - y_k\|^2\right)} \tag{2}$$

The self-similarities $p_{i|i}$ and $q_{i|i}$ are set to 0. The distance between the two probability distributions is measured by the KL (Kullback-Leibler) divergence, so the cost function is as shown in Equation (3). $P_i$ represents the conditional probability distribution over all other data points given point $x_i$. When the dimensionality reduction is good, $p_{j|i} = q_{j|i}$.

$$C = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}} \tag{3}$$

Analogous to the gradient formula of the softmax objective function, the gradient of the conditional probability of i under j in the SNE objective function is derived as $2(p_{i|j} - q_{i|j})(y_i - y_j)$. The gradient of the conditional probability of j under i is $2(p_{j|i} - q_{j|i})(y_i - y_j)$. Finally, the complete
$$C = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} \tag{5}$$

Equation (6) shows the change of q after using the t-distribution. The t-distribution is a superposition of infinitely many Gaussian distributions, which reduces the amount of calculation. The optimized gradient is as shown in Equation (7):

$$q_{ij} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \|y_k - y_l\|^2\right)^{-1}} \tag{6}$$

$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \|y_i - y_j\|^2\right)^{-1} \tag{7}$$

For points with greater similarity, the distance under the t-distribution in the low-dimensional space needs to be slightly smaller. For points with low similarity, the distance under the t-distribution in the low-dimensional space needs to be larger. After t-SNE dimensionality reduction, the data therefore meet the requirements of the clustering algorithm for target frames: the cluster points of different classes are separated from each other for classification.

The clustering algorithm [19] processes the input data set to optimize the selection of the initial target frame within the network. The data set is classified by the size of the target frame, and the sizes of 9 a priori boxes are obtained. The size of the prior box is related to the scale: the larger the scale of the feature map, the smaller the prior box and the more detailed the target edge information that can be obtained. This approach can cope with most data sets that have a single data source and simple features. However, k-means clustering is biased in complex situations, such as multiple data sources and attributes that affect different classes of features differently. For this reason, this paper adopts a weighted kernel clustering approach to cluster the data set. Kernel methods [20] map two-dimensional inputs to a high-dimensional feature space to classify categories. The sample features in the plane are mapped to the

$$J(v) = \sum_{i=1}^{k} \sum_{a_i \in \pi_k} w(i)\, \|\phi(a_i) - c_k\|^2 \tag{8}$$

Computing the non-linear mapping explicitly is relatively difficult. From a mathematical point of view, there is a function K(x, x′) in the low-dimensional space such that K(x, x′) = ⟨φ(x) ⋅ φ(x′)⟩, that is, it is exactly equal to the inner product in the high-dimensional space. Without it, the inner product would have to be obtained by first projecting the sample points into the high-dimensional space and then performing the inner product operation. By evaluating K(x, x′) directly, the distance between the sample points and the centre of the cluster is obtained, which greatly reduces the difficulty of calculation. Equation (9) is the distance after simplification with the K function. Equation (10) is the resulting cluster centre.

$$\|\phi(a_i) - c_k\|^2 = \phi(a_i) \cdot \phi(a_i) - \frac{2 \sum_{a_j \in \pi_k} \phi(a_i) \cdot \phi(a_j)}{|\pi_k|} + \frac{\sum_{a_j, a_l \in \pi_k} \phi(a_j) \cdot \phi(a_l)}{|\pi_k|^2}$$

$$= K(x, x') - \frac{2 \sum_{a_j \in \pi_k} \phi(a_i) \cdot \phi(a_j)}{|\pi_k|} + \frac{\sum_{a_j, a_l \in \pi_k} \phi(a_j) \cdot \phi(a_l)}{|\pi_k|^2} \tag{9}$$

$$c_k = \frac{\sum_{a_i \in \pi_k} w(i)\, \phi(a_i)}{\sum_{a_i \in \pi_k} w(i)} \tag{10}$$

The t-SNE method is used to reduce the dimensionality of the image feature matrix, which is then whitened. Finally, a matrix of shape (N, 256) for the N images is obtained. Weighted kernel clustering is used to get each image and its corresponding cluster. The
clustering results are shown in Figure 3; they act as pseudo tags, and the model trains on them. After the self-made ship data set is analysed with the improved clustering algorithm, the sizes of 9 sets of prior frames are obtained. The comparison is shown in Table 1. The experimental results show that the optimized k-means algorithm has improved the detection effect.

TABLE 1 Comparison of anchor frames in weighted kernel clustering

Anchor box                                    52 × 52     26 × 26      13 × 13
Adaptive anchor frame                         (10, 13)    (30, 61)     (116, 90)
                                              (16, 30)    (62, 45)     (156, 198)
                                              (33, 23)    (59, 119)    (373, 326)
Aiming frame after weighted sum clustering    (8, 7)      (27, 23)     (115, 81)
                                              (10, 21)    (77, 50)     (224, 127)
                                              (15, 18)    (61, 114)    (354, 201)

3.3 Network layer model compression optimization

To compress the YOLOv5 model, the trained model must first be trained sparsely. The purpose of sparse training is to identify the less important channels during model training so that they can be cut. Sparsity is introduced into the dense connections of the deep neural network, and weights that contribute little are eliminated to reduce the network structure. After the network is initialized, a channel sparsity penalty is added to train the network. After the channels are deleted, the network is fine-tuned. The channel sparsity method can reduce the size of the model, the runtime memory consumption and the number of calculation operations without affecting the accuracy. As shown in Figure 4, the BN scaling factor γ is used to prune the channels of the network.

The scale factor in the batch normalization layer [21] is used as the index of channel importance. The L1 norm of the scale factor is calculated and trained in the loss function to obtain the importance score of each channel [22]. The update process of the batch normalization layer is as follows. Equations (11) and (12) are the mean and variance of the output data of the previous layer, and m is the batch size of the training samples:

$$\mu_B = \frac{1}{m} \sum_{i=1}^{m} z_i \tag{11}$$

$$\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (z_i - \mu_B)^2 \tag{12}$$

$$\hat{z} = \frac{z_i - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} \tag{13}$$

$$z = \gamma \hat{z} + \beta \tag{14}$$

Equation (13) is the result of normalization. ε is a value close to 0, added to prevent the denominator from being 0. Equation (14) reconstructs the data obtained from the normalization above. Here γ and β are learnable parameters, which are used to restore the normalized data distribution. The scaling factor γ in batch_norm is used to evaluate the importance of a channel. The smaller the value of γ, the less important the channel information is, and the channel can be deleted. In order to constrain the size of γ, a regularization term on γ is added to the objective equation, so that channels can be pruned automatically during training, which earlier model compression methods could not do. The L1 norm is calculated for the γ value of each channel. The specific calculation equation is as follows:

$$L_{BN} = \lambda \|\gamma\|_1 = \lambda \sum |\gamma| \tag{15}$$

Finally, the loss function used to train the model adds this regularization term to the original loss function:

$$L = L_{YOLO} + L_{BN} \tag{16}$$

λ is the penalty coefficient. After sparse training, a global threshold on γ is introduced to decide whether to cut a feature channel. Prune the channels whose channel scale factor γ is less than
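The batch normalization update of Equations (11) to (14), the sparsity term of Equations (15) and (16), and threshold-based channel selection can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' code: the function names, the example γ values, the penalty coefficient `lam`, the placeholder detection loss and the threshold of 0.05 are all hypothetical, and comparing the absolute value of γ against the threshold is an assumption.

```python
import math


def batch_norm(z, gamma, beta, eps=1e-5):
    """Normalize one channel over a batch, following Equations (11)-(14)."""
    m = len(z)
    mu = sum(z) / m                                       # Eq. (11): batch mean
    var = sum((v - mu) ** 2 for v in z) / m               # Eq. (12): batch variance
    z_hat = [(v - mu) / math.sqrt(var + eps) for v in z]  # Eq. (13)
    return [gamma * v + beta for v in z_hat]              # Eq. (14)


def sparsity_penalty(gammas, lam):
    """Eq. (15): L1 norm of the BN scale factors times the coefficient lam."""
    return lam * sum(abs(g) for g in gammas)


def channels_to_keep(gammas, threshold):
    """Indices of channels whose |gamma| is not below the global threshold."""
    return [i for i, g in enumerate(gammas) if abs(g) >= threshold]


# Hypothetical values for illustration only.
gammas = [0.9, 0.02, -0.5, 0.001]
lam = 1e-4
yolo_loss = 1.25                                          # placeholder detection loss
total_loss = yolo_loss + sparsity_penalty(gammas, lam)    # Eq. (16)

normalized = batch_norm([1.0, 2.0, 3.0], gamma=1.0, beta=0.0)
kept = channels_to_keep(gammas, threshold=0.05)
print(kept)  # [0, 2]
```

In this toy case channels 1 and 3 have scale factors close to zero, so they are the ones a threshold of 0.05 would mark for removal before fine-tuning.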
[Table residue, data not recoverable. Column headers: Model; Model volume (MB); AP (%); FPS (frame/s)]

[Table residue, data not recoverable. Column headers: Size; Precision type; CPU; TITAN RTX 2080 GPU; Jetson nano; Jetson Xavier nx]

which is reflected in the frame rate: the improved model is 15 fps faster than eYOLOv3 and 5 fps faster than s-CNN, so it has a better frame rate than the other algorithms. The accuracy of the improved model is 96.31%, which is 6.68% higher than the SSD algorithm and 4.61% higher than the more accurate s-CNN. This indicates that our method improves the detection accuracy while also increasing the detection speed, which is beneficial for practical deployment.

Table 3 shows the comparison of recall, accuracy and performance on GPU before and after the improvement for the four versions of YOLOv5, each trained separately on the ship training set assembled in this paper. It can be seen that the improved v5 network has improved in both speed and accuracy. The detection accuracy of the improved model is higher than that of the model before the improvement, and its running speed has improved significantly. Taking v5x as an example, its detection speed on GPU was 28.48 frames per second before the improvement and increased to 45.97 frames per second after it. When applied to actual video detection, detection on a one-minute dynamic video takes four minutes before the improvement, but only two minutes after the improvement.

Table 4 shows the inference time comparison for float16 images with different resolutions in different environments. A deep neural network with full floating-point bit width requires a lot of computing resources. Here, the floating-point type with 16-bit width is used to reduce the computational complexity of the network. When the input size is 204 × 204, the inference time is 25.3 ms on Jetson Xavier nx and 4.18 ms on the TITAN RTX GPU, but 498.5 ms on Jetson nano. Further optimization is needed to improve the speed of the algorithm on low-power devices.

5 TEST EXPERIMENT AND RESULT ANALYSIS

5.1 Testing data

A test set is formed from five hundred existing images that were not used in training and three sea surface monitoring videos. The performance of the network is evaluated in two dimensions: the actual detection effect and the actual computing power.

5.2 Analysis of test experiment results

Figure 6 compares the detection results before and after the improvement on a set of test data. As can be seen from the figure, before the improvement the detection frame of the ship could not be positioned precisely because of overlap; after the improvement it is correct and accurate. The improved detection frame of the ship is more accurate and fits the target better than before the improvement. The detection accuracy also improves accordingly because of the more accurate detection frame.

On a GPU platform, the network model can detect ship targets at several tens of frames per second. However, the video processing and recognition system of a mobile platform at sea can only use lightweight devices, on which it is not convenient to install a GPU with high power consumption. Therefore, this paper measured the processing speed of the various network models for ship video on embedded development boards. Table 5 shows the frame rate comparison before and after the improvement of the different models on GPU and on the embedded devices Jetson Xavier nx and Jetson nano. It can be seen that the compressed model is a great improvement over the previous one, and the improved YOLOv5s can reach a frame rate of 98.5 in
the server environment and 20.3 fps on Jetson nano, which helps the model achieve fast ship detection. The experimental results demonstrate that the improved model can effectively compress the volume and the floating-point operations to improve the detection speed of the algorithm while maintaining accuracy, and that the compressed model volume and prediction speed are better than those of the traditional YOLOv5 model.

FIGURE 6 Comparison of test results before and after improvement. (a) Test results before improvement. (b) Test results after improvement

TABLE 5 Frame rate comparison before and after improvement of four models
[Table residue, data not recoverable. Column headers: Model; FPS (frame/s) on Jetson nano, GPU, Jetson Xavier nx]

6 CONCLUSION

In this paper, we propose a target detection algorithm for marine ships based on an improved YOLOv5 network model. The network reconstructs the adaptive anchor frame and adopts t-SNE dimensional mapping, which enables the weighted kernel clustering algorithm to achieve more accurate target frame positioning, fully extract target features and improve target detection accuracy. The BN scaling factor is used to compress and prune the network layers to eliminate redundant parameters and further reduce the model size and computational effort, and the pruned model is then fine-tuned to improve the accuracy, which can significantly improve the video frame detection rate. The experimental results show that the algorithm can quickly and effectively detect surface ship targets. This provides a theoretical basis for implementing target detection tasks on small devices with limited storage and on edge mobile devices. In future research, methods for streamlining deep convolutional network structures will be investigated further to improve the real-time performance of the algorithm while satisfying the detection accuracy, and the detection algorithm will be ported to surface unmanned ships.

ACKNOWLEDGEMENTS
This work was supported by: Xiamen Municipal Ocean and Fishery Development Special Fund (No. 21CZB013HJ15), Key Project of Fujian Science and Technology Plan (No. 2017h0028); Fund Project of Jimei University (No. zp2020042); Xiamen Key Laboratory of Marine Intelligent Terminal R&D and Application (No. B18208).

CONFLICT OF INTEREST
The authors declare that they have no financial or personal relationships with other organizations or individuals that may improperly affect their research work. The opinions and conclusions in the paper entitled “Fast Ship Detection based on Lightweight YOLOv5 Network”, no institution, company or individual can explain that they are related to their products, services or intellectual property rights.

DATA AVAILABILITY STATEMENT
Since the data is part of the ongoing research, the data set required for the current algorithm research cannot be shared at present.

ORCID
Shi-Dan Sun https://orcid.org/0000-0001-7219-5933
REFERENCES
1. Zhang, Y., Li, Q., Zang, F.: Ship detection for visual maritime surveillance from non-stationary platforms. Ocean Eng. 141(1), 53–63 (2017)
2. Shi, G., Suo, J.: Ship targets detection based on visual attention. In: IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC). Qingdao, pp. 1–4 (2018)
3. Shao, Z., Wang, L., Wang, Z., et al.: Saliency-aware convolution neural network for ship detection in surveillance video. IEEE Trans. Circuits Syst. Video Technol. 30(3), 1–15 (2019)
4. Zhang, W., He, X., Li, W., et al.: An integrated ship segmentation method based on discriminator and extractor. Image Vision Comput. 89(1), 1–11 (2019)
5. Wang, Y., Ning, X., Leng, B., et al.: Ship detection based on deep learning. In: IEEE International Conference on Mechatronics and Automation (ICMA). Tianjin, pp. 275–279 (2019)
6. Spyros, G., Nikos, K.: Object detection via a multi-region and semantic segmentation-aware CNN model. In: IEEE International Conference on Computer Vision (ICCV). pp. 1134–1142 (2015)
7. Lin, T., Dollar, P., Girshick, R., et al.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, pp. 936–944 (2017)
8. Kaiming, H., Georgia, G., Piotr, D., Ross, G.: Mask R-CNN. In: IEEE International Conference on Computer Vision (ICCV). Venice, Italy, pp. 2980–2988 (2017)
9. Dai, J., Qi, H., Xiong, Y., et al.: Deformable convolutional networks. In: IEEE International Conference on Computer Vision (ICCV). Venice, Italy, pp. 764–773 (2017)
10. Yang, Z., Liu, S., Hu, H., et al.: RepPoints: Point set representation for object detection. In: IEEE International Conference on Computer Vision (ICCV). Coex, pp. 9656–9665 (2019)
11. Tian, Z., Shen, C., Chen, H., He, T.: FCOS: Fully convolutional one-stage object detection. In: IEEE International Conference on Computer Vision (ICCV). Coex, pp. 9627–9636 (2019)
12. Wei, L., Dragomir, A., Dumitru, E., et al.: SSD: Single shot multibox detector. In: European Conference on Computer Vision (ECCV). Amsterdam, pp. 21–37 (2016)
13. Joseph, R., Ali, F.: YOLO 9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, pp. 7263–7271 (2017)
14. Joseph, R., Santosh, K., Ross, G., Ali, F.: You only look once: Unified, real-time object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, pp. 779–788 (2016)
15. Joseph, R., Ali, F.: YOLOv3: An incremental improvement. In: IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, pp. 89–95 (2018)
16. Bochkovskiy, A., Wang, C.Y., Liao, H.: YOLOv4: Optimal speed and accuracy of object detection. arXiv:2004.10934 (2020)
17. Lin, T., Dollár, P., Girshick, R., et al.: Feature pyramid networks for object detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, pp. 2117–2125 (2017)
18. Laurens, M., Geoffrey, H.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(2605), 2579–2605 (2008)
19. Sulaiman, S., Isana, M.: Adaptive fuzzy-k-means clustering algorithm for image segmentation. IEEE Trans. Consum. Electron. 56(4), 2661–2668 (2010)
20. Geng, F., Qian, S.: An optimal reproducing kernel method for linear nonlocal boundary value problems. Appl. Math. Lett. 77, 49–56 (2017)
21. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167 (2015)
22. Zhuang, L., Li, J., Shen, Z., et al.: Learning efficient convolutional networks through network slimming. In: IEEE International Conference on Computer Vision (ICCV). Venice, Italy, pp. 2736–2744 (2017)
23. Liu, W., Yuan, W., Chen, X., Lu, Y.: An enhanced CNN-enabled learning method for promoting ship detection in maritime surveillance system. Ocean Eng. 235, 109435 (2021)

How to cite this article: Zheng, J.-C., Sun, S.-D., Zhao, S.-J.: Fast ship detection based on lightweight YOLOv5 network. IET Image Process. 16, 1585–1593 (2022). https://doi.org/10.1049/ipr2.12432