

Computers in Biology and Medicine 146 (2022) 105543


TOD-CNN: An effective convolutional neural network for tiny object detection in sperm videos
Shuojia Zou a, Chen Li a,*, Hongzan Sun b, Peng Xu c, Jiawei Zhang a, Pingli Ma a, Yudong Yao d, Xinyu Huang e, Marcin Grzegorzek e

a Microscopic Image and Medical Image Analysis Group, College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, China
b Shengjing Hospital, China Medical University, Shenyang, China
c Jinghua Hospital, Shenyang, China
d Department of Electrical and Computer Engineering, Stevens Institute of Technology, USA
e Institute of Medical Informatics, University of Luebeck, Luebeck, Germany

Keywords: Image analysis; Object detection; Convolutional neural network; Sperm microscopy video

Abstract

The detection of tiny objects in microscopic videos is a challenging problem, especially in large-scale experiments. For tiny objects (such as sperms) in microscopic videos, current detection methods struggle with their fuzzy appearance, irregular morphology, and the need for precise positioning. To address this, we present a convolutional neural network for tiny object detection (TOD-CNN) with an underlying data set of high-quality sperm microscopic videos (111 videos, more than 278,000 annotated objects), and a graphical user interface (GUI) is designed to employ and test the proposed model effectively. TOD-CNN is highly accurate, achieving 85.60% AP50 in the task of real-time sperm detection in microscopic videos. To demonstrate the importance of sperm detection technology in sperm quality analysis, we calculate relevant sperm quality evaluation metrics and compare them with the diagnosis results from medical doctors.

1. Introduction

Sperm is necessary for the human and mammal reproductive process and plays an important role in human reproduction and animal breeding [1]. With the continuous development of computer technology, researchers have tried to use computer-aided image analysis in many fields, such as whole-slide image analysis [2], histopathology image analysis [3–5], cytopathological analysis [6,7], COVID-19 image analysis [8,9], and microorganism counting [10]. In addition, in the field of semen analysis and diagnosis, researchers have also proposed many Computer Aided Semen Analysis (CASA) systems [11]. As the first step of a CASA system, sperm detection is one of the most important parts supporting the reliability of sperm analysis results [12]. At present, most sperm detection techniques [13–16] are based on traditional image processing techniques such as thresholding, edge detection and contour fitting. However, for many of these techniques, the detection results require manual intervention. The common difficulties in sperm detection include the small size, uncertain morphologies and low contrast of the sperms, which make them hard to locate. Moreover, there are many similar impurities in the samples that can mislead detection (as shown in Fig. 7, Section 4).

In recent years, more and more excellent object detection models have been proposed [17], such as the Region-based CNN (R-CNN) series models [18–21], the You Only Look Once (YOLO) series models [22–25], the Single Shot Multibox Detector (SSD) [26], and RetinaNet [27]. The performance of Convolutional Neural Networks (CNNs) has clearly surpassed complex classic image processing algorithms in the field of medical image processing [28–30], which makes it possible to use deep learning methods to perform real-time sperm object detection in sperm microscopic videos. However, the accuracy of sperm object detection is still lower than that of object detection at conventional scales [31]. Hence, techniques such as feature fusion and residual networks are used in our method to improve the detection performance in this field. The technologies above are applied to build an easy-to-operate sperm detection model (TOD-CNN), and an AP50 of 85.60% is achieved in the task of sperm detection in microscopic videos.

* Corresponding author.
E-mail address: [email protected] (C. Li).

https://doi.org/10.1016/j.compbiomed.2022.105543
Received 1 March 2022; Received in revised form 30 March 2022; Accepted 17 April 2022
Available online 22 April 2022

The workflow of the proposed TOD-CNN detection method is summarized as follows (as shown in Fig. 1, Section 3): (a) Training and Validation Data: the training and validation data contain 80 sperm microscopic videos and the corresponding annotation data with the location and category information of sperms and impurities. (b) Data Preprocessing: the sperm microscopic videos are divided into frames to obtain individual sperm microscopic images, and the object information is annotated using the LabelImg software. (c) Training Process: the TOD-CNN model is trained and the best model is saved to perform sperm object detection. (d) Test Data: the test data contain 21 sperm microscopic videos.

Fig. 1. The workflow of the proposed sperm object detection method using TOD-CNN.

The main contributions of this paper are as follows:

● Build an easy-to-operate CNN for sperm detection, namely TOD-CNN (Convolutional Neural Network for Tiny Object Detection).
● TOD-CNN has excellent detection results and real-time detection ability in the task of tiny object detection in sperm microscopic videos, achieving 85.60% AP50 and 35.7 frames per second (FPS).

The structure of this paper is as follows: Section 2 introduces existing sperm object detection methods based on traditional methods, machine learning methods, and deep learning methods. Section 3 illustrates the detailed design of TOD-CNN. Section 4 introduces the data set used in the experiments, the experimental settings, the evaluation methods, and the results. Section 5 concludes the paper.

2. Related work

2.1. Existing sperm object detection methods

2.1.1. Traditional methods

Traditional methods mainly include three types: threshold-based methods, shape fitting methods, and filtering methods. Threshold-based methods: Urbano et al. [14] use a Gaussian filter to enhance the image, the image is then binarized using the Otsu threshold method [32], and the result is morphologically processed to determine the position of the sperm; Elsayed et al. [13] use several selected frames to generate background information, the background information is then subtracted from the original image (to suppress noise), and finally the Otsu threshold is applied to determine the position of the sperm. Shape fitting methods: Zhou et al. [33] use a rectangular area similar to the shape of the object (sperm) to fit the object, and the position of the sperm is then described by the parameters of the rectangle. Yang et al. [16] use an ellipse to approximate the sperm head, and an improved multiple birth and cut algorithm based on marked point processes [34] is used to detect and locate the head of the sperm through modelling. Filtering methods: Ravanfar et al. [35] first select several suitable structural elements, and a Top-hat based operation is then used to filter the image sequence in order to separate sperms from other debris. Nurhadiyatna et al. [36] use the Gaussian Mixture Model (GMM), enhanced by the Hole Filling Algorithm, as the probability density function to predict the probability that each pixel in the image belongs to the foreground or the background. The researchers found that the computational cost of this method is significantly lower than that of other methods.

2.1.2. Machine learning methods

Unsupervised learning is the most used machine learning approach. Berezansky et al. [37] use spatio-temporal segmentation to detect sperms, integrating k-means, GMM, mean shift, and other segmentation methods. Shi et al. [38] use an optical capture method for sperm detection.

2.2. Deep learning based object detection methods

Deep learning methods are widely used in many artificial intelligence fields, for example classification [39–42], segmentation [43–45] and object detection [46,47]. Furthermore, some widely recognized general object detection models proposed in recent years are introduced below.

2.2.1. One-stage object detector

The YOLO series models [22–25], RetinaNet [27], and SSD [26] are prominent representatives of one-stage object detectors. One-stage object detectors are based on the idea of regression: they directly output the final prediction results from the input images without generating proposal regions in advance. YOLO series models: they use Darknet as the backbone of the model to extract features from the image. The v1, v2, v3, and v4 versions were successively proposed by improving the backbone network structure, improving the loss function, and using batch normalization, the feature pyramid network [48], the spatial pyramid pooling network [49], and other optimization methods. RetinaNet: it uses ResNet [50] and the feature pyramid network as the backbone of the model, and its main contribution is the focal loss function, which addresses the imbalance between the number of foreground and background samples in a one-stage object detector. SSD: it uses VGG16 [51] as the basic model, adds new convolutional layers on top of VGG16 to obtain more feature maps for detection, and generates the final prediction result by fusing the prediction results of six feature maps.


Fig. 2. The architecture of TOD-CNN.

2.2.2. Two-stage object detector

The R–CNN series models are classical two-stage object detectors. R–CNN [18] generates proposal regions through the selective search algorithm, the features of the proposal regions are then extracted using a CNN, and finally an SVM classifier is used to predict the objects in each region and identify their categories. Fast R–CNN [19] no longer extracts features for each proposal region: the features of the entire image are extracted using a CNN, and each proposal region is then mapped to the corresponding features. Besides, Fast R–CNN uses a multi-task loss function, allowing the detector and the bounding box regressor to be trained simultaneously. Faster R–CNN [20] replaces the selective search algorithm with a Region Proposal Network, which helps the CNN to generate proposal regions and detect objects simultaneously.

3. TOD-CNN based sperm detection method in microscopic image

Sperm detection is always the first step of a CASA system, and it determines the reliability of the results of sperm microscopic video analysis. However, the existing algorithms cannot accurately detect sperms. Therefore, we follow the ideas of the YOLO [22–25], ResNet [50], Inception-v3 [52], and VGG16 [51] models and propose a novel one-stage deep learning based sperm object detection model (TOD-CNN). The workflow of the proposed TOD-CNN detection approach is shown in Fig. 1.

3.1. Basic knowledge

In this section, the methods related to our work are introduced, including the YOLO, ResNet, Inception-v3, and VGG16 models.

3.1.1. Basic knowledge of YOLO

The YOLO series models solve the object detection task as a regression problem. They remove the step of generating proposal regions used in two-stage object detectors and thereby accelerate the detection process. YOLO-v3 [24] is the most popular model in the YOLO series due to its excellent detection performance and speed.

The YOLO-v3 model mainly consists of four parts: preprocessing, backbone, neck, and head. The preprocessing: the k-means algorithm is used to cluster nine anchor boxes in the data set before training. The backbone: YOLO-v3 uses Darknet53 as the backbone network. Darknet53 has no maxpooling layers and no fully connected layers; the fully convolutional network changes the size of the tensor by changing the strides of the convolutional kernels. In addition, Darknet53 follows the idea of ResNet and adds residual modules to the network to alleviate the vanishing gradient problem of deep networks. The neck: the neck of YOLO-v3 draws on the idea of the feature pyramid network [48] to enrich the information of the feature maps. The head: the head of YOLO-v3 outputs three feature maps with different sizes and then detects large, medium, and small objects on these three feature maps.

3.1.2. Basic knowledge of ResNet

ResNet [50] is one of the most widely used feature extraction CNNs due to its practical and straightforward structure. With the continuous deepening of a CNN, the model's performance cannot be improved indefinitely, and the accuracy may even decrease. ResNet proposes the Shortcut Connection structure to solve this problem. The Shortcut Connection structure includes an identity mapping operation and residual mapping operations. The identity mapping passes the current feature map backward through a cross-layer transfer (when the dimensions of the feature maps do not match, a 1 × 1 convolution operation is used to adjust the dimension of the feature map). The residual mapping passes the current feature map to the next layer after a convolution operation. In general, a Shortcut Connection structure contains one identity mapping operation and two or three residual mapping operations.
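To make the Shortcut Connection idea concrete, the sketch below shows a minimal residual block written with the Keras functional API; the filter sizes and layer count are illustrative assumptions and are not the exact layers used in TOD-CNN.

```python
from tensorflow.keras import layers

def shortcut_block(x, filters):
    """One identity mapping plus two residual (convolution) mappings, added cross-layer."""
    identity = x
    # adjust dimensions with a 1 x 1 convolution only when they do not match
    if x.shape[-1] != filters:
        identity = layers.Conv2D(filters, 1, padding="same")(x)
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([identity, y]))
```

The 1 × 1 convolution on the identity path mirrors the dimension-matching rule described above: it is only inserted when the channel dimension of the incoming feature map differs from that of the residual branch.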
3.1.3. Basic knowledge of Inception-v3 and VGG16

In Inception-v3 [52], to reduce the number of parameters while maintaining the performance of the model, an operation that replaces N × N convolution kernels with 1 × N and N × 1 convolution kernels is proposed. The receptive field of a 1 × N convolution followed by an N × 1 convolution is the same as that of an N × N convolution, while the former has fewer parameters than the latter. In addition, the Inception-v3 model supports multi-scale processing: convolution kernels of different sizes are applied to the input, and the resulting feature maps are concatenated to generate the final feature map.

The VGG16 [51] model includes 13 convolutional layers, 3 fully connected layers, and 5 maxpooling layers. The most prominent feature of the VGG16 model is its simple structure: all convolutional layers use the same convolution kernel parameters, and all pooling layers use the same pooling kernel parameters. Although the VGG16 model has a simple structure, it has strong feature extraction capabilities.
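As a brief illustration of this factorisation (not code from the paper), the Keras snippet below replaces a single N × N convolution with a 1 × N convolution followed by an N × 1 convolution; with equal input and output channel counts this needs roughly 2NC² weights instead of N²C².

```python
from tensorflow.keras import layers

def factorized_conv(x, filters, n=7):
    # same receptive field as an n x n kernel, with fewer parameters
    x = layers.Conv2D(filters, (1, n), padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, (n, 1), padding="same", activation="relu")(x)
    return x
```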
3.2. The structure of TOD-CNN

The TOD-CNN model, which follows the YOLO series models, regards the object detection task as a regression problem for fast and precise detection. The architecture of TOD-CNN is shown in Fig. 2, where the entire network is composed of four parts: data preprocessing, the backbone of the network, the neck of the network, and the head of the network.


Fig. 3. The architecture of TOD-CNN backbone.

The detailed implementation of each part is introduced below.

3.2.1. Data Preprocessing

The object detection task in videos is essentially based on image processing. Therefore, it is necessary to split the sperm microscopic video into continuous frames (single images). However, due to the movement of the lens during the shooting of the sperm microscopic videos, some frames are blurred. After analysing the grayscale histograms of the frames, there is an obvious difference between the grayscale distribution of a blurred frame and that of a normal frame. Therefore, the blurred frames can be removed by deleting images whose Otsu threshold is less than a certain value (chosen from practical experience). In addition, TOD-CNN is an anchor-based object detection model. Therefore, the k-means algorithm is used to cluster a certain number (TOD-CNN uses six) of anchor boxes in the data set to train the model.
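The sketch below illustrates these two preprocessing steps with OpenCV and scikit-learn; the blur cut-off value and the function names are illustrative assumptions rather than the exact values used for TOD-CNN.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def video_to_frames(video_path, blur_cutoff=100.0):
    """Split a sperm video into frames and drop blurred ones via their Otsu threshold."""
    frames, cap = [], cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    while ok:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        otsu_value, _ = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        if otsu_value >= blur_cutoff:      # keep only frames whose Otsu threshold is large enough
            frames.append(frame)
        ok, frame = cap.read()
    cap.release()
    return frames

def cluster_anchors(box_whs, n_anchors=6):
    """Cluster annotated (width, height) pairs into six anchor boxes with k-means."""
    return KMeans(n_clusters=n_anchors, random_state=0).fit(np.asarray(box_whs)).cluster_centers_
```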
3.2.2. The backbone of TOD-CNN

A straightforward backbone structure with a cross-layer concatenate operation is designed, as shown in Fig. 3. In a fully convolutional network, as the structure of the CNN continues to deepen, the semantic information of the feature map becomes more and more abundant, while the location information of the feature map constantly decreases. As a result, the network can improve the classification performance but may reduce the positioning accuracy. Our work focuses on detecting and accurately locating tiny objects, which requires maintaining precise local information. Therefore, we enhance the transfer of location information (transferring shallow features to deep layers) through the following methods: first, following the residual idea of ResNet, the Shortcut Connection structure provides an approach for transferring local information with a cross-layer add operation, which is used in TOD-CNN (as shown in Res (A, B, C) in Fig. 3); second, based on the straightforward backbone structure, a cross-layer concatenate operation is applied to enhance the transfer of local information (as shown in the grey shaded part in Fig. 3).

The detailed design of the TOD-CNN backbone is shown in Fig. 3, where the input size of the backbone is 416 × 416. The yellow arrows indicate convolution operations with a kernel size of 3 × 3 and a stride of 1 (each with padding and followed by a Mish activation), the red arrows indicate convolution operations with a kernel size of 3 × 3 and a stride of 2 (each followed by a Mish activation), and the green arrows indicate convolution operations with a kernel size of 1 × 1 and a stride of 1 (each with padding and followed by a Mish activation).
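As a sketch of the three convolution blocks denoted by the coloured arrows (with Mish written out explicitly, since it is not a built-in Keras activation), the helper functions below could be chained to build such a backbone; the filter counts are left as arguments because the published configuration is only given in Fig. 3.

```python
import tensorflow as tf
from tensorflow.keras import layers

def mish(x):
    return x * tf.math.tanh(tf.math.softplus(x))

def conv3x3_s1(x, filters):   # "yellow arrow": 3 x 3, stride 1, padded, Mish
    return layers.Activation(mish)(layers.Conv2D(filters, 3, strides=1, padding="same")(x))

def conv3x3_s2(x, filters):   # "red arrow": 3 x 3, stride 2 (down-sampling), Mish
    return layers.Activation(mish)(layers.Conv2D(filters, 3, strides=2)(x))

def conv1x1_s1(x, filters):   # "green arrow": 1 x 1, stride 1, padded, Mish
    return layers.Activation(mish)(layers.Conv2D(filters, 1, strides=1, padding="same")(x))
```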


3.2.3. The neck of TOD-CNN

In an object detection model, the main purpose of the neck is to integrate the feature information extracted by the backbone. The neck structure of TOD-CNN is shown in Fig. 4. In practice, there are sperms with abnormal morphology (very large) and some other impurities (such as bacteria, protein lumps and bubbles) in semen. These sperms and impurities differ significantly from normal sperms in size. Therefore, to collect multi-scale information, we adopt a spatial pyramid pooling operation [49] to integrate multi-scale information into the TOD-CNN neck. In addition, due to the small size of tiny objects, their information might easily be lost in the down-sampling process. To solve this problem, a feature fusion method is used in the TOD-CNN neck, where shallow and deep feature maps are fused by upsampling to avoid the loss of tiny object information.

Fig. 4. The architecture of TOD-CNN neck.

The detailed design of the TOD-CNN neck is shown in Fig. 4, where all convolution operations have a stride of 1 (each with padding); the detailed kernel sizes and activation functions are also illustrated in Fig. 4. Finally, the TOD-CNN neck outputs a feature map of size (input size/8) × (input size/8) × 42, where 42 is the number of anchor boxes (6) × 7, because each anchor box needs 7 parameters: the relative center coordinates, the width and height offsets, the confidence, and the two class probabilities. The details are explained in Section 3.2.4.
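The following sketch illustrates the two neck components mentioned above, spatial pyramid pooling and shallow–deep feature fusion by upsampling; the pooling sizes and the upsampling factor are assumptions made for illustration, not the published configuration.

```python
from tensorflow.keras import layers

def spatial_pyramid_pooling(x):
    # pool the same map at several scales and concatenate, so multi-scale context is kept
    pools = [layers.MaxPooling2D(pool_size=k, strides=1, padding="same")(x) for k in (5, 9, 13)]
    return layers.Concatenate()([x] + pools)

def fuse_shallow_and_deep(shallow, deep):
    # upsample the semantically rich deep map and concatenate it with the
    # location-rich shallow map, so tiny-object detail is not lost
    deep_up = layers.UpSampling2D(size=2)(deep)
    return layers.Concatenate()([shallow, deep_up])
```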
Fig. 5. The architecture of TOD-CNN head. The coordinates of the prediction box are calculated from the network output and the a priori boxes. tx, ty, tw and th are predicted by TOD-CNN for locating the bounding box. P0 and P1 represent the probability of a sperm and an impurity in the bounding box, respectively. C is the confidence used to determine whether there is an object in the bounding box.

Fig. 6. An example of the sperm video data set. (a) The first row shows a frame in a sperm microscopic video and the bottom row is the corresponding annotation for
object detection tasks. Sperms are in green boxes and impurities are in red boxes. (b) The first row shows a frame in sperm microscopic video and the bottom row
shows the corresponding ground truth for object tracking tasks. (c) The first row shows individual sperm images and the bottom row shows individual impurity
images for classification tasks.


Fig. 7. Challenging cases for the detection of sperms. The positions indicated by the red arrows show sperm imaging blur caused by sperm movement and impurities that look similar to sperms.

3.2.4. The head of TOD-CNN

In the head of TOD-CNN, 6 bounding boxes are predicted for each cell in the output feature map. For each bounding box, 7 values (tx, ty, tw, th, C, P0 and P1) are predicted, so the dimension of the predicted result in Fig. 5 is 42. For each cell, the offset from the upper left corner of the image is denoted (Cx, Cy), and the width and height of the corresponding a priori box are Pw and Ph. The calculation of the center coordinates (bx and by), width (bw) and height (bh) of the predicted box is shown in Fig. 5. Multi-label classification is applied to predict the categories in each bounding box. Furthermore, because a dense prediction method is applied in the head of TOD-CNN, non-maximum suppression based on the distance intersection over union [53] is used to remove bounding boxes with high overlap from the output of the network.
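Since Fig. 5 is not reproduced here, the decoding it describes can be sketched following the YOLO-v3 convention that TOD-CNN builds on; the sigmoid/exponential mapping below is that convention, stated as an assumption rather than the authors' exact code. The decoded boxes, together with C, P0 and P1, are then passed to the DIoU-based non-maximum suppression mentioned above.

```python
import numpy as np

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Map raw head outputs (tx, ty, tw, th) for a cell with offset (cx, cy)
    and prior (anchor) size (pw, ph) to a predicted box (bx, by, bw, bh)."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    bx = sigmoid(tx) + cx      # predicted centre, in grid-cell units
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)       # predicted width/height, scaled from the prior box
    bh = ph * np.exp(th)
    return bx, by, bw, bh
```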
4. Experiments

4.1. Experimental settings

4.1.1. Data set

A sperm microscopic video data set was released in our previous work [54] and is used for the experiments in this paper. The sperm microscopic videos in the data set are obtained by a WLJY-9000 computer-aided sperm analysis system [55] under a 20 × objective lens and a 20 × electronic eyepiece. More than 278,000 objects are annotated in the data set: normal, needle-shaped, amorphous, cone-shaped, round, or multi-nucleated head sperms and impurities (such as bacteria, protein clumps, and bubbles). The object sizes range from approximately 5 to 50 μm². The objects are annotated by 14 reproductive doctors and biomedical scientists and verified by 6 reproductive doctors and biomedical scientists.

From 2017 to 2020, the collection and preparation of this data set took four years and produced more than 278,000 annotated objects, as shown in Fig. 6. Furthermore, the data set contains some hard-to-detect objects, such as sperms with uncertain morphology, low-contrast sperms, and similar impurities (as shown in Fig. 7), which greatly increases the difficulty of tiny object detection.

In this data set, Subset-A provides more than 125,000 objects with bounding box annotations and category information in 101 videos for the tiny object detection task; Subset-B segments more than 26,000 sperms in 10 videos as ground truth for the tiny object tracking task; Subset-C provides more than 125,000 independent images of sperms and impurities for the tiny object classification task. Although Subset-C is not used in this work, it is openly available for non-commercial scientific work.

4.1.2. Training, validation, and test data setting

We randomly divide the sperm microscopic videos into training, validation, and test sets at a ratio of 6:2:2. Therefore, we have 80 sperm microscopic videos and the corresponding annotation information for training and validation. The training set includes 2125 sperm microscopic images (77,522 sperms and 2759 impurities), and the validation set includes 668 sperm microscopic images (23,173 sperms and 490 impurities). We have 21 sperm microscopic videos for testing; the test set includes 829 sperm microscopic images (20,706 sperms and 1230 impurities).

4.1.3. Experimental environment

The experiments are conducted with Python 3.7.0 on the Windows 10 operating system. The models used in this paper are implemented with the Keras 2.1.5 framework and Tensorflow 1.13.1 as the backend. Our experiments use a workstation with an Intel(R) Core(TM) i7-9700 CPU at 3.00 GHz, 32 GB RAM, and an NVIDIA GEFORCE RTX 2080 with 8 GB memory.

4.1.4. Hyper parameters

The purpose of an object detection task is to find all objects of interest in the image; this task can therefore be regarded as a combination of positioning and classification. As the loss function of the network, we use the complete intersection over union (CIoU) [53] function as the location loss and the binary cross-entropy function as the confidence and classification loss, and minimize them with the Adam optimizer.

For the other hyper parameters: when part of the layers are frozen for training and when all layers are unfrozen, the batch size is set to 16 and 4, the number of training epochs is 50 and 100, and the learning rate is set to 1 × 10⁻³ and 1 × 10⁻⁴, respectively. Besides, the cosine annealing scheduler [56] is used to adjust the learning rate, and when the loss value no longer drops, the training is terminated early.
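A sketch of this two-stage schedule in Keras is given below; the model, data and loss function are passed in as placeholders (the real loss combines CIoU and binary cross-entropy as described above), and the number of frozen layers is an assumption for illustration.

```python
import math
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler

def cosine_lr(initial_lr, total_epochs):
    # cosine annealing of the learning rate over the training epochs
    return LearningRateScheduler(
        lambda epoch: 0.5 * initial_lr * (1 + math.cos(math.pi * epoch / total_epochs)))

def train_two_stage(model, x_train, y_train, x_val, y_val, loss_fn):
    stop_early = EarlyStopping(monitor="val_loss", patience=10)

    # Stage 1: part of the layers frozen, batch size 16, 50 epochs, lr 1e-3
    for layer in model.layers[:-30]:          # how many layers stay trainable is illustrative
        layer.trainable = False
    model.compile(optimizer=Adam(learning_rate=1e-3), loss=loss_fn)
    model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=50,
              batch_size=16, callbacks=[cosine_lr(1e-3, 50), stop_early])

    # Stage 2: all layers trainable, batch size 4, 100 epochs, lr 1e-4
    for layer in model.layers:
        layer.trainable = True
    model.compile(optimizer=Adam(learning_rate=1e-4), loss=loss_fn)
    model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=100,
              batch_size=4, callbacks=[cosine_lr(1e-4, 100), stop_early])
```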

4.2. Evaluation metrics

In order to quantitatively compare the performance of various object detection methods, different metrics are used to evaluate the detection results: Recall (Rec), Precision (Pre), F1 Score (F1), and Average Precision (AP). Rec measures how many of the objects present in the annotation information are correctly detected; however, the detection result cannot be judged from the perspective of Rec alone. Pre measures how many of the objects detected by the model exist in the annotation information. F1 is the harmonic mean of Pre and Rec and is used to measure overall model performance. AP is a metric widely used to evaluate object detection models; it can be obtained by calculating the area under the Pre–Rec curve and thus evaluates a model from both aspects, Pre and Rec. The definitions of these evaluation metrics are provided in Table 1.

Table 1. The definitions of the evaluation metrics, where TP, TN, FP and FN represent True Positive, True Negative, False Positive and False Negative, respectively; N denotes the number of detected objects.

Metric              Definition
Average Precision   Σ_{i=1..N} {Precision(i) × Recall(i)} / Number of Annotations
Recall              TP / (TP + FN)
Precision           TP / (TP + FP)
F1 Score            2 × (Precision × Recall) / (Precision + Recall)

The metrics in Table 1 are calculated based on True Positive, True Negative, False Positive, and False Negative. The intersection over union (IoU) is one of the criteria for evaluating whether a detected object is positive or negative. The calculation method of IoU is shown in Fig. 8 and Eq. (1).

IoU = (A ∩ B) / (A ∪ B) = (Gx − kx)(Gy − ky) / [Gx·Gy + Px·Py − (Gx − kx)(Gy − ky)]    (1)

Fig. 8. The IoU calculation method.

From Fig. 8 and Eq. (1), it can be seen that the smaller the values of Gx, Gy, Px, and Py, the more sensitive the value of IoU is to changes in kx and ky. This phenomenon further illustrates that it is very difficult to detect tiny objects and that it is unfair to use IoU alone to evaluate tiny object detection. Therefore, without affecting the sperm positioning, we propose a more suitable evaluation criterion: a detected object is a positive sample when it meets two conditions at the same time. The first is that the detected object category is correct; the second is that the IoU of the detection box and the ground truth box exceeds B1, or the IoU of the detection box and the ground truth box exceeds B2 and the distance between the center points of the two boxes does not exceed R pixels.
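This criterion can be written down directly; the sketch below uses corner-format boxes and the thresholds B1 = 0.5, B2 = 0.45 and R = 3 that are adopted in the experiments in Section 4.3.

```python
import math

def iou(box_a, box_b):
    """Boxes are (x1, y1, x2, y2). Returns intersection over union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def is_true_positive(det_box, det_cls, gt_box, gt_cls, b1=0.5, b2=0.45, r=3.0):
    """Positive sample: correct class and (IoU > B1, or IoU > B2 with centres within R pixels)."""
    if det_cls != gt_cls:
        return False
    overlap = iou(det_box, gt_box)
    cx_d, cy_d = (det_box[0] + det_box[2]) / 2, (det_box[1] + det_box[3]) / 2
    cx_g, cy_g = (gt_box[0] + gt_box[2]) / 2, (gt_box[1] + gt_box[3]) / 2
    centre_dist = math.hypot(cx_d - cx_g, cy_d - cy_g)
    return overlap > b1 or (overlap > b2 and centre_dist <= r)
```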

4.3. Evaluation of sperm detection methods

In order to prove the effectiveness of the proposed TOD-CNN method for sperm detection in sperm microscopic videos, we compare its detection results with other state-of-the-art methods, such as YOLO-v3 [24], YOLO-v4 [25], SSD [26], RetinaNet [27], and Faster R–CNN [20]. In the experiments, each metric is calculated under the condition of B1 = 0.5, B2 = 0.45, and R = 3. B1 and B2 represent IoU values, and R represents the pixel distance between the center of the predicted box and the center of the ground-truth box. Among them, whether the IoU exceeds B1 = 0.5 is the most common standard for distinguishing positive and negative samples in the field of object detection [57]. In addition, extensive experimental verification shows that the sperm object center coordinates obtained when B2 ≥ 0.45 and R ≤ 3 have little effect on the sperm tracking task. Therefore, this paper adopts this standard to evaluate the experimental results.

4.3.1. Comparison with other methods

In this part, we compare TOD-CNN with some state-of-the-art methods in terms of memory cost, training time, FPS, and detection performance.

4.3.1.1. Evaluation of memory, time costs and FPS. To compare the memory costs, training time and FPS among TOD-CNN, YOLO-v4, YOLO-v3, SSD, RetinaNet and Faster R–CNN, we provide the details in Table 2.

Table 2. The memory costs, training time and FPS of TOD-CNN, YOLO-v4, YOLO-v3, SSD, RetinaNet and Faster R–CNN.

Model          Memory Cost   Training Time   FPS
TOD-CNN        164 MB        119 min         35.7
YOLO-v4        244 MB        135 min         28.4
YOLO-v3        235 MB        374 min         37.0
SSD            91.2 MB       280 min         31.5
RetinaNet      139 MB        503 min         21.0
Faster R–CNN   108 MB        2753 min        7.8

From Table 2, we can find that the memory cost of TOD-CNN is 164 MB, the training time of TOD-CNN is around 119 min for 60 sperm microscopy videos, and the FPS is 35.7. The memory cost and FPS of TOD-CNN are not optimal, but it balances both model size and real-time performance. Compared with YOLO-v3 and YOLO-v4, TOD-CNN has the smallest memory cost. Compared with RetinaNet and Faster R–CNN, TOD-CNN has a faster detection speed. Compared with SSD, TOD-CNN does not have a better memory cost or real-time performance, but the sperm detection ability of TOD-CNN is much better than that of SSD, which will be explained in detail in the next paragraph.


4.3.1.2. Evaluation of sperm detection performance. TOD-CNN is compared with existing object detection models using our data set. In Table 3, we list the comparison with the best performance results of various models (YOLO-v4, YOLO-v3, SSD, RetinaNet, and Faster R–CNN). For TOD-CNN, AP is nearly 20% higher, F1 is nearly 12% higher and Rec is nearly 22% higher than the best results of the other models. Our Pre is about 6% lower than that of the best performing model (RetinaNet), but our Rec is about 75% higher than that of RetinaNet, which shows that the number of objects detected by TOD-CNN far exceeds that of RetinaNet. Overall, TOD-CNN outperforms the existing models in sperm detection. Furthermore, a visual comparison of the models discussed above is shown in Fig. 9.

Table 3. A comparison of detection results between TOD-CNN and existing models. (In [%].)

Models        AP      F1      Pre     Rec
TOD-CNN       85.60   90.00   89.47   90.54
YOLO-v4       51.00   70.16   85.19   59.64
YOLO-v3       42.93   64.36   78.36   54.60
SSD           65.00   78.51   93.48   67.67
RetinaNet     15.05   27.00   95.62   15.72
Faster RCNN   35.76   55.28   46.57   67.99

From Fig. 9, we can see that the number of correct detections of TOD-CNN is lower than that of Faster RCNN only in "sperm-lack" scenes (oligospermia), while our Pre is much higher than that of Faster RCNN [58]. In "sperm-normal" scenes (healthy), the correct detections of TOD-CNN are the best and our Pre and Rec are higher. By observing Fig. 9, it is easy to understand why TOD-CNN has a slightly lower Pre than SSD and RetinaNet, while the other metrics are better than those of the other models (the number of correct detections of TOD-CNN far exceeds that of the other models).

Fig. 9. Comparison of TOD-CNN with YOLO-v4 [25], YOLO-v3 [24], SSD [26], RetinaNet [27], and Faster RCNN [58]. In these images, the blue boxes represent the corresponding ground truth, the green boxes correspond to the correctly detected objects, and the red boxes correspond to the incorrectly detected objects. The values represent the number of correctly detected objects / the number of incorrectly detected objects / the number of objects in the annotation information that are not detected / the total number of objects in the annotation information.


Furthermore, to test the robustness of TOD-CNN against impurities in the microscopic videos, we add 4479 impurities into the experiments. The experimental results are shown in Table 4, where TOD-CNN shows the best robustness against the effect of impurities compared to the other models.

Table 4. A comparison between TOD-CNN and existing models in the scene with impurities, where AP_S, AP_I and mAP represent the AP of sperms, the AP of impurities and the mean AP, respectively. (In [%].)

Models        AP_S    AP_I    mAP     F1      Pre     Rec
TOD-CNN       85.60   57.33   71.47   88.57   88.41   88.74
YOLO-v4       51.00   30.00   40.50   69.61   84.76   59.06
YOLO-v3       42.93   35.90   39.42   63.80   78.34   53.81
Faster RCNN   35.76   25.80   30.78   54.52   46.06   66.78
SSD           65.00   18.95   41.98   76.59   92.23   65.44
RetinaNet     15.05   33.84   24.44   28.51   95.36   16.76

4.3.2. Cross-validation experiment

To verify the reliability, stability and repeatability of TOD-CNN, we perform five-fold cross-validation. The experimental results are shown in Table 5, where the mean values (μ) of the evaluation metrics are all higher than 89% except for AP, which is higher than 86%. It can be seen that TOD-CNN has good performance and repeatability. The standard deviation (STDEV) of F1 is 1.02%, the STDEVs of two of the four evaluation metrics are below 1.40%, and only the STDEV of Pre is slightly higher (2.05%), showing that TOD-CNN is relatively stable and reliable.

Table 5. The detection results, μ and STDEV of the five-fold cross-validation experiments. (In [%].)

Fold    AP      F1      Pre     Rec
1       85.60   90.00   89.47   90.54
2       84.90   90.11   92.76   87.61
3       88.80   92.78   95.19   90.48
4       86.37   91.33   94.12   88.66
5       87.29   90.65   91.15   90.16
μ       86.59   90.97   92.54   89.49
STDEV   1.36    1.02    2.05    1.16
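The summary rows of Table 5 can be reproduced directly from the five fold results; for example, for AP (the reported STDEV corresponds to the population form of the standard deviation):

```python
import statistics

ap_folds = [85.60, 84.90, 88.80, 86.37, 87.29]
print(round(statistics.mean(ap_folds), 2))    # 86.59, the reported mean
print(round(statistics.pstdev(ap_folds), 2))  # 1.36, the reported STDEV
```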
4.3.3. Sperm tracking

The ultimate goal of sperm detection is to find sperm trajectories and calculate the relevant parameters for clinical diagnosis. TOD-CNN and the two models with better detection results in Table 3 (YOLO-v4 and SSD) are compared for sperm tracking. Based on the detection results of each model, we use the kNN algorithm to match sperms in adjacent video frames against the actual trajectories marked in Subset-B. The visualization results are shown in Fig. 10. We can observe that our tracking trajectories are very close to the actual ones, and trajectory discontinuities or incorrect tracking rarely occur, owing to the strong detection capability of TOD-CNN.

In addition, we calculate three important motility parameters of sperms on Subset-B, the Straight Line Velocity (VSL), Curvilinear Velocity (VCL) and Average Path Velocity (VAP) [59], for the actual trajectories and for the trajectories obtained with TOD-CNN, SSD and Yolo-v4, respectively. Compared with the actual trajectories, the error rates of VSL, VCL and VAP calculated with TOD-CNN (10.15%, 5.09% and 8.95%) are significantly lower than those of SSD (41.58%, 5.01% and 17.40%) and Yolo-v4 (12.73%, 36.12% and 19.65%). Based on VCL, VSL, and VAP, an experienced threshold value from a clinical doctor is set to determine whether a sperm is motile, in order to calculate the corresponding progressive motility (PR). The errors between the PR obtained from the TOD-CNN tracking results and the doctors' diagnosis results are all within 9%. The experimental results show that TOD-CNN can assist doctors in clinical work.
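A sketch of this matching-and-measurement step is shown below; the nearest-neighbour matching radius, the frame rate and the moving-average window used to approximate the average path are assumptions for illustration, not values taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_adjacent_frames(centres_t, centres_t1, max_dist=20.0):
    """Match each detected centre in frame t to its nearest neighbour in frame t+1."""
    tree = cKDTree(np.asarray(centres_t1))
    dists, idx = tree.query(np.asarray(centres_t), k=1)
    return [(i, j) for i, (d, j) in enumerate(zip(dists, idx)) if d <= max_dist]

def motility_parameters(trajectory, frame_rate=25.0, window=5):
    """trajectory: (N, 2) array of centre positions over N frames, in micrometres."""
    traj = np.asarray(trajectory, dtype=float)
    duration = (len(traj) - 1) / frame_rate
    vsl = np.linalg.norm(traj[-1] - traj[0]) / duration                    # straight-line velocity
    vcl = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum() / duration   # curvilinear velocity
    # approximate the average path with a moving average of the coordinates
    avg_path = np.column_stack([np.convolve(traj[:, k], np.ones(window) / window, mode="valid")
                                for k in range(2)])
    vap = np.linalg.norm(np.diff(avg_path, axis=0), axis=1).sum() / duration
    return vsl, vcl, vap
```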
Fig. 10. An example of sperm tracking results. Green lines represent the actual trajectories based on the ground truth; red, blue and orange lines denote the tracking trajectories of TOD-CNN, SSD and Yolo-v4, respectively.

4.3.4. A Python-based graphical user interface

To conveniently use TOD-CNN to detect tiny objects in microscopic videos and images, we design a Python-based GUI (Fig. 11) that helps users control the Intersection over Union (IoU) threshold and the confidence threshold according to their own needs to achieve the desired test performance. Besides, users can load a Model Path to use their own settings/weights for tiny object detection. The GUI is compatible with videos (such as ".mp4" and ".avi") and images (such as ".png" and ".jpg") in various formats.

Fig. 11. GUI of TOD-CNN for detecting sperms in microscopic videos or images.

5. Conclusion and discussion

We develop and present a public, massive and high-quality data set for sperm detection, tracking and classification, and this data set is now published and available online. We also provide a one-stage CNN model (TOD-CNN), trained on Subset-A, for tiny object detection in real time, which can accurately detect sperms in videos and images. However, TOD-CNN fails in some cases and cannot detect sperms completely or accurately.

Examples of incorrect detection results are shown in Fig. 12.

In Fig. 12(a), we can see that the detection boxes surround the sperms correctly. However, due to the small size of the ground truth boxes, a minor position offset (one or two pixels) causes the IoU between the detection and ground truth boxes to be lower than 0.5. In Fig. 12(b), due to the movement of the sperms, the thickness of the semen wet film and noticeable interference fringes in the sperm videos, valuable information may be lost, leading to detection errors. Also, because some impurities look very similar to sperms, TOD-CNN incorrectly detects these impurities as sperms. In Fig. 12(c), for sperms appearing at the image edges, it is difficult to obtain their complete information, and sometimes these sperms are missed in detection. To ensure the reliability of the annotation information, when we marked sperms in the videos, we only chose sperms without controversy. In Fig. 12(d), the detected sperms may be located deep in the semen wet film. Because of their unclear imaging, it is difficult to distinguish whether they are sperms or impurities, and they are not annotated in our data set.

Fig. 12. Visualized results of some typical detection failures of TOD-CNN. The red and green boxes represent the detection results, the blue boxes represent the ground truth, S represents sperms and Impurity represents impurities.

In future work, we will continue to integrate related optimization algorithms to improve the performance of TOD-CNN, such as monarch butterfly optimization [60], the earthworm optimization algorithm [61], elephant herding optimization [62,63], the moth search algorithm [64], the slime mould algorithm [65], hunger games search [66], the Runge Kutta optimizer [67], the colony predation algorithm [68,69], and Harris hawks optimization [70].

Declaration of competing interest

The authors declare that they have no conflict of interest.

Acknowledgements

This work is supported by the "National Natural Science Foundation of China" (No. 61806047). We thank Miss Zixian Li and Mr. Guoxian Li for their important discussion.


References

[1] S. Gadadhar, G. Alvarez Viar, J.N. Hansen, A. Gong, A. Kostarev, C. Ialy-Radio, S. Leboucher, M. Whitfield, A. Ziyyat, A. Touré, Tubulin glycylation controls axonemal dynein activity, flagellar beat, and male fertility, Science 371 (6525) (2021) eabd4914, https://doi.org/10.1126/science.abd4914.
[2] X. Li, C. Li, M.M. Rahaman, H. Sun, X. Li, J. Wu, Y. Yao, M. Grzegorzek, A comprehensive review of computer-aided whole-slide image analysis: from datasets to feature extraction, segmentation, classification and detection approaches, Artif. Intell. Rev. (2022) 1–70, https://doi.org/10.1007/s10462-021-10121-0.
[3] Y. Li, X. Wu, C. Li, X. Li, H. Chen, C. Sun, M.M. Rahaman, Y. Yao, Y. Zhang, T. Jiang, A hierarchical conditional random field-based attention mechanism approach for gastric histopathology image classification, Appl. Intell. (2022) 1–22, https://doi.org/10.1007/s10489-021-02886-2.
[4] Y. Li, C. Li, X. Li, K. Wang, M.M. Rahaman, C. Sun, H. Chen, X. Wu, H. Zhang, Q. Wang, A comprehensive review of markov random field and conditional random field approaches in pathology image analysis, Arch. Comput. Methods Eng. 29 (1) (2022) 609–639, https://doi.org/10.1007/s11831-021-09591-w.
[5] C. Li, H. Chen, X. Li, N. Xu, Z. Hu, D. Xue, S. Qi, H. Ma, L. Zhang, H. Sun, A review for cervical histopathology image analysis using machine vision approaches, Artif. Intell. Rev. 53 (7) (2020) 4821–4862, https://doi.org/10.1007/s10462-020-09808-7.
[6] M.M. Rahaman, C. Li, Y. Yao, F. Kulwa, X. Wu, X. Li, Q. Wang, Deepcervix: a deep learning-based framework for the classification of cervical cells using hybrid deep feature fusion techniques, Comput. Biol. Med. 136 (2021) 104649, https://doi.org/10.1016/j.compbiomed.2021.104649.
[7] M.M. Rahaman, C. Li, X. Wu, Y. Yao, Z. Hu, T. Jiang, X. Li, S. Qi, A survey for cervical cytopathology image analysis using deep learning, IEEE Access 8 (2020) 61687–61710, https://doi.org/10.1109/ACCESS.2020.2983186.
[8] M.M. Rahaman, C. Li, Y. Yao, F. Kulwa, M.A. Rahman, Q. Wang, S. Qi, F. Kong, X. Zhu, X. Zhao, Identification of covid-19 samples from chest x-ray images using deep learning: a comparison of transfer learning approaches, J. X Ray Sci. Technol. 28 (5) (2020) 821–839, https://doi.org/10.3233/XST-200715.
[9] C. Li, J. Zhang, F. Kulwa, S. Qi, Z. Qi, A sars-cov-2 microscopic image dataset with ground truth images and visual features, in: Pattern Recognition and Computer Vision (PRCV), 2020, pp. 244–255, https://doi.org/10.1007/978-3-030-60633-6_20.
[10] J. Zhang, C. Li, M.M. Rahaman, Y. Yao, P. Ma, J. Zhang, X. Zhao, T. Jiang, M. Grzegorzek, A comprehensive review of image analysis methods for microorganism counting: from classical image processing to deep learning approaches, Artif. Intell. Rev. (2021) 1–70, https://doi.org/10.1007/s10462-021-10082-4.
[11] W. Zhao, P. Ma, C. Li, X. Bu, S. Zou, T. Jang, M. Grzegorzek, A survey of semen quality evaluation in microscopic videos using computer assisted sperm analysis, arXiv preprint arXiv:2202.07820, https://doi.org/10.48550/arXiv.2202.07820, 2022.
[12] W. Zhao, S. Zou, C. Li, J. Li, J. Zhang, P. Ma, Y. Gu, P. Xu, X. Bu, A survey of sperm detection techniques in microscopic videos, in: The Fourth International Symposium on Image Computing and Digital Medicine, 2020, pp. 219–224, https://doi.org/10.1145/3451421.3451467.
[13] M. Elsayed, T.M. El-Sherry, M. Abdelgawad, Development of computer-assisted sperm analysis plugin for analyzing sperm motion in microfluidic environments using image-j, Theriogenology 84 (8) (2015) 1367–1377, https://doi.org/10.1016/j.theriogenology.2015.07.021.
[14] L.F. Urbano, P. Masson, M. VerMilyea, M. Kam, Automatic tracking and motility analysis of human sperm in time-lapse images, IEEE Trans. Med. Imag. 36 (3) (2017) 792–801, https://doi.org/10.1109/TMI.2016.2630720.
[15] X. Li, C. Li, F. Kulwa, M.M. Rahaman, W. Zhao, X. Wang, D. Xue, Y. Yao, Y. Cheng, J. Li, S. Qi, T. Jiang, Foldover features for dynamic object behaviour description in microscopic videos, IEEE Access 8 (2020) 114519–114540, https://doi.org/10.1109/ACCESS.2020.3003993.
[16] H. Yang, X. Descombes, S. Prigent, G. Malandain, X. Druart, F. Plouraboué, Head tracking and flagellum tracing for sperm motility analysis, in: 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), 2014, pp. 310–313, https://doi.org/10.1109/ISBI.2014.6867871.
[17] Z. Zou, Z. Shi, Y. Guo, J. Ye, Object detection in 20 years: a survey, arXiv preprint arXiv:1905.05055, https://doi.org/10.48550/arXiv.1905.05055, 2019.
[18] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587, https://doi.org/10.1109/CVPR.2014.81.
[19] R. Girshick, Fast r-cnn, in: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448, https://doi.org/10.1109/ICCV.2015.169.
[20] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2017) 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031.
[21] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask r-cnn, in: 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988, https://doi.org/10.1109/ICCV.2017.322.
[22] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788, https://doi.org/10.1109/CVPR.2016.91.
[23] J. Redmon, A. Farhadi, Yolo9000: better, faster, stronger, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525, https://doi.org/10.1109/CVPR.2017.690.
[24] J. Redmon, A. Farhadi, Yolov3: an incremental improvement, arXiv preprint arXiv:1804.02767, https://doi.org/10.48550/arXiv.1804.02767, 2018.
[25] A. Bochkovskiy, C.Y. Wang, H.Y.M. Liao, Yolov4: optimal speed and accuracy of object detection, arXiv preprint arXiv:2004.10934, https://doi.org/10.48550/arXiv.2004.10934, 2020.
[26] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.Y. Fu, A.C. Berg, Ssd: single shot multibox detector, in: Computer Vision – ECCV 2016, 2016, pp. 21–37, https://doi.org/10.1007/978-3-319-46448-0_2.
[27] T.Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2999–3007, https://doi.org/10.1109/ICCV.2017.324.
[28] B. Gu, R. Ge, Y. Chen, L. Luo, G. Coatrieux, Automatic and robust object detection in x-ray baggage inspection using deep convolutional neural networks, IEEE Trans. Ind. Electron. 68 (10) (2021) 10248–10257, https://doi.org/10.1109/TIE.2020.3026285.
[29] L. Wang, M. Shen, C. Shi, Y. Zhou, Y. Chen, J. Pu, H. Chen, Ee-net: an edge-enhanced deep learning network for jointly identifying corneal micro-layers from optical coherence tomography, Biomed. Signal Process Control 71 (2022) 103213, https://doi.org/10.1016/j.bspc.2021.103213.
[30] W. Yang, H. Zhang, J. Yang, J. Wu, X. Yin, Y. Chen, H. Shu, L. Luo, G. Coatrieux, Z. Gui, Q. Feng, Improving low-dose ct image using residual convolutional network, IEEE Access 5 (2017) 24698–24705, https://doi.org/10.1109/ACCESS.2017.2766438.
[31] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, S. Yan, Perceptual generative adversarial networks for small object detection, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1951–1959, https://doi.org/10.1109/CVPR.2017.211.
[32] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybernet. 9 (1) (1979) 62–66, https://doi.org/10.1109/TSMC.1979.4310076.
[33] X. Zhou, Y. Lu, Efficient mean shift particle filter for sperm cells tracking, in: 2009 International Conference on Computational Intelligence and Security, 2009, pp. 335–339, https://doi.org/10.1109/CIS.2009.264.
[34] E. Soubiès, P. Weiss, X. Descombes, A 3d segmentation algorithm for ellipsoidal shapes. application to nuclei extraction, in: ICPRAM - International Conference on Pattern Recognition Applications and Methods, 2013, pp. 97–105, https://hal.archives-ouvertes.fr/hal-00733187.
[35] M.R. Ravanfar, M.H. Moradi, Low contrast sperm detection and tracking by watershed algorithm and particle filter, in: 2011 18th Iranian Conference of Biomedical Engineering (ICBME), 2011, pp. 260–263, https://doi.org/10.1109/ICBME.2011.6168568.
[36] A. Nurhadiyatna, A.L. Latifah, D. Fryantoni, T. Wirahman, R. Wijayanti, F.H. Muttaqien, Comparison and implementation of motion detection methods for sperm detection and tracking, in: 2014 International Symposium on Micro-NanoMechatronics and Human Science (MHS), 2014, pp. 1–5, https://doi.org/10.1109/MHS.2014.7006125.
[37] M. Berezansky, H. Greenspan, D. Cohen-Or, O. Eitan, Segmentation and tracking of human sperm cells using spatio-temporal representation and clustering, in: Medical Imaging 2007: Image Processing, 2007, pp. 891–902, https://doi.org/10.1117/12.708887.
[38] L.Z. Shi, J. Nascimento, C. Chandsawangbhuwana, M.W. Berns, E.L. Botvinick, Real-time automated tracking and trapping system for sperm, Microsc. Res. Tech. 69 (11) (2006) 894–902, https://doi.org/10.1002/jemt.20359.
[39] G.-G. Wang, M. Lu, Y.-Q. Dong, X.-J. Zhao, Self-adaptive extreme learning machine, Neural Comput. Appl. 27 (2) (2016) 291–303, https://doi.org/10.1007/s00521-015-1874-3.
[40] J.-H. Yi, J. Wang, G.-G. Wang, Improved probabilistic neural networks with self-adaptive strategies for transformer fault diagnosis problem, Adv. Mech. Eng. 8 (1) (2016) 1687814015624832, https://doi.org/10.1177/1687814015624832.
[41] S. Kosov, K. Shirahama, C. Li, M. Grzegorzek, Environmental microorganism classification using conditional random fields and deep convolutional neural networks, Pattern Recogn. 77 (2018) 248–261, https://doi.org/10.1016/j.patcog.2017.12.021.
[42] C. Li, K. Shirahama, M. Grzegorzek, Application of content-based image analysis to environmental microorganism classification, Biocybern. Biomed. Eng. 35 (1) (2015) 10–21, https://doi.org/10.1016/j.bbe.2014.07.003.
[43] J. Zhang, C. Li, S. Kosov, M. Grzegorzek, K. Shirahama, T. Jiang, C. Sun, Z. Li, H. Li, Lcu-net: a novel low-cost u-net for environmental microorganism image segmentation, Pattern Recogn. 115 (2021) 107885, https://doi.org/10.1016/j.patcog.2021.107885.
[44] F. Kulwa, C. Li, X. Zhao, B. Cai, N. Xu, S. Qi, S. Chen, Y. Teng, A state-of-the-art survey for microorganism image segmentation methods and future potential, IEEE Access 7 (2019) 100243–100269, https://doi.org/10.1109/ACCESS.2019.2930111.
[45] C. Sun, C. Li, J. Zhang, M.M. Rahaman, S. Ai, H. Chen, F. Kulwa, Y. Li, X. Li, T. Jiang, Gastric histopathology image segmentation using a hierarchical conditional random field, Biocybern. Biomed. Eng. 40 (4) (2020) 1535–1555, https://doi.org/10.1016/j.bbe.2020.09.008.


[46] Z. Cui, F. Xue, X. Cai, Y. Cao, G.-g. Wang, J. Chen, Detection of malicious code variants based on deep learning, IEEE Trans. Ind. Inf. 14 (7) (2018) 3187–3196, https://doi.org/10.1109/TII.2018.2822680.
[47] M. Shen, C. Li, W. Huang, P. Szyszka, K. Shirahama, M. Grzegorzek, D. Merhof, O. Deussen, Interactive tracking of insect posture, Pattern Recogn. 48 (11) (2015) 3560–3571, https://doi.org/10.1016/j.patcog.2015.05.011.
[48] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944, https://doi.org/10.1109/CVPR.2017.106.
[49] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. 37 (9) (2015) 1904–1916, https://doi.org/10.1109/TPAMI.2015.2389824.
[50] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778, https://doi.org/10.1109/CVPR.2016.90.
[51] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, https://doi.org/10.48550/arXiv.1409.1556, 2015.
[52] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826, https://doi.org/10.1109/CVPR.2016.308.
[53] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, D. Ren, Distance-iou loss: faster and better learning for bounding box regression, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12993–13000, https://doi.org/10.1609/aaai.v34i07.6999.
[54] A. Chen, C. Li, S. Zou, M.M. Rahaman, Y. Yao, H. Chen, H. Yang, P. Zhao, W. Hu, W. Liu, G. Marcin, Svia dataset: a new dataset of microscopic videos and images for computer-aided sperm analysis, Biocybern. Biomed. Eng. 42 (1) (2022) 204–214, https://doi.org/10.1016/j.bbe.2021.12.010.
[55] Y. Hu, J. Lu, Y. Shao, Y. Huang, N. Lü, Comparison of the semen analysis results obtained from two branded computer-aided sperm analysis systems, Andrologia 45 (5) (2013) 315–318, https://doi.org/10.1111/and.12010.
[56] I. Loshchilov, F. Hutter, Sgdr: stochastic gradient descent with warm restarts, arXiv preprint arXiv:1608.03983, https://doi.org/10.48550/arXiv.1608.03983, 2017.
[57] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, M. Pietikäinen, Deep learning for generic object detection: a survey, Int. J. Comput. Vis. 128 (2) (2020) 261–318, https://doi.org/10.1007/s11263-019-01247-4.
[58] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2017) 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031.
[59] M. O'connell, N. Mcclure, S. Lewis, The effects of cryopreservation on sperm morphology, motility and mitochondrial function, Hum. Reprod. 17 (3) (2002) 704–709, https://doi.org/10.1093/humrep/17.3.704.
[60] G.-G. Wang, S. Deb, Z. Cui, Monarch butterfly optimization, Neural Comput. Appl. 31 (7) (2019) 1995–2014, https://doi.org/10.1007/s00521-015-1923-y.
[61] G.-G. Wang, S. Deb, L.D.S. Coelho, Earthworm optimisation algorithm: a bio-inspired metaheuristic algorithm for global optimisation problems, Int. J. Bio-Inspired Comput. 12 (1) (2018) 1–22, https://doi.org/10.1504/IJBIC.2018.093328.
[62] G.-G. Wang, S. Deb, L.d.S. Coelho, Elephant herding optimization, in: 2015 3rd International Symposium on Computational and Business Intelligence (ISCBI), 2015, pp. 1–5, https://doi.org/10.1109/ISCBI.2015.8.
[63] J. Li, H. Lei, A.H. Alavi, G.-G. Wang, Elephant herding optimization: variants, hybrids, and applications, Mathematics 8 (9) (2020) 1415, https://doi.org/10.3390/math8091415.
[64] G.-G. Wang, Moth search algorithm: a bio-inspired metaheuristic algorithm for global optimization problems, Memetic Comput. 10 (2) (2018) 151–164, https://doi.org/10.1007/s12293-016-0212-3.
[65] S. Li, H. Chen, M. Wang, A.A. Heidari, S. Mirjalili, Slime mould algorithm: a new method for stochastic optimization, Future Generat. Comput. Syst. 111 (2020) 300–323, https://doi.org/10.1016/j.future.2020.03.055.
[66] Y. Yang, H. Chen, A.A. Heidari, A.H. Gandomi, Hunger games search: visions, conception, implementation, deep analysis, perspectives, and towards performance shifts, Expert Syst. Appl. 177 (2021) 114864, https://doi.org/10.1016/j.eswa.2021.114864.
[67] W. Li, G.-G. Wang, A.H. Gandomi, A survey of learning-based intelligent optimization algorithms, Arch. Comput. Methods Eng. 28 (5) (2021) 3781–3799, https://doi.org/10.1007/s11831-021-09562-1.
[68] J. Tu, H. Chen, M. Wang, A.H. Gandomi, The colony predation algorithm, JBE 18 (3) (2021) 674–710, https://doi.org/10.1007/s42235-021-0050-y.
[69] M. Li, G.-G. Wang, A review of green shop scheduling problem, Inf. Sci. 589 (2022) 478–496, https://doi.org/10.1016/j.ins.2021.12.122.
[70] A.A. Heidari, S. Mirjalili, H. Faris, I. Aljarah, M. Mafarja, H. Chen, Harris hawks optimization: algorithm and applications, Future Generat. Comput. Syst. 97 (2019) 849–872, https://doi.org/10.1016/j.future.2019.02.028.
