TOD-CNN: An Effective Convolutional Neural Network for Tiny Object Detection in Sperm Videos
A R T I C L E  I N F O

Keywords: Image analysis; Object detection; Convolutional neural network; Sperm microscopy video

A B S T R A C T

The detection of tiny objects in microscopic videos remains a difficult problem, especially in large-scale experiments. For tiny objects (such as sperms) in microscopic videos, current detection methods struggle with blurred and irregularly shaped objects and with precise positioning. To address this, we present a convolutional neural network for tiny object detection (TOD-CNN) with an underlying data set of high-quality sperm microscopic videos (111 videos, more than 278,000 annotated objects), and we design a graphical user interface (GUI) to employ and test the proposed model conveniently. TOD-CNN is highly accurate, achieving 85.60% AP50 in the task of real-time sperm detection in microscopic videos. To demonstrate the importance of sperm detection technology in sperm quality analysis, we compute relevant sperm quality evaluation metrics and compare them with the diagnosis results of medical doctors.
* Corresponding author.
E-mail address: [email protected] (C. Li).
https://doi.org/10.1016/j.compbiomed.2022.105543
Received 1 March 2022; Received in revised form 30 March 2022; Accepted 17 April 2022
Available online 22 April 2022
0010-4825/© 2022 Elsevier Ltd. All rights reserved.
Fig. 1. The workflow of the proposed sperm object detection method using TOD-CNN. (a) … and category information of sperms and impurities. (b) Data Preprocessing: the sperm microscopic video is divided into frames to obtain sperm microscopic images one by one, and the object information is annotated using the LabelImg software. (c) Training Process: the TOD-CNN model is trained and the best model is saved to perform sperm object detection. (d) Test Data: the test data contains 21 sperm microscopic videos.
The main contributions of this paper are as follows:

● Build an easy-to-operate CNN for sperm detection, namely TOD-CNN (Convolutional Neural Network for Tiny Object Detection).
● TOD-CNN achieves excellent detection results and real-time detection ability in the task of tiny object detection in sperm microscopic videos, reaching 85.60% AP50 at 35.7 frames per second (FPS).
The structure of this paper is as follows: Section 2 introduces the existing sperm object detection methods based on traditional methods, machine learning methods, and deep learning methods. Section 3 illustrates the detailed design of TOD-CNN. Section 4 introduces the data set used in the experiments, the experimental settings, the evaluation methods, and the results. Section 5 concludes the paper.

2. Related work

2.1. Existing sperm object detection methods

2.1.1. Traditional methods
… through modelling. Filtering methods: Ravanfar et al. [35] first select several suitable structural elements, and then a Top-hat based operation is used to filter the image sequence in order to separate sperms from other debris. Nurhadiyatna et al. [36] use the Gaussian Mixture Model (GMM), enhanced by the Hole Filling Algorithm, as the probability density function to predict the probability of each pixel in the image belonging to the foreground or the background; the calculation cost of this method is significantly less than that of other methods.
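To make the GMM idea concrete, here is a minimal background-subtraction sketch in the spirit of the motion-based detection of [36], using OpenCV 4's MOG2 implementation; the file name and parameter values are illustrative assumptions, and the exact pipeline of [36] differs (for instance, its Hole Filling enhancement is not reproduced here):

```python
import cv2

# GMM background subtraction: each pixel is modelled by a mixture of
# Gaussians and classified per frame as foreground (moving) or background.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16)

cap = cv2.VideoCapture("sperm_video.avi")  # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)       # 255 where motion is detected
    # Moving sperms appear as small foreground blobs; extract candidates.
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    candidate_boxes = [cv2.boundingRect(c) for c in contours]
cap.release()
```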
2.1.2. Machine learning methods
Unsupervised learning is the most frequently used machine learning approach. Berezansky et al. [37] detect sperms by spatio-temporal segmentation, integrating k-means, GMM, mean shift, and other segmentation methods. Shi et al. [38] use an optical capture method for sperm detection.

2.2. Deep learning based object detection methods

Deep learning methods are widely used in many artificial intelligence fields, for example classification [39–42], segmentation [43–45] and object detection [46,47]. Furthermore, some widely recognized general object detection models proposed in recent years are introduced below.
2.2.1. One-stage object detector
… generates the final prediction result by fusing the prediction results of 6 feature maps.

2.2.2. Two-stage object detector
The models of the R–CNN series are classical two-stage object detectors. The R–CNN [18] generates proposal regions through the selective search algorithm; the features of the proposal regions are then extracted using a CNN; finally, an SVM classifier is used to predict the objects in each region and identify their categories. The Fast R–CNN [19] no longer extracts features for each proposal region: the features of the entire image are extracted using a CNN, and each proposal region is then mapped to its corresponding features. Besides, the Fast R–CNN uses a multi-task loss function, which allows the detector and the bounding box regressor to be trained simultaneously. The Faster R–CNN [20] replaces the selective search algorithm with the Region Proposal Network, which helps the CNN to generate proposal regions and detect objects simultaneously.
3. TOD-CNN based sperm detection method in microscopic images

Sperm detection is always the first step in a CASA system, and it determines the reliability of the results of sperm microscopic video analysis. However, the existing algorithms cannot accurately detect sperms. Therefore, we follow the ideas of the YOLO [22–25], ResNet [50], Inception-v3 [52], and VGG16 [51] models and propose a novel one-stage deep learning based sperm object detection model (TOD-CNN). The workflow of the proposed TOD-CNN detection approach is shown in Fig. 1.

3.1. Basic knowledge

In this section, the methods related to our work are introduced, including the YOLO, ResNet, Inception-v3, and VGG16 models.
3.1.1. Basic knowledge of YOLO
YOLO series models solve the object detection task as a regression problem: they remove the proposal region generation step of two-stage object detectors and thereby accelerate the detection process. YOLO-v3 [24] is the most popular model in the YOLO series due to its excellent detection performance and speed.

The YOLO-v3 model mainly consists of four parts: preprocessing, backbone, neck, and head. The preprocessing: the k-means algorithm is used to cluster nine anchor boxes from the data set before training. The backbone: YOLO-v3 uses Darknet53 as its backbone network. Darknet53 has neither maxpooling layers nor fully connected layers; as a fully convolutional network, it changes the size of the tensor by changing the strides of the convolutional kernels. In addition, Darknet53 follows the idea of ResNet and adds residual modules to the network to alleviate the vanishing gradient problem of deep networks. The neck: the neck of the YOLO-v3 model draws on the idea of the feature pyramid network [48] to enrich the information of the feature maps. The head: the head of YOLO-v3 outputs 3 feature maps of different sizes and then detects large, medium, and small objects on these three feature maps.
3.1.2. Basic knowledge of ResNet
ResNet [50] is one of the most widely used feature extraction CNNs due to its practical and straightforward structure. As a CNN is deepened continuously, the model's performance cannot be improved indefinitely, and the accuracy may even decrease. ResNet proposes the Shortcut Connection structure to solve this problem. The Shortcut Connection structure includes an identity mapping operation and residual mapping operations. The identity mapping passes the current feature map backward through a cross-layer transfer (when the dimensions of the feature maps do not match, a 1 × 1 convolution operation is used to adjust the dimension). The residual mapping passes the current feature map to the next layer after convolution operations. In general, a Shortcut Connection structure contains one identity mapping operation and two or three residual mapping operations.
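As a concrete illustration, here is a minimal sketch of a Shortcut Connection block in Keras (the framework used in this paper); the filter counts and the two-convolution residual path are illustrative assumptions, not the exact layers of ResNet or TOD-CNN:

```python
from keras import backend as K
from keras.layers import Activation, Add, BatchNormalization, Conv2D

def shortcut_block(x, filters, strides=1):
    """One Shortcut Connection: identity mapping plus a residual mapping."""
    identity = x
    # Residual mapping: pass the feature map through two convolutions.
    y = Conv2D(filters, 3, strides=strides, padding="same")(x)
    y = BatchNormalization()(y)
    y = Activation("relu")(y)
    y = Conv2D(filters, 3, padding="same")(y)
    y = BatchNormalization()(y)
    # Identity mapping: cross-layer transfer; a 1 x 1 convolution adjusts
    # the dimension when input and output feature maps do not match.
    if strides != 1 or K.int_shape(x)[-1] != filters:
        identity = Conv2D(filters, 1, strides=strides, padding="same")(x)
    y = Add()([y, identity])  # element-wise sum of the two paths
    return Activation("relu")(y)
```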
3.1.3. Basic knowledge of Inception-v3 and VGG16
In Inception-v3 [52], to reduce the number of parameters while ensuring the performance of the model, an operation that replaces N × N convolution kernels with 1 × N and N × 1 convolution kernels is proposed. The receptive field of the stacked 1 × N and N × 1 convolution kernels is the same as that of an N × N convolution kernel, while the former has fewer parameters than the latter. In addition, the Inception-v3 model supports multi-scale input: convolution kernels of different sizes perform convolution operations on the input images, and the resulting feature maps are then concatenated to generate the final feature map.

The VGG16 [51] model includes 13 convolutional layers, 3 fully connected layers, and 5 maxpooling layers. The most prominent feature of the VGG16 model is its simple structure: all convolutional layers use the same convolution kernel parameters, and all pooling layers use the same pooling kernel parameters. Although VGG16 has a simple structure, it has strong feature extraction capabilities.
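To make the parameter saving concrete, here is a hedged Keras sketch of the factorization described above (N = 3; the channel counts and activations are illustrative assumptions):

```python
from keras.layers import Conv2D

def factorized_conv(x, filters, n=3):
    """Replace one n x n convolution with a 1 x n followed by an n x 1
    convolution. The stacked pair covers the same n x n receptive field,
    but for C input channels and C output filters it needs roughly
    2*n*C*C weights instead of n*n*C*C."""
    y = Conv2D(filters, (1, n), padding="same", activation="relu")(x)
    y = Conv2D(filters, (n, 1), padding="same", activation="relu")(y)
    return y
```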
3.2. The structure of TOD-CNN

The TOD-CNN model, which refers to the YOLO series models, regards the object detection task as a regression problem for fast and precise detection. The architecture of TOD-CNN is shown in Fig. 2, where the entire network is composed of four parts: data preprocessing, the backbone of the network, the neck of the network, and the head of the network.
The detailed implementation of each part is introduced below.

3.2.1. Data preprocessing
The object detection task in videos is essentially based on image processing. Therefore, it is necessary to split the sperm microscopic video into continuous frames (single images). However, due to the movement of the lens during the sperm microscopic video shooting process, there are some blurred frames in the sperm microscopic video. Analysing the grayscale histograms of the frames shows an obvious difference between the grayscale distributions of blurred frames and normal frames. Therefore, the blurred frames can be removed by deleting images whose Otsu threshold is less than a certain cutoff (determined from empirical experience). In addition, TOD-CNN is an anchor-based object detection model; therefore, the k-means algorithm is used to cluster a certain number (TOD-CNN uses six) of anchor boxes from the data set to train the model.
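A minimal sketch of these two preprocessing steps, assuming OpenCV and scikit-learn are available; the cutoff value and the plain Euclidean k-means over box widths and heights are illustrative assumptions (YOLO-style anchor clustering often uses an IoU-based distance instead):

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def is_blurred(frame_gray, cutoff=100):
    """Flag a frame as blurred when its Otsu threshold is below a cutoff.
    frame_gray: 8-bit grayscale image; cutoff: empirically chosen value."""
    otsu_threshold, _ = cv2.threshold(frame_gray, 0, 255,
                                      cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return otsu_threshold < cutoff

def cluster_anchors(box_whs, k=6):
    """Cluster annotated (width, height) pairs into k anchor boxes."""
    km = KMeans(n_clusters=k, random_state=0).fit(np.asarray(box_whs))
    return km.cluster_centers_  # k anchor (width, height) pairs
```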
3.2.2. The backbone of TOD-CNN
A straightforward backbone structure with cross-layer concatenate operations is designed, which is shown in Fig. 3. However, in a fully convolutional network, as the structures of CNNs continue to deepen, the semantic information of the feature map becomes more and more …
Fig. 6. An example from the sperm video data set. (a) The first row shows a frame of a sperm microscopic video, and the bottom row shows the corresponding annotations for object detection tasks; sperms are in green boxes and impurities are in red boxes. (b) The first row shows a frame of a sperm microscopic video, and the bottom row shows the corresponding ground truth for object tracking tasks. (c) The first row shows individual sperm images, and the bottom row shows individual impurity images for classification tasks.
Fig. 7. Challenging cases for the detection of sperms. The positions indicated by the red arrows show sperm imaging blur caused by sperm movement and an impurity similar to a sperm.
3.2.4. The head of TOD-CNN
In the head of TOD-CNN, 6 bounding boxes are predicted for each cell in the output feature map. For each bounding box, 7 values (tx, ty, tw, th, C, P0 and P1) are predicted, so the dimension of the predicted result in Fig. 5 is 42. For each cell, the offset from the upper left corner of the image is denoted by (Cx, Cy), and the width and height of the corresponding prior box are Pw and Ph. The calculation of the center coordinates (bx and by), width (bw), and height (bh) of the predicted box is shown in Fig. 5. Multi-label classification is applied to predict the categories in each bounding box. Furthermore, because dense prediction is applied in the head of TOD-CNN, non-maximum suppression based on the distance intersection over union [53] is used to remove bounding boxes with high overlap from the output results of the network.
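For reference, the decoding shown in Fig. 5 follows the standard YOLO-style scheme (this is our reading, stated as an assumption, since the figure itself is not reproduced here), where σ denotes the logistic sigmoid:

    bx = σ(tx) + Cx,    by = σ(ty) + Cy,    bw = Pw · e^{tw},    bh = Ph · e^{th}.

Under this reading, C is the objectness confidence, and P0 and P1 are the class probabilities of the two categories (sperm and impurity).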
4. Experiments

4.1. Experimental settings

4.1.1. Data set
A sperm microscopic video data set is released in our previous work [54], and it is used for the experiments of this paper. The sperm microscopic videos in the data set are obtained by a WLJY-9000 computer-aided sperm analysis system [55] under a 20 × objective lens and a 20 × electronic eyepiece. More than 278,000 objects are annotated in the data set: normal, needle-shaped, amorphous, cone-shaped, round, or multi-nucleated head sperms and impurities (such as bacteria, protein clumps, and bubbles). The object sizes range from approximately 5 to 50 μm². These objects are annotated by 14 reproductive doctors and biomedical scientists and verified by 6 reproductive doctors and biomedical scientists.

From 2017 to 2020, the collection and preparation of this data set took four years and yielded more than 278,000 annotated objects, as shown in Fig. 6. Furthermore, the data set contains some hard-to-detect objects, such as sperms with uncertain morphology, low-contrast sperms, and similar impurities (as shown in Fig. 7), which greatly increases the difficulty of tiny object detection.

In this data set, Subset-A provides more than 125,000 objects with bounding box annotations and category information in 101 videos for the tiny object detection task; Subset-B segments more than 26,000 sperms in 10 videos as ground truth for the tiny object tracking task; Subset-C provides more than 125,000 independent images of sperms and impurities for the tiny object classification task. Although Subset-C is not used in this work, it is still openly available for non-commercial scientific work.

4.1.2. Training, validation, and test data setting
We randomly divide the sperm microscopic videos into training, validation, and test data sets at a ratio of 6:2:2. Therefore, we have 80 sperm microscopic videos and corresponding annotation information for training and validation. The training set includes 2125 sperm microscopic images (77,522 sperms and 2759 impurities), and the validation set includes 668 sperm microscopic images (23,173 sperms and 490 impurities). We have 21 sperm microscopic videos for testing; the test set includes 829 sperm microscopic images (20,706 sperms and 1230 impurities).
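The 6:2:2 split can be sketched in a few lines of Python (the seed and the helper name are illustrative, not part of the original pipeline):

```python
import random

def split_videos(video_ids, seed=0):
    """Randomly split video ids into train/validation/test at 6:2:2."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    n_train, n_val = int(0.6 * len(ids)), int(0.2 * len(ids))
    return (ids[:n_train],                 # training videos
            ids[n_train:n_train + n_val],  # validation videos
            ids[n_train + n_val:])         # test videos
```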
4.1.3. Experimental environment
The experiments are conducted with Python 3.7.0 on the Windows 10 operating system. The models we use in this paper are implemented with the Keras 2.1.5 framework and Tensorflow 1.13.1 as the backend. Our experiments use a workstation with an Intel(R) Core(TM) i7-9700 CPU at 3.00 GHz, 32 GB RAM, and an NVIDIA GEFORCE RTX 2080 with 8 GB of memory.

4.1.4. Hyper parameters
The purpose of the object detection task is to find all objects of interest in the image; this task can therefore be regarded as a combination of positioning and classification tasks. Hence, as the loss function of the network, we use the complete intersection over union [53] (CIoU) …
Table 1
The definitions of evaluation metrics, where TP, TN, FP and FN represent True Positive, True Negative, False Positive and False Negative, respectively; N denotes the number of detected objects.

Metric             | Definition
Average Precision  | AP = ( Σ_{i=1}^{N} Precision(i) × Recall(i) ) / Number of Annotations
Recall             | Rec = TP / (TP + FN)
F1 Score           | F1 = 2 × (Precision × Recall) / (Precision + Recall)
Precision          | Pre = TP / (TP + FP)

Table 2
The memory costs, training time and FPS of TOD-CNN, YOLO-v4, YOLO-v3, SSD, RetinaNet and Faster R–CNN.

Model         | Memory Cost | Training Time | FPS
TOD-CNN       | 164 MB      | 119 min       | 35.7
YOLO-v4       | 244 MB      | 135 min       | 28.4
YOLO-v3       | 235 MB      | 374 min       | 37.0
SSD           | 91.2 MB     | 280 min       | 31.5
RetinaNet     | 139 MB      | 503 min       | 21.0
Faster R–CNN  | 108 MB      | 2753 min      | 7.8
From Fig. 8 and Eq. (1), it can be seen that the smaller the values of Gx, Gy, Px, and Py, the more sensitive the value of IoU is to changes in kx and ky. This phenomenon further illustrates that tiny objects are very difficult to detect and that it is unfair to use IoU alone to evaluate tiny object detection. Therefore, without affecting the sperm positioning, we propose a more suitable evaluation criterion: a detected object is counted as a positive sample when it meets two conditions at the same time. The first is that the detected object category is correct; the second is that either the IoU of the detection box and the ground truth box exceeds B1, or the IoU of the detection box and the ground truth box exceeds B2 and the distance between the center points of the two boxes does not exceed R pixels.
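A minimal sketch of this matching rule, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the thresholds b1, b2 and the radius r are kept as parameters because their concrete values are not fixed in this excerpt:

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def center(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def is_positive(det_box, det_cls, gt_box, gt_cls, b1, b2, r):
    """Positive sample: correct category AND (IoU > B1, or IoU > B2 with
    center distance no larger than R pixels)."""
    if det_cls != gt_cls:
        return False
    overlap = iou(det_box, gt_box)
    (dx, dy), (gx, gy) = center(det_box), center(gt_box)
    dist = ((dx - gx) ** 2 + (dy - gy) ** 2) ** 0.5
    return overlap > b1 or (overlap > b2 and dist <= r)
```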
In order to quantitatively compare the performance of various object detection methods, different metrics are used to evaluate the detection results: Recall (Rec), Precision (Pre), F1 Score (F1), and Average Precision (AP). Rec measures how many of the objects present in the annotation information are correctly detected; however, the detection result cannot be judged from the perspective of Rec alone. Pre measures how many of the objects detected by the model exist in the annotation information. F1 is the harmonic mean of Pre and Rec and is a metric used to measure overall model performance. AP is a metric widely used to evaluate the performance of object detection models; it is obtained by calculating the area under the Pre-Rec curve and thus evaluates object detection models from both aspects, Pre and Rec. The definitions of these metrics are given in Table 1.
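As a worked illustration of AP as the area under the Pre-Rec curve, a hedged numpy sketch (detections are assumed sorted by descending confidence; common AP variants additionally interpolate the precision envelope, which this sketch omits):

```python
import numpy as np

def average_precision(is_tp, num_annotations):
    """AP as the area under the precision-recall (Pre-Rec) curve.
    is_tp: booleans over detections sorted by descending confidence."""
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)            # running count of true positives
    fp = np.cumsum(~is_tp)           # running count of false positives
    recall = tp / float(num_annotations)
    precision = tp / np.maximum(tp + fp, 1).astype(float)
    # Integrate precision over recall with the trapezoidal rule.
    return np.trapz(precision, recall)
```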
4.3.1. Comparison with other methods
In this part, we make a comparison between TOD-CNN and some state-of-the-art methods in terms of memory costs, training time, FPS, and detection performance.

4.3.1.1. Evaluation of memory costs, time costs and FPS. To compare the memory costs, training time and FPS among TOD-CNN, YOLO-v4, YOLO-v3, SSD, RetinaNet and Faster R–CNN, we provide the details in Table 2. From Table 2, we can find that the memory cost of TOD-CNN is 164 MB, the training time of TOD-CNN is around 119 min for 60 sperm microscopy videos, and the FPS is 35.7. The memory cost and FPS of TOD-CNN are not optimal, but it considers both the model size …
Fig. 10. An example of sperm tracking results. Green lines represent the actual trajectories based on ground truth; red, blue and orange lines denote the tracking trajectories of TOD-CNN, SSD and YOLO-v4, respectively.
Fig. 11. GUI of TOD-CNN for detecting sperms in microscopic videos or images.
Fig. 12. Visualized results of some typical detection failures of TOD-CNN. The red and green boxes represent the detection results, the blue boxes represent the ground truth, S represents sperms, and Impurity represents impurities.

Acknowledgements
This work is supported by the “National Natural Science Foundation of China” (No. 61806047). We thank Miss Zixian Li and Mr. Guoxian Li for their important discussion.

References

[1] S. Gadadhar, G. Alvarez Viar, J.N. Hansen, A. Gong, A. Kostarev, C. Ialy-Radio, S. Leboucher, M. Whitfield, A. Ziyyat, A. Touré, Tubulin glycylation controls axonemal dynein activity, flagellar beat, and male fertility, Science 371 (6525) (2021) eabd4914, https://doi.org/10.1126/science.abd4914.
[2] X. Li, C. Li, M.M. Rahaman, H. Sun, X. Li, J. Wu, Y. Yao, M. Grzegorzek, A comprehensive review of computer-aided whole-slide image analysis: from datasets to feature extraction, segmentation, classification and detection approaches, Artif. Intell. Rev. (2022) 1–70, https://doi.org/10.1007/s10462-021-10121-0.
[3] Y. Li, X. Wu, C. Li, X. Li, H. Chen, C. Sun, M.M. Rahaman, Y. Yao, Y. Zhang, T. Jiang, A hierarchical conditional random field-based attention mechanism approach for gastric histopathology image classification, Appl. Intell. (2022) 1–22, https://doi.org/10.1007/s10489-021-02886-2.
[4] Y. Li, C. Li, X. Li, K. Wang, M.M. Rahaman, C. Sun, H. Chen, X. Wu, H. Zhang, Q. Wang, A comprehensive review of markov random field and conditional random field approaches in pathology image analysis, Arch. Comput. Methods Eng. 29 (1) (2022) 609–639, https://doi.org/10.1007/s11831-021-09591-w.
[5] C. Li, H. Chen, X. Li, N. Xu, Z. Hu, D. Xue, S. Qi, H. Ma, L. Zhang, H. Sun, A review for cervical histopathology image analysis using machine vision approaches, Artif. Intell. Rev. 53 (7) (2020) 4821–4862, https://doi.org/10.1007/s10462-020-09808-7.
[6] M.M. Rahaman, C. Li, Y. Yao, F. Kulwa, X. Wu, X. Li, Q. Wang, Deepcervix: a deep learning-based framework for the classification of cervical cells using hybrid deep feature fusion techniques, Comput. Biol. Med. 136 (2021) 104649, https://doi.org/10.1016/j.compbiomed.2021.104649.
[7] M.M. Rahaman, C. Li, X. Wu, Y. Yao, Z. Hu, T. Jiang, X. Li, S. Qi, A survey for cervical cytopathology image analysis using deep learning, IEEE Access 8 (2020) 61687–61710, https://doi.org/10.1109/ACCESS.2020.2983186.
[8] M.M. Rahaman, C. Li, Y. Yao, F. Kulwa, M.A. Rahman, Q. Wang, S. Qi, F. Kong, X. Zhu, X. Zhao, Identification of covid-19 samples from chest x-ray images using deep learning: a comparison of transfer learning approaches, J. X Ray Sci. Technol. 28 (5) (2020) 821–839, https://doi.org/10.3233/XST-200715.
[9] C. Li, J. Zhang, F. Kulwa, S. Qi, Z. Qi, A sars-cov-2 microscopic image dataset with ground truth images and visual features, in: Pattern Recognition and Computer Vision (PRCV), 2020, pp. 244–255, https://doi.org/10.1007/978-3-030-60633-6_20.
[10] J. Zhang, C. Li, M.M. Rahaman, Y. Yao, P. Ma, J. Zhang, X. Zhao, T. Jiang, M. Grzegorzek, A comprehensive review of image analysis methods for microorganism counting: from classical image processing to deep learning approaches, Artif. Intell. Rev. (2021) 1–70, https://doi.org/10.1007/s10462-021-10082-4.
[11] W. Zhao, P. Ma, C. Li, X. Bu, S. Zou, T. Jang, M. Grzegorzek, A survey of semen quality evaluation in microscopic videos using computer assisted sperm analysis, arXiv preprint arXiv:2202.07820, https://doi.org/10.48550/arXiv.2202.07820, 2022.
[12] W. Zhao, S. Zou, C. Li, J. Li, J. Zhang, P. Ma, Y. Gu, P. Xu, X. Bu, A survey of sperm detection techniques in microscopic videos, in: The Fourth International Symposium on Image Computing and Digital Medicine, 2020, pp. 219–224, https://doi.org/10.1145/3451421.3451467.
[13] M. Elsayed, T.M. El-Sherry, M. Abdelgawad, Development of computer-assisted sperm analysis plugin for analyzing sperm motion in microfluidic environments using image-j, Theriogenology 84 (8) (2015) 1367–1377, https://doi.org/10.1016/j.theriogenology.2015.07.021.
[14] L.F. Urbano, P. Masson, M. VerMilyea, M. Kam, Automatic tracking and motility analysis of human sperm in time-lapse images, IEEE Trans. Med. Imag. 36 (3) (2017) 792–801, https://doi.org/10.1109/TMI.2016.2630720.
[15] X. Li, C. Li, F. Kulwa, M.M. Rahaman, W. Zhao, X. Wang, D. Xue, Y. Yao, Y. Cheng, J. Li, S. Qi, T. Jiang, Foldover features for dynamic object behaviour description in microscopic videos, IEEE Access 8 (2020) 114519–114540, https://doi.org/10.1109/ACCESS.2020.3003993.
[16] H. Yang, X. Descombes, S. Prigent, G. Malandain, X. Druart, F. Plouraboué, Head tracking and flagellum tracing for sperm motility analysis, in: 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), 2014, pp. 310–313, https://doi.org/10.1109/ISBI.2014.6867871.
[17] Z. Zou, Z. Shi, Y. Guo, J. Ye, Object detection in 20 years: a survey, arXiv preprint arXiv:1905.05055, https://doi.org/10.48550/arXiv.1905.05055, 2019.
[18] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 580–587, https://doi.org/10.1109/CVPR.2014.81.
[19] R. Girshick, Fast r-cnn, in: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448, https://doi.org/10.1109/ICCV.2015.169.
[20] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2017) 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031.
[21] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask r-cnn, in: 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988, https://doi.org/10.1109/ICCV.2017.322.
[22] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788, https://doi.org/10.1109/CVPR.2016.91.
[23] J. Redmon, A. Farhadi, Yolo9000: better, faster, stronger, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525, https://doi.org/10.1109/CVPR.2017.690.
[24] J. Redmon, A. Farhadi, Yolov3: an incremental improvement, arXiv preprint arXiv:1804.02767, https://doi.org/10.48550/arXiv.1804.02767, 2018.
[25] A. Bochkovskiy, C.Y. Wang, H.Y.M. Liao, Yolov4: optimal speed and accuracy of object detection, arXiv preprint arXiv:2004.10934, https://doi.org/10.48550/arXiv.2004.10934, 2020.
[26] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.Y. Fu, A.C. Berg, Ssd: single shot multibox detector, in: Computer Vision – ECCV 2016, 2016, pp. 21–37, https://doi.org/10.1007/978-3-319-46448-0_2.
[27] T.Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2999–3007, https://doi.org/10.1109/ICCV.2017.324.
[28] B. Gu, R. Ge, Y. Chen, L. Luo, G. Coatrieux, Automatic and robust object detection in x-ray baggage inspection using deep convolutional neural networks, IEEE Trans. Ind. Electron. 68 (10) (2021) 10248–10257, https://doi.org/10.1109/TIE.2020.3026285.
[29] L. Wang, M. Shen, C. Shi, Y. Zhou, Y. Chen, J. Pu, H. Chen, Ee-net: an edge-enhanced deep learning network for jointly identifying corneal micro-layers from optical coherence tomography, Biomed. Signal Process Control 71 (2022) 103213, https://doi.org/10.1016/j.bspc.2021.103213.
[30] W. Yang, H. Zhang, J. Yang, J. Wu, X. Yin, Y. Chen, H. Shu, L. Luo, G. Coatrieux, Z. Gui, Q. Feng, Improving low-dose ct image using residual convolutional network, IEEE Access 5 (2017) 24698–24705, https://doi.org/10.1109/ACCESS.2017.2766438.
[31] J. Li, X. Liang, Y. Wei, T. Xu, J. Feng, S. Yan, Perceptual generative adversarial networks for small object detection, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1951–1959, https://doi.org/10.1109/CVPR.2017.211.
[32] N. Otsu, A threshold selection method from gray-level histograms, IEEE Trans. Syst. Man Cybernet. 9 (1) (1979) 62–66, https://doi.org/10.1109/TSMC.1979.4310076.
[33] X. Zhou, Y. Lu, Efficient mean shift particle filter for sperm cells tracking, in: 2009 International Conference on Computational Intelligence and Security, 2009, pp. 335–339, https://doi.org/10.1109/CIS.2009.264.
[34] E. Soubiès, P. Weiss, X. Descombes, A 3d segmentation algorithm for ellipsoidal shapes. application to nuclei extraction, in: ICPRAM - International Conference on Pattern Recognition Applications and Methods, 2013, pp. 97–105. https://hal.archives-ouvertes.fr/hal-00733187.
[35] M.R. Ravanfar, M.H. Moradi, Low contrast sperm detection and tracking by watershed algorithm and particle filter, in: 2011 18th Iranian Conference of Biomedical Engineering (ICBME), 2011, pp. 260–263, https://doi.org/10.1109/ICBME.2011.6168568.
[36] A. Nurhadiyatna, A.L. Latifah, D. Fryantoni, T. Wirahman, R. Wijayanti, F.H. Muttaqien, Comparison and implementation of motion detection methods for sperm detection and tracking, in: 2014 International Symposium on Micro-NanoMechatronics and Human Science (MHS), 2014, pp. 1–5, https://doi.org/10.1109/MHS.2014.7006125.
[37] M. Berezansky, H. Greenspan, D. Cohen-Or, O. Eitan, Segmentation and tracking of human sperm cells using spatio-temporal representation and clustering, in: Medical Imaging 2007: Image Processing, 2007, pp. 891–902, https://doi.org/10.1117/12.708887.
[38] L.Z. Shi, J. Nascimento, C. Chandsawangbhuwana, M.W. Berns, E.L. Botvinick, Real-time automated tracking and trapping system for sperm, Microsc. Res. Tech. 69 (11) (2006) 894–902, https://doi.org/10.1002/jemt.20359.
[39] G.-G. Wang, M. Lu, Y.-Q. Dong, X.-J. Zhao, Self-adaptive extreme learning machine, Neural Comput. Appl. 27 (2) (2016) 291–303, https://doi.org/10.1007/s00521-015-1874-3.
[40] J.-H. Yi, J. Wang, G.-G. Wang, Improved probabilistic neural networks with self-adaptive strategies for transformer fault diagnosis problem, Adv. Mech. Eng. 8 (1) (2016) 1687814015624832, https://doi.org/10.1177/1687814015624832.
[41] S. Kosov, K. Shirahama, C. Li, M. Grzegorzek, Environmental microorganism classification using conditional random fields and deep convolutional neural networks, Pattern Recogn. 77 (2018) 248–261, https://doi.org/10.1016/j.patcog.2017.12.021.
[42] C. Li, K. Shirahama, M. Grzegorzek, Application of content-based image analysis to environmental microorganism classification, Biocybern. Biomed. Eng. 35 (1) (2015) 10–21, https://doi.org/10.1016/j.bbe.2014.07.003.
[43] J. Zhang, C. Li, S. Kosov, M. Grzegorzek, K. Shirahama, T. Jiang, C. Sun, Z. Li, H. Li, Lcu-net: a novel low-cost u-net for environmental microorganism image segmentation, Pattern Recogn. 115 (2021) 107885, https://doi.org/10.1016/j.patcog.2021.107885.
[44] F. Kulwa, C. Li, X. Zhao, B. Cai, N. Xu, S. Qi, S. Chen, Y. Teng, A state-of-the-art survey for microorganism image segmentation methods and future potential, IEEE Access 7 (2019) 100243–100269, https://doi.org/10.1109/ACCESS.2019.2930111.
[45] C. Sun, C. Li, J. Zhang, M.M. Rahaman, S. Ai, H. Chen, F. Kulwa, Y. Li, X. Li, T. Jiang, Gastric histopathology image segmentation using a hierarchical conditional random field, Biocybern. Biomed. Eng. 40 (4) (2020) 1535–1555, https://doi.org/10.1016/j.bbe.2020.09.008.
[46] Z. Cui, F. Xue, X. Cai, Y. Cao, G.-G. Wang, J. Chen, Detection of malicious code variants based on deep learning, IEEE Trans. Ind. Inf. 14 (7) (2018) 3187–3196, https://doi.org/10.1109/TII.2018.2822680.
[47] M. Shen, C. Li, W. Huang, P. Szyszka, K. Shirahama, M. Grzegorzek, D. Merhof, O. Deussen, Interactive tracking of insect posture, Pattern Recogn. 48 (11) (2015) 3560–3571, https://doi.org/10.1016/j.patcog.2015.05.011.
[48] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944, https://doi.org/10.1109/CVPR.2017.106.
[49] K. He, X. Zhang, S. Ren, J. Sun, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. 37 (9) (2015) 1904–1916, https://doi.org/10.1109/TPAMI.2015.2389824.
[50] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778, https://doi.org/10.1109/CVPR.2016.90.
[51] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556, https://doi.org/10.48550/arXiv.1409.1556, 2015.
[52] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826, https://doi.org/10.1109/CVPR.2016.308.
[53] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, D. Ren, Distance-iou loss: faster and better learning for bounding box regression, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12993–13000, https://doi.org/10.1609/aaai.v34i07.6999.
[54] A. Chen, C. Li, S. Zou, M.M. Rahaman, Y. Yao, H. Chen, H. Yang, P. Zhao, W. Hu, W. Liu, G. Marcin, Svia dataset: a new dataset of microscopic videos and images for computer-aided sperm analysis, Biocybern. Biomed. Eng. 42 (1) (2022) 204–214, https://doi.org/10.1016/j.bbe.2021.12.010.
[55] Y. Hu, J. Lu, Y. Shao, Y. Huang, N. Lü, Comparison of the semen analysis results obtained from two branded computer-aided sperm analysis systems, Andrologia 45 (5) (2013) 315–318, https://doi.org/10.1111/and.12010.
[56] I. Loshchilov, F. Hutter, Sgdr: stochastic gradient descent with warm restarts, arXiv preprint arXiv:1608.03983, https://doi.org/10.48550/arXiv.1608.03983, 2017.
[57] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, M. Pietikäinen, Deep learning for generic object detection: a survey, Int. J. Comput. Vis. 128 (2) (2020) 261–318, https://doi.org/10.1007/s11263-019-01247-4.
[58] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2017) 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031.
[59] M. O'connell, N. Mcclure, S. Lewis, The effects of cryopreservation on sperm morphology, motility and mitochondrial function, Hum. Reprod. 17 (3) (2002) 704–709, https://doi.org/10.1093/humrep/17.3.704.
[60] G.-G. Wang, S. Deb, Z. Cui, Monarch butterfly optimization, Neural Comput. Appl. 31 (7) (2019) 1995–2014, https://doi.org/10.1007/s00521-015-1923-y.
[61] G.-G. Wang, S. Deb, L.D.S. Coelho, Earthworm optimisation algorithm: a bio-inspired metaheuristic algorithm for global optimisation problems, Int. J. Bio-Inspired Comput. 12 (1) (2018) 1–22, https://doi.org/10.1504/IJBIC.2018.093328.
[62] G.-G. Wang, S. Deb, L.d.S. Coelho, Elephant herding optimization, in: 2015 3rd International Symposium on Computational and Business Intelligence (ISCBI), 2015, pp. 1–5, https://doi.org/10.1109/ISCBI.2015.8.
[63] J. Li, H. Lei, A.H. Alavi, G.-G. Wang, Elephant herding optimization: variants, hybrids, and applications, Mathematics 8 (9) (2020) 1415, https://doi.org/10.3390/math8091415.
[64] G.-G. Wang, Moth search algorithm: a bio-inspired metaheuristic algorithm for global optimization problems, Memetic Comput. 10 (2) (2018) 151–164, https://doi.org/10.1007/s12293-016-0212-3.
[65] S. Li, H. Chen, M. Wang, A.A. Heidari, S. Mirjalili, Slime mould algorithm: a new method for stochastic optimization, Future Generat. Comput. Syst. 111 (2020) 300–323, https://doi.org/10.1016/j.future.2020.03.055.
[66] Y. Yang, H. Chen, A.A. Heidari, A.H. Gandomi, Hunger games search: visions, conception, implementation, deep analysis, perspectives, and towards performance shifts, Expert Syst. Appl. 177 (2021) 114864, https://doi.org/10.1016/j.eswa.2021.114864.
[67] W. Li, G.-G. Wang, A.H. Gandomi, A survey of learning-based intelligent optimization algorithms, Arch. Comput. Methods Eng. 28 (5) (2021) 3781–3799, https://doi.org/10.1007/s11831-021-09562-1.
[68] J. Tu, H. Chen, M. Wang, A.H. Gandomi, The colony predation algorithm, JBE 18 (3) (2021) 674–710, https://doi.org/10.1007/s42235-021-0050-y.
[69] M. Li, G.-G. Wang, A review of green shop scheduling problem, Inf. Sci. 589 (2022) 478–496, https://doi.org/10.1016/j.ins.2021.12.122.
[70] A.A. Heidari, S. Mirjalili, H. Faris, I. Aljarah, M. Mafarja, H. Chen, Harris hawks optimization: algorithm and applications, Future Generat. Comput. Syst. 97 (2019) 849–872, https://doi.org/10.1016/j.future.2019.02.028.