Plant Detection and Counting: Enhancing Precision Agriculture in UAV and General Scenes
ABSTRACT Plant detection and counting play a crucial role in modern agriculture, providing vital references for precision management and resource allocation. This study introduces the state-of-the-art Yolov8 detector into the field of plant science and makes simple yet effective improvements to it. Shallow-level information is integrated into the Path Aggregation Network (PANet) to counterbalance the resolution loss stemming from the expanded receptive field, and the quality of upsampled features is improved by combining the lightweight upsampling operator Content-Aware ReAssembly of FEatures (CARAFE) with Multi-Efficient Channel Attention (Mlt-ECA). Together, these changes markedly sharpen the discernment of small objects in Unmanned Aerial Vehicle (UAV) images; we name the resulting model Yolov8-UAV. Our evaluation is based on datasets containing four different plant species. Experimental results demonstrate that the proposed method is strongly competitive even against the most advanced counting techniques, and that it is sufficiently robust. To advance cross-disciplinary research between computer vision and plant science, we also release a new cotton boll dataset with detailed bounding box annotations, and we address previous oversights in existing wheat ear datasets by providing updated labels consistent with global research advancements. Overall, this research offers practitioners a powerful solution to real-world application challenges: for UAV scenarios we recommend the specialized Yolov8-UAV, while Yolov8-N is a wise choice for general scenes thanks to its sufficient accuracy and speed in the majority of cases. In short, our contributions are an improved Yolov8 for UAV scenarios and two open datasets with bounding box annotations, which we hope will promote the use of data resources in plant science. The curated data and code can be accessed at: https://github.com/Ye-Sk/Plant-dataset.
INDEX TERMS Cotton boll, detection and counting, UAV, wheat ear, Yolov8.
Accurate plant detection and counting play a critical role in plant research, precision agricultural management, and resource allocation [2], [3]. However, traditional methods for plant detection and counting have limitations, including limited feature extraction capabilities and subjective manual rule design [4], [5], [6]. These methods struggle to cope with the complexity and variability of plant scenes and the processing requirements of large-scale data.

The emergence of deep learning technology has provided new solutions for plant detection and counting problems. Deep learning is a machine learning approach based on multi-layer neural networks that efficiently handles complex tasks by automatically learning feature representations and recognizing patterns in large-scale data [7], [8]. In the context of plant detection and counting, deep learning techniques have brought new breakthroughs to plant science and agricultural production with their robust feature learning and pattern recognition capabilities. Through training and inference of deep learning models, accurate detection and counting of plant objects in image data can be achieved, greatly improving work efficiency and data processing accuracy [9], [10].

Over the past few years, many advanced deep learning-based methods and models have emerged in the field of plant detection and counting, providing formidable tools for agricultural producers to monitor and control various issues related to plant growth. Object detection, as an important research direction, has gained increasing attention in the plant domain. Researchers have started exploring the use of deep learning models for plant detection and counting tasks, including well-known models such as Yolo [11], Faster R-CNN [12], and EfficientDet [13]. Some researchers have also made a series of improvements to accomplish plant detection and counting tasks [14], [15]. Nevertheless, these improvements often involve complex and laborious implementation processes and are optimized for specific application scenarios, which limits the development of cross-disciplinary research between computer vision and plant science in a more general direction.

Fortunately, thanks to the relentless efforts of machine learning pioneers, some excellent general-purpose models have been proposed [8], [16]. Among them, the Yolo family has garnered significant attention due to its outstanding balance between accuracy and speed. As the latest detector in the Yolo series, Yolov8 not only inherits the advantages of previous models but also surpasses them, becoming a potent tool for practitioners in the field of plant science.

With the rapid development of Unmanned Aerial Vehicles (UAV) and remote sensing technology [4], [17], numerous scholars are dedicating their efforts to advancing the analysis of remote sensing images. Several publicly available remote sensing datasets, such as Remote Sensing Object Detection (RSOD) [18] and University of Chinese Academy of Sciences - Aerial Object Detection (UCAS-AOD) [19], provide robust support for research endeavors. On a different front, Liang et al. [20] introduced a single-stage detector known as FS-SSD. They constructed a feature pyramid by combining deconvolution modules and feature fusion modules, fully harnessing these hierarchical features during the prediction process; their approach also takes spatial context information into account. Wang et al. [21] proposed a detector with contextual information to alleviate the challenge of complex backgrounds in remote sensing images, and also enhanced the region proposal network of RCNN. Furthermore, Liu et al. [22] devised Multi-branch Parallel Feature Pyramid Networks (MPFPN) to recover small object features lost in deep semantic information. However, these methods demand significant memory and computational resources, limiting their practical application on low-power edge image processing devices. In the realm of agriculture, Lu et al. [41] proposed a local counting network named TasselNetV3, which improved the visual output by introducing an upsampling operator to supervise the redistribution of counts. Bai et al. [42] designed a deep network called RPNet, which enhances the counting performance for rice plants by densely utilizing shallow and deep features. Liu et al. [12] employed ResNet as the backbone of Faster R-CNN to detect tassels in high-resolution UAV images. While these methods effectively enhance recognition performance for small-sized plant objects, they require high-performance computing devices for both training and inference.

At the current stage, high-resolution plant image datasets collected by UAV have gained widespread attention. These datasets contain diverse plant objects and complex scenes, better simulating real-world application environments and driving the application of plant detection and counting methods in agricultural production. In this study, we selected Yolov8 as a powerful baseline model and enhanced its perception of small objects by introducing a simple yet effective upsampling process. Unlike previous research, we replaced the traditional nearest-neighbor upsampling operation in Yolov8 with a data-dependent lightweight upsampling operator called Content-Aware ReAssembly of FEatures (CARAFE) [23]. In addition, after each CARAFE operation we applied Multi-Efficient Channel Attention (Mlt-ECA) [24] for weighted adjustment of features. These improvements are straightforward to implement. We chose this approach because the Yolov8 baseline itself has demonstrated strong performance, and excessively complex improvements may lead to other performance trade-offs. The improved model is named Yolov8-UAV, as it is better suited to UAV-style image detection tasks.

In addition to model design and training, dataset construction and annotation are also crucial. To our knowledge, there is currently no publicly available cotton boll dataset. Therefore, building on previous automated observation work [4], we have released a cotton boll dataset named Cotton Boll Detection Augmented (CBDA), which includes annotated bounding boxes. We also noticed that Madec et al. [26] contributed a wheat ear dataset, whose labels we update in this work (the WEDU dataset used in our experiments).
FIGURE 2. Yolov8-UAV network framework, which uses PANet to fuse multi-scale image information.
A characteristic of the RFRB dataset is that its images were captured using a mobile device at a height ranging from 10 to 15 meters, making it a typical UAV-image dataset. Another notable aspect of RFRB is the considerable number of instances in each image, ranging from 27 to 629. This high object density poses a significant challenge for the model to accurately detect and capture small-scale plant features.
B. PROPOSED METHOD
Taking into account deployment requirements on edge devices in the context of plant science, Yolov8 offers different versions such as N, S, and M. Considering our specific needs, we have chosen the most lightweight version, Yolov8-N, as the baseline model. Following modern neural network design principles, we have made minor yet effective modifications that make the detection network structure more comprehensive and detailed, specifically suited to detecting small and densely-packed plant objects in UAV images. Hence, we have named it Yolov8-UAV. The overall network architecture is illustrated in Figure 2.

In recent years, the Path Aggregation Network (PANet) [29] has emerged as a powerful paradigm for object detection [30], [31], [32], standing out for its multi-scale feature fusion and contextual information aggregation. PANet incorporates a bottom-up path to extract high-resolution features and combines it with a top-down path for contextual information aggregation. The introduction of PANet has played a positive role in the rapid development of the object detection field, and Yolov8, one of the state-of-the-art detectors known today, also adopts this structure.
Firstly, PANet leverages the backbone structure of the Feature Pyramid Network (FPN) [33] to construct a pyramid-like feature map, enabling efficient detection of objects of different sizes through cross-scale feature fusion. Secondly, by adding bottom-up path augmentation, the network's perception of details and low-level features is further improved. Our modification simply adds one more upsampling step to the FPN top-down path to enhance the perception of small objects and fuses the result with the C2 feature layer, yielding an additional high-resolution output feature layer. This improvement is simple, effective, and easy to implement, as demonstrated by previous experiments and experience [34], [35], [36].
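To make this structural change concrete, the following is a minimal PyTorch sketch of the extra top-down step: the stride-8 feature map is upsampled once more and fused with the stride-4 C2 features to create an additional small-object output level. The module and channel names here are illustrative assumptions, not the authors' exact code; the real Yolov8-UAV wiring is defined in the released repository.

```python
import torch
import torch.nn as nn

class ExtraTopDownP2(nn.Module):
    """Illustrative sketch: one extra FPN upsampling stage fused with C2."""

    def __init__(self, c3_channels: int, c2_channels: int, out_channels: int):
        super().__init__()
        # Plain nearest-neighbor upsampling here; the paper swaps this
        # operator for CARAFE (see the sketch below).
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Sequential(
            nn.Conv2d(c3_channels + c2_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(),
        )

    def forward(self, p3: torch.Tensor, c2: torch.Tensor) -> torch.Tensor:
        # Upsample stride-8 features to stride 4, concatenate with C2,
        # and produce the extra high-resolution output level.
        return self.fuse(torch.cat([self.up(p3), c2], dim=1))
```

The extra stride-4 level costs additional memory, but it directly benefits the small, densely packed objects typical of UAV imagery.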
In contrast to previous studies, we employ a data-dependent lightweight upsampling operator called Content-Aware ReAssembly of FEatures (CARAFE) [23] instead of the traditional nearest-neighbor upsampling operation used in Yolov8. Unlike traditional bilinear interpolation, CARAFE dynamically generates upsampling kernels, enabling instance-specific, content-aware processing. This adaptability allows CARAFE to integrate a broader range of contextual information while maintaining a lightweight design, surpassing the limitations of bilinear interpolation in processing semantic information and expanding the perceptual range of feature maps.
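Below is a compact sketch of a CARAFE-style upsampler written from the description in [23]. The intermediate width c_mid and the kernel sizes k_enc and k_up are common defaults from that paper, not values confirmed by this article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    """Content-Aware ReAssembly of FEatures, sketched from [23]."""

    def __init__(self, c: int, scale: int = 2, c_mid: int = 64,
                 k_enc: int = 3, k_up: int = 5):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        # Kernel prediction: compress channels, then predict one k_up x k_up
        # reassembly kernel per high-resolution output location.
        self.compress = nn.Conv2d(c, c_mid, 1)
        self.encode = nn.Conv2d(c_mid, (scale ** 2) * (k_up ** 2),
                                k_enc, padding=k_enc // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # 1) Predict content-aware kernels, normalized with a softmax.
        kernels = F.pixel_shuffle(self.encode(self.compress(x)), s)
        kernels = F.softmax(kernels, dim=1)               # (b, k*k, s*h, s*w)
        # 2) Gather each k x k input neighborhood and lift it to the
        #    high-resolution grid (each output pixel reuses the patch of
        #    the low-resolution location it maps back to).
        patches = F.unfold(x, k, padding=k // 2)          # (b, c*k*k, h*w)
        patches = patches.view(b, c * k * k, h, w)
        patches = F.interpolate(patches, scale_factor=s, mode="nearest")
        patches = patches.view(b, c, k * k, s * h, s * w)
        # 3) Weighted reassembly of each neighborhood.
        return (patches * kernels.unsqueeze(1)).sum(dim=2)
```

Because the module preserves channel count and only scales spatial size, it can replace nn.Upsample in the PANet neck without any other changes, which is what makes the swap straightforward.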
After each CARAFE operation, we apply Multi-Efficient Channel Attention (Mlt-ECA) [24] for weighted feature adjustment. Mlt-ECA uses a dimensionality-preserving local cross-channel interaction strategy and adaptively determines the size of its 1D convolution kernel, achieving coverage of local cross-channel interactions. Specifically:

k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{\beta}{\gamma} \right|_{\mathrm{odd}}    (1)

where k is the size of the convolution kernel, C is the number of channels, and |\cdot|_{\mathrm{odd}} indicates that k is rounded to the nearest odd number. In our experiments, \gamma and \beta are set to 2 and 1, respectively, to adjust the proportion between C and the kernel size.
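As a reference for Equation (1), here is a sketch of the generic ECA-style core on which Mlt-ECA builds; the multi-branch arrangement of the full Mlt-ECA module follows [24] and is not reproduced here.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """ECA-style channel attention with the adaptive 1D kernel of Eq. (1)."""

    def __init__(self, channels: int, gamma: int = 2, beta: int = 1):
        super().__init__()
        # Eq. (1): k = |log2(C)/gamma + beta/gamma|_odd
        k = int(abs(math.log2(channels) / gamma + beta / gamma))
        k = k if k % 2 == 1 else k + 1          # force an odd kernel size
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global average pool -> local cross-channel 1D convolution -> gate.
        y = self.pool(x)                               # (b, c, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(1, 2))   # (b, 1, c)
        y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y
```

For example, C = 256 with \gamma = 2 and \beta = 1 gives \log_2(256)/2 + 1/2 = 4.5, which rounds to the odd kernel size k = 5.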
The incorporation of multi-scale feature fusion, contextual information aggregation, and channel attention has enhanced the model's perception, expressive power, and adaptability. By integrating multi-scale features, the model gains a more comprehensive understanding of the input data, allowing it to capture fine-grained details and high-level contextual information simultaneously. Contextual information aggregation enhances the model's global context awareness, leading to more accurate predictions, particularly in tasks involving object detection and segmentation. The introduction of channel attention further boosts the model's expressive power by selectively emphasizing relevant and discriminative features, leading to improved feature representation and extraction. The collective impact of these enhancements is particularly advantageous in detecting small and crowded objects, making the model highly suitable for real-world scenarios that involve intricate and densely arranged objects.

Overall, the integration of multi-scale feature fusion, contextual information aggregation, and channel attention demonstrates a holistic approach to enhancing the model's capabilities. The proposed modifications contribute to its generality, making it a potent tool for tackling challenging visual tasks and paving the way for further advancements in computer vision research and applications.
C. LOSS FUNCTION
Yolov8's loss calculation includes both a classification loss and a regression loss. The purpose of the classification loss is to help the model distinguish between foreground and background, while the regression loss constrains the model's learning of predicted box positions and shapes. In particular, the classification loss is formulated as Binary Cross-Entropy (BCE) loss [37], which can be expressed as follows:
L_{bce} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]    (2)

This commonly used binary classification loss measures the learning dissimilarity between positive and negative samples. In Equation (2), the target (label) value is denoted as y, the predicted result as p, and n represents the batch size.

The regression loss is guided by the Complete Intersection over Union (CIoU) [38] and Distribution Focal Loss (DFL) [39] functions. In greater detail, the CIoU loss measures the matching degree between the predicted bounding box and the ground truth bounding box, while the DFL loss focuses on the matching of the distance field. It can be described as follows:

L_{reg} = \frac{1}{N_{pos}} \sum_{i=1}^{N_{pos}} \left( w_i \times \left[ 1 - CIoU(\hat{b}_i, b_i) \right] + DF(\hat{d}_i, d_i) \right)    (3)

Here, N_{pos} is the number of positive sample boxes, \hat{b}_i and b_i are the coordinates of the predicted boxes and the ground truth boxes, \hat{d}_i and d_i are the values of the predicted and ground truth distance fields, CIoU(\hat{b}_i, b_i) is the computed CIoU value, and w_i is the weight of the i-th positive or negative sample. DF(\hat{d}_i, d_i) is the distance field loss computed using DFL. To specify, DFL is a distance field-based loss function used to optimize the regression task in detection, and its expression is as follows:

L_{df} = \frac{1}{4 N_{pos}} \sum_{i=1}^{N_{pos}} \sum_{j=1}^{4} \sum_{k=1}^{K} \left[ p_j \log(p_j) - w(k)\, q_{jk} \log(q_{jk}) \right]    (4)

In this equation, p_j represents the j-th element of the ground truth distance field, q_{jk} represents the probability of the k-th component corresponding to the j-th element of the predicted distance field, and w(k) serves as a weight coefficient to balance the loss between different values of k. Finally, the total loss of Yolov8 is defined as L = \alpha L_{cls} + \beta L_{reg}, where \alpha and \beta are hyperparameters.
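To make the distance-field term concrete, the sketch below follows the reference implementation of DFL from [39]: each of the four box distances is learned as a discrete distribution over K+1 bins, trained with a weighted cross-entropy against the two integer bins bracketing the continuous target. It mirrors the published formulation rather than the exact notation of Equation (4).

```python
import torch
import torch.nn.functional as F

def distribution_focal_loss(pred_logits: torch.Tensor,
                            target: torch.Tensor) -> torch.Tensor:
    """DFL in the style of [39], not this article's exact code.

    pred_logits: (N, K+1) logits for one distance field of N positive boxes.
    target:      (N,) real-valued distances, already scaled into [0, K].
    """
    left = target.long()                 # lower integer bin
    right = left + 1                     # upper integer bin
    w_left = right.float() - target      # linear interpolation weights
    w_right = target - left.float()
    loss = (F.cross_entropy(pred_logits, left, reduction="none") * w_left
            + F.cross_entropy(pred_logits,
                              right.clamp(max=pred_logits.size(1) - 1),
                              reduction="none") * w_right)
    return loss.mean()
```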
III. EXPERIMENTS AND RESULTS
A. TRAINING DETAILS AND QUANTITATIVE METRICS
The experiments were implemented using the PyTorch deep learning framework and accelerated with CUDA. The CBDA dataset was divided into 120 images for training and 60 for testing; the WEDU dataset consisted of 165 training and 71 testing images; the MTDC dataset contained 186 training and 175 testing images; and the RFRB dataset included 90 training and 24 testing images. The model was optimized for 300 epochs. It is important to note that the model configuration remained consistent with the default parameters, and no adjustments were made.
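Because training keeps the default Yolov8 configuration, a run can be reproduced with the standard Ultralytics interface; the dataset YAML name below is a placeholder to be pointed at one of the released splits.

```python
from ultralytics import YOLO

# Baseline setup used in this study: Yolov8-N, default hyperparameters,
# 300 epochs. "cbda.yaml" is a placeholder dataset config describing the
# CBDA train/test split from https://github.com/Ye-Sk/Plant-dataset.
model = YOLO("yolov8n.pt")
model.train(data="cbda.yaml", epochs=300)
metrics = model.val()  # reports Pr, Re, AP50, and AP50-95 on the test split
```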
We used the following metrics to quantify detection performance: precision (Pr), recall (Re), average precision at 50% IoU (AP50), and average precision at 50%-95% IoU (AP50-95). These metrics provide accurate measures of the model's localization performance. Precision is the proportion of correctly predicted objects among all objects predicted by the model, while recall is the proportion of correctly predicted objects among all actual objects. AP refers to the area under the Pr-Re curve. They are calculated as follows:

Pr = \frac{TP}{TP + FP}    (5)

Re = \frac{TP}{TP + FN}    (6)

AP = \int_{0}^{1} Pr(Re)\, d(Re)    (7)

where TP, FP, and FN represent the numbers of true positives, false positives, and false negatives, respectively. Besides, the evaluation metrics for counting tasks are as
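As a concrete illustration of Equations (5)-(7), the following computes a single-class precision-recall curve and its AP from confidence-sorted detections. It is a generic all-point-interpolation evaluator, not the exact script used in these experiments.

```python
import numpy as np

def precision_recall_ap(tp: np.ndarray, fp: np.ndarray, n_gt: int):
    """Eqs. (5)-(7) for one class. Detections are assumed sorted by
    confidence, with tp/fp given as 0/1 flags per detection."""
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(n_gt, 1)                           # Eq. (6)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-9)   # Eq. (5)
    # Monotone envelope of precision, then integrate over recall: Eq. (7).
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    idx = np.where(r[1:] != r[:-1])[0]
    ap = np.sum((r[idx + 1] - r[idx]) * p[idx + 1])
    return precision, recall, ap
```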
good performance. At times, achieving this requires collaboration among researchers worldwide. Alternatively, the presence of noise in the WEDU dataset is detrimental to models with poor robustness against adversarial interference. Moving forward, we will apply advanced techniques in plant science following expert guidance. Our focus is on interdisciplinary research, innovation, and impactful contributions to agriculture and sustainability. We will connect cutting-edge machine learning with practical plant science, empowering researchers for a food-secure future.
REFERENCES
[1] Q. Zhou, D. Zhao, B. Shuai, Y. Li, H. Williams, and H. Xu, "Knowledge implementation and transfer with an adaptive learning network for real-time power management of the plug-in hybrid vehicle," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 12, pp. 5298-5308, Dec. 2021.
[2] L. Wang, L. Xiang, L. Tang, and H. Jiang, "A convolutional neural network-based method for corn stand counting in the field," Sensors, vol. 21, no. 2, p. 507, Jan. 2021.
[3] Y. Wang, Z. Cao, X. Bai, Z. Yu, and Y. Li, "An automatic detection method to the field wheat based on image processing," Proc. SPIE, vol. 8918, Oct. 2015, Art. no. 89180F.
[4] Z. Yu, Z. Cao, X. Wu, X. Bai, Y. Qin, W. Zhuo, Y. Xiao, X. Zhang, and H. Xue, "Automatic image-based detection technology for two critical growth stages of maize: Emergence and three-leaf stage," Agricult. Forest Meteorol., vols. 174-175, pp. 65-84, Jun. 2013.
[5] Z. Yu, H. Zhou, and C. Li, "An image-based automatic recognition method for the flowering stage of maize," Proc. SPIE, vol. 10611, Mar. 2018, Art. no. 104200I.
[6] C.-N. Li, X.-F. Zhang, Z.-H. Yu, and X.-F. Wang, "Accuracy evaluation of summer maize coverage and leaf area index inversion based on images extraction technology," Chin. J. Agrometeorol., vol. 37, no. 4, pp. 479-491, 2016.
[7] H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, "CvT: Introducing convolutions to vision transformers," 2021, arXiv:2103.15808.
[8] Y. Ma, Y. Cao, Y. Hong, and A. Sun, "Large language model is not a good few-shot information extractor, but a good reranker for hard samples!" 2023, arXiv:2303.08559.
[9] N. Panigrahi and B. S. Das, "Evaluation of regression algorithms for estimating leaf area index and canopy water content from water stressed rice canopy reflectance," Inf. Process. Agricult., vol. 8, no. 2, pp. 284-298, Jun. 2021.
[10] T. B. Shahi, C.-Y. Xu, A. Neupane, and W. Guo, "Recent advances in crop disease detection using UAV and deep learning techniques," Remote Sens., vol. 15, no. 9, p. 2450, May 2023.
[11] S. Xiang, S. Wang, M. Xu, W. Wang, and W. Liu, "YOLO POD: A fast and accurate multi-task model for dense soybean pod counting," Plant Methods, vol. 19, no. 1, p. 8, Jan. 2023.
[12] Y. Liu, C. Cen, Y. Che, R. Ke, Y. Ma, and Y. Ma, "Detection of maize tassels from UAV RGB imagery with faster R-CNN," Remote Sens., vol. 12, no. 2, p. 338, Jan. 2020.
[13] Y. Wang, Y. Qin, and J. Cui, "Occlusion robust wheat ear counting algorithm based on deep learning," Frontiers Plant Sci., vol. 12, Jun. 2021, Art. no. 645899.
[14] S. Yang, J. Liu, K. Xu, X. Sang, J. Ning, and Z. Zhang, "Improved CenterNet based maize tassel recognition for UAV remote sensing image," Trans. Chin. Soc. Agricult. Machinery, vol. 52, pp. 206-212, Jan. 2021.
[15] C. Miao, A. Guo, A. M. Thompson, J. Yang, Y. Ge, and J. C. Schnable, "Automation of leaf counting in maize and sorghum using deep learning," Plant Phenome J., vol. 4, no. 1, Jan. 2021, Art. no. e20022.
[16] W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, "PVT v2: Improved baselines with pyramid vision transformer," Comput. Vis. Media, vol. 8, pp. 415-424, Sep. 2022.
[17] Z. Yu, H. Zhou, and C. Li, "Fast non-rigid image feature matching for agricultural UAV via probabilistic inference with regularization techniques," Comput. Electron. Agricult., vol. 143, pp. 79-89, Dec. 2017.
[18] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, "Accurate object localization in remote sensing images based on convolutional neural networks," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2486-2498, May 2017.
[19] H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J. Jiao, "Orientation robust object detection in aerial images using deep convolutional neural network," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 3735-3739.
[20] X. Liang, J. Zhang, L. Zhuo, Y. Li, and Q. Tian, "Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 6, pp. 1758-1770, Jun. 2020.
[21] Y. Wang, C. Xu, C. Liu, and Z. Li, "Context information refinement for few-shot object detection in remote sensing images," Remote Sens., vol. 14, no. 14, p. 3255, Jul. 2022.
[22] Y. Liu, F. Yang, and P. Hu, "Small-object detection in UAV-captured images via multi-branch parallel feature pyramid networks," IEEE Access, vol. 8, pp. 145740-145750, 2020.
[23] J. Wang, K. Chen, R. Xu, Z. Liu, C. C. Loy, and D. Lin, "CARAFE: Content-aware ReAssembly of FEatures," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 3007-3016.
[24] Z. Yu, J. Ye, C. Li, H. Zhou, and X. Li, "TasselLFANet: A novel lightweight multi-branch feature aggregation neural network for high-throughput image-based maize tassels detection and counting," Frontiers Plant Sci., vol. 14, Apr. 2023, Art. no. 1158940.
[25] J. Ye, Z. Yu, Y. Wang, D. Lu, and H. Zhou, "WheatLFANet: In-field detection and counting of wheat heads with high-real-time global regression network," Plant Methods, vol. 19, no. 1, p. 103, Oct. 2023.
[26] S. Madec, X. Jin, H. Lu, B. De Solan, S. Liu, F. Duyme, E. Heritier, and F. Baret, "Ear density estimation from high resolution RGB imagery using deep learning technique," Agricult. Forest Meteorol., vol. 264, pp. 225-234, Jan. 2019.
[27] H. Zou, H. Lu, Y. Li, L. Liu, and Z. Cao, "Maize tassels detection: A benchmark of the state of the art," Plant Methods, vol. 16, no. 1, p. 108, Dec. 2020.
[28] J. Li, E. Wang, J. Qiao, Y. Li, L. Li, J. Yao, and G. Liao, "Automatic rape flower cluster counting method based on low-cost labelling and UAV-RGB images," Plant Methods, vol. 19, no. 1, p. 40, Apr. 2023.
[29] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8759-8768.
[30] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020, arXiv:2004.10934.
[31] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," 2022, arXiv:2207.02696.
[32] C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, "You only learn one representation: Unified network for multiple tasks," J. Inf. Sci. Eng., vol. 39, no. 2, pp. 691-709, 2021.
[33] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2117-2125.
[34] J. Yan, J. Zhao, Y. Cai, S. Wang, X. Qiu, X. Yao, Y. Tian, Y. Zhu, W. Cao, and X. Zhang, "Improving multi-scale detection layers in the deep learning network for wheat spike detection based on interpretive analysis," Plant Methods, vol. 19, no. 1, p. 46, May 2023.
[35] J. Chen, H. Liu, Y. Zhang, D. Zhang, H. Ouyang, and X. Chen, "A multi-scale lightweight and efficient model based on YOLOv7: Applied to citrus orchard," Plants, vol. 11, no. 23, p. 3260, Nov. 2022.
[36] W. Liu, K. Quijano, and M. M. Crawford, "YOLOv5-tassel: Detecting tassels in RGB UAV imagery with improved YOLOv5 based on transfer learning," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 8085-8094, 2022.
[37] M. Yeung, E. Sala, C.-B. Schönlieb, and L. Rundo, "Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation," Computerized Med. Imag. Graph., vol. 95, Jan. 2022, Art. no. 102026.
[38] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU loss: Faster and better learning for bounding box regression," in Proc. AAAI Conf. Artif. Intell., vol. 34, no. 7, 2020, pp. 12993-13000.
[39] X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, and J. Yang, "Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection," in Proc. NeurIPS, 2020.
[40] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137-1149, Jun. 2017.
[41] H. Lu, L. Liu, Y.-N. Li, X.-M. Zhao, X.-Q. Wang, and Z.-G. Cao, "TasselNetV3: Explainable plant counting with guided upsampling and background suppression," IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 4700515.
[42] X. Bai, S. Gu, P. Liu, A. Yang, Z. Cai, J. Wang, and J. Yao, "RPNet: Rice plant counting after tillering stage based on plant attention and multiple supervision network," Crop J., vol. 11, no. 5, pp. 1586-1594, Oct. 2023.
[43] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in Proc. Int. Conf. Mach. Learn., 2019, pp. 6105-6114.
[44] P. Dollár, H. Touvron, M. Sandler, A. Howard, and S. Zagoruyko, "Fast and accurate model scaling," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 924-932.
[45] G. Jocher. (2023). YOLOv8. [Online]. Available: https://github.com/ultralytics/ultralytics
[46] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, "CenterNet: Keypoint triplets for object detection," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 6568-6577.

JIANXIONG YE is currently pursuing the degree with the College of Robotics, Guangdong Polytechnic of Science and Technology, Zhuhai, China. He is also preparing to pursue the engineering degree with Wuyi University. His research interests include computer vision, intelligent robotics, and agricultural automation, with a specific focus on object detection and object counting problems. Notably, his latest research project attained the First Prize in the prestigious Chinese Robotics and Artificial Intelligence Competition (CRAIC).

YANGXU WANG is currently pursuing the degree with the College of Robotics, Guangdong Polytechnic of Science and Technology, Zhuhai, China. He is also preparing to pursue the degree in computer management with the Software Engineering Institute of Guangzhou (SEIG). His research interests include intelligent robotics and agricultural automation. He has a strong passion for the field of intelligent robotics and aims to leverage the power of robots and automation to optimize traditional agriculture.