
Received 27 September 2023, accepted 16 October 2023, date of publication 18 October 2023, date of current version 25 October 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3325747

Plant Detection and Counting: Enhancing Precision Agriculture in UAV and General Scenes
DUNLU LU1, JIANXIONG YE1, YANGXU WANG1, AND ZHENGHONG YU1,2
1College of Robotics, Guangdong Polytechnic of Science and Technology, Guangzhou 519090, China
2Mahanakorn Institute of Innovation, Mahanakorn University of Technology, Bangkok 10530, Thailand

Corresponding author: Zhenghong Yu ([email protected])


This work was supported in part by 2022 Key Scientific Research Project of Ordinary Universities in Guangdong Province under Grant
2022ZDZX4075, in part by 2022 Guangdong Province Ordinary Universities Characteristic Innovation Project under Grant
2022KTSCX251, in part by the Collaborative Intelligent Robot Production and Education Integrates Innovative Application Platform
Based on the Industrial Internet under Grant 2020CJPT004, in part by 2020 Guangdong Rural Science and Technology Mission Project
under Grant KTP20200153, in part by the Engineering Research Centre for Intelligent Equipment Manufacturing under Grant
2021GCZX018, and in part by the Guangdong Polytechnic of Science and Technology & DOBOT Collaborative Innovation Center under
Grant K01057060.

ABSTRACT Plant detection and counting play a crucial role in modern agriculture, providing vital references for precision management and resource allocation. This study follows in the footsteps of machine learning experts by introducing the state-of-the-art Yolov8 technology into the field of plant science. Moreover, we made some simple yet effective improvements. The integration of shallow-level information into the Path Aggregation Network (PANet) served to counterbalance the resolution loss stemming from the expanded receptive field. The enhancement of upsampled features was accomplished by combining the lightweight upsampling operator Content-Aware ReAssembly of FEatures (CARAFE) with the Multi-Efficient Channel Attention (Mlt-ECA) technique to optimize the precision of upsampled features. This collective approach markedly amplifies the discernment of small objects in Unmanned Aerial Vehicle (UAV) images, and we name the resulting model Yolov8-UAV. Our evaluation is based on datasets containing four different plant species. Experimental results demonstrate the strong competitiveness of our proposed method even when compared to the most advanced counting techniques, and it possesses sufficient robustness. In order to advance cross-disciplinary research between computer vision and plant science, we also release a new cotton boll dataset with detailed annotated bounding box information. What's more, we address previous oversights in existing wheat ear datasets by providing updated labels consistent with global research advancements. Overall, this research offers practitioners a powerful solution for addressing real-world application challenges. For UAV scenarios, we recommend using the specialized Yolov8-UAV, while Yolov8-N is a wise choice for general scenes due to its sufficient accuracy and speed in the majority of cases. Furthermore, we contribute two meaningful datasets that have research significance, effectively promoting the application of data resources in the field of plant science. In short, our contribution is to improve the use of Yolov8 in UAV scenarios and to open two datasets with bounding boxes. The curated data and code can be accessed at the following link: https://github.com/Ye-Sk/Plant-dataset.

INDEX TERMS Cotton boll, detection and counting, UAV, wheat ear, Yolov8.

I. INTRODUCTION
In recent years, deep learning, as the core technology of the third wave of artificial intelligence, has made rapid progress and demonstrated remarkable performance and extensive applications in various fields [1]. In scientific research and engineering practice, deep learning has achieved significant results. Plant detection and counting, as important tasks in plant science and agricultural production, have also benefited from the advancements in deep learning technology.

The associate editor coordinating the review of this manuscript and approving it for publication was Turgay Celik.

© 2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

Accurate plant detection and counting play a critical role in plant research, precision agricultural management, and resource allocation [2], [3]. However, traditional methods for plant detection and counting have limitations, including limited feature extraction capabilities and subjective manual rule design [4], [5], [6]. These methods struggle to cope with the complexity and variability of plant scenes and the processing requirements of large-scale data.

The emergence of deep learning technology has provided new solutions for addressing plant detection and counting problems. Deep learning is a machine learning approach based on multi-layer neural networks, which efficiently handles complex tasks by automatically learning feature representations and pattern recognition from large-scale data [7], [8]. In the context of plant detection and counting, deep learning techniques have brought new breakthroughs to plant science and agricultural production with their robust feature learning and pattern recognition capabilities. Through training and inference of deep learning models, accurate detection and counting of plant objects in image data can be achieved, greatly improving work efficiency and data processing accuracy [9], [10].

Over the past few years, many advanced deep learning-based methods and models have emerged in the field of plant detection and counting, providing formidable tools for agricultural producers to monitor and control various issues related to plant growth. Object detection, as an important research direction, has gained increasing attention in the plant domain. Researchers have started exploring the use of deep learning models for plant detection and counting tasks, including well-known models such as Yolo [11], Faster R-CNN [12], and EfficientDet [13]. Some researchers have also made a series of improvements to accomplish plant detection and counting tasks [14], [15]. Despite that, these improvements often involve complex and laborious implementation processes and are optimized for specific application scenarios. This situation limits the development of cross-disciplinary research between computer vision and plant science towards a more general direction.

Fortunately, thanks to the relentless efforts of machine learning pioneers, some excellent general-purpose machine learning models have been proposed [8], [16]. Among them, the Yolo model has garnered significant attention due to its outstanding balance between accuracy and speed. As the latest detector in the Yolo series, Yolov8 not only inherits the advantages of previous models but also surpasses them, becoming a potent tool for practitioners in the field of plant science.

With the rapid development of Unmanned Aerial Vehicles (UAV) and remote sensing technology [4], [17], numerous scholars are dedicating their efforts to advancing the analysis of remote sensing images. Several publicly available remote sensing datasets, such as Remote Sensing Object Detection (RSOD) [18] and University of Chinese Academy of Sciences - Aerial Object Detection (UCAS-AOD) [19], are providing robust support for research endeavors. On a different front, Liang et al. [20] introduced a single-stage detector known as FS-SDD. They constructed a feature pyramid by combining deconvolution modules and feature fusion modules, fully harnessing these hierarchical features during the prediction process. Their approach also takes spatial context information into account. Wang et al. [21], on the other hand, proposed a detector with contextual information to alleviate the challenge of complex backgrounds in remote sensing images. They also enhanced the region proposal network of RCNN. Furthermore, Liu et al. [22] devised Multi-branch Parallel Feature Pyramid Networks (MPFPN) to recover small object features lost in deep semantic information. However, these methods demand significant memory and computational resources, limiting their practical application on low-power edge image processing devices. In the realm of agriculture, Lu et al. [41] proposed a local counting network named TasselNetV3, which improved the visual output by introducing an upsampling operator to supervise the redistribution of counts. Bai et al. [42] designed a deep network called RPNet, which enhances the counting performance for rice plants by densely utilizing shallow and deep features. Liu et al. [12] employed ResNet as the backbone for Faster R-CNN to detect tassels in high-resolution UAV images. While these aforementioned methods effectively enhance the recognition performance for small-sized plant objects, they require high-performance computing devices for both training and inference. At the current stage, high-resolution plant image datasets collected by UAV have gained widespread attention. These datasets contain diverse plant objects and complex scenes, better simulating real-world application environments and driving the application of plant detection and counting methods in agricultural production. In this study, we selected Yolov8 as a powerful baseline model and enhanced its perception of small objects by introducing a simple yet effective upsampling process. Unlike previous research, we replaced the traditional nearest-neighbor upsampling operation in Yolov8 with a data-dependent lightweight upsampling operator called Content-Aware ReAssembly of FEatures (CARAFE) [23]. In addition, after each CARAFE operation, we applied Multi-Efficient Channel Attention (Mlt-ECA) [24] for weighted adjustment of features. These improvement methods are straightforward to implement. We chose this approach because the Yolov8 baseline itself has demonstrated strong performance, and excessively complex improvements may lead to other performance trade-offs. The improved model is named Yolov8-UAV, as it is more suitable for UAV-like image detection tasks.

In addition to model design and training, dataset construction and annotation are also crucial aspects. To our knowledge, there is currently no publicly available cotton boll dataset. Therefore, based on previous automated observation work [4], we have released a cotton boll dataset named Cotton Boll Detection Augmented (CBDA), which includes annotated bounding boxes. We also noticed that Madec et al. [26] contributed a wheat ear dataset called Wheat Ears Detection (WED) with annotation boxes.

Yet still, their work overlooked the consistency between annotation labels and images, which hindered other researchers from keeping pace with global research progress. Hence, we used our previous wheat ear recognition model to regenerate annotation labels for the WED dataset, and named the result Wheat Ears Detection Update (WEDU).

In summary, the main contributions of this paper are as follows:

1) Upsampling method: By introducing a simple yet effective upsampling process, it enhances the perception capability for detecting small-scale objects. Channel suppression is performed after each upsampling step to eliminate feature redundancy.
2) Yolov8: It provides a powerful baseline model for practitioners to select and use deep learning methods in practical applications. For applications in similar UAV scenarios, it is recommended to choose the specialized Yolov8-UAV. In general scenarios, selecting Yolov8-N is advisable.
3) CBDA and WEDU datasets: The cotton boll and wheat ear datasets, including detailed annotation boxes, have been publicly released, contributing to the advancement of research in related fields.

II. DATASETS AND METHODS
A. PLANT DATASETS
We conducted performance evaluations on four plant datasets, including the publicly available Maize Tassels Detection and Counting (MTDC) [27] dataset and the Rape Flower Rectangular Box Labeling (RFRB) [28] dataset. In addition, we introduce two new datasets in this paper, namely Cotton Boll Detection Augmented (CBDA) and Wheat Ears Detection Update (WEDU). Example images from the four plant datasets are shown in Figure 1.

FIGURE 1. Example images from four plant datasets.

Here, we provide a brief introduction to the characteristics and challenges of these datasets.

The CBDA dataset, introduced for the first time in this paper, was collected in a specific region of the Xinjiang Uygur Autonomous Region in 2013 using an automated ground observation system. Detailed information about the imaging device can be found in [4]. Due to the inherent growth patterns of cotton bolls and limitations in sample collection, our dataset has relatively limited samples. In addition, the variation patterns of cotton bolls over time are not very pronounced. Given these limitations, we selected only 75 representative images as the foundation of the dataset. To compensate for the limited sample size, we employed techniques such as color distortion and mosaic augmentation to expand the cotton boll images. Through this approach, we expanded the dataset to a total of 180 images. It should be emphasized that, due to the stochastic nature of the augmentation process, the difficulty of recognition may increase significantly for some images, potentially exceeding the model's understanding capabilities.
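To illustrate the expansion strategy, the following is a minimal sketch of a four-image mosaic in the style commonly used in Yolo training pipelines; the canvas size, gray fill value, and resize-to-fit behavior are illustrative assumptions, and the bounding-box remapping that a real pipeline requires is omitted.

```python
import random
import cv2
import numpy as np

def mosaic4(images, out_size=640):
    """Illustrative four-image mosaic: tile four source images around a
    random center on a gray canvas. A real pipeline must also shift and
    clip the bounding boxes of each tile; that bookkeeping is omitted."""
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    # random mosaic center, kept away from the canvas borders
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, regions):
        # naive resize-to-fit; Yolo-style mosaics crop instead of resizing
        canvas[y1:y2, x1:x2] = cv2.resize(img, (x2 - x1, y2 - y1))
    return canvas
```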
The WEDU dataset is an extension of the WED dataset originally released by Madec et al. [26]. These pioneers have made significant contributions in the field of plant research, but unfortunately, they overlooked the consistency between the annotation labels and the images in the released dataset. This issue has hindered researchers from keeping pace with global research advancements. In our previous work, we developed a neural network, WheatLFANet [25], for wheat ear detection, and based on this achievement, we re-generated the annotation boxes for the WED dataset. Nonetheless, due to the limitations of model performance, we could not completely eliminate potential noise in the annotation boxes. This poses a significant challenge compared to other meticulously curated datasets.

The MTDC dataset is a collection of images related to maize tassels, gathered from four experimental fields in China and spanning six maize varieties. The dataset comprises 186 images for training and 175 images for testing. Notably, the testing set was intentionally designed to consist of entirely different sequences, resulting in significant variations in data distribution. This characteristic poses a considerable challenge for domain adaptation, demanding that the model possess strong generalization capabilities for practical applications and adapt to diverse scenes and conditions. The images in the dataset have varying resolutions, including 3648×2736, 4272×2848, and 3456×2304, further adding to the complexity of the task. The MTDC dataset's uniqueness lies in its diverse and challenging composition, making it a valuable resource for research on maize tassel detection and counting algorithms.

The RFRB dataset was collected between 2021 and 2022 in Wuhan, Hubei, China, specifically focusing on the study of rape flowers. This dataset comprises a total of 114 images of rape flowers, with 90 images allocated for training purposes and 24 images designated for testing.


FIGURE 2. Yolov8-UAV network framework, which uses PANet to fuse multi-scale image information.

An important characteristic of the RFRB dataset is that these images were captured using a mobile device at a height ranging from 10 to 15 meters, making it a typical dataset for UAV images. One notable aspect of the RFRB dataset is the presence of a considerable number of instances in each image, ranging from 27 to 629. This high object density presents a significant challenge for the model to accurately detect and capture small-scale plant features.

B. PROPOSED METHOD
Taking into account the deployment requirements on edge devices in the context of plant science, Yolov8 offers different versions such as N, S, and M. Considering our specific needs, we have chosen the most lightweight version, Yolov8-N, as the baseline model. Following modern neural network design principles, we have made minor yet effective modifications that make the detection network structure more comprehensive and detailed, specifically suited for detecting small and densely-packed plant objects in UAV images. Hence, we have named it Yolov8-UAV. The overall network architecture is illustrated in Figure 2.

In recent years, the Path Aggregation Network (PANet) [29] has emerged as a novel paradigm for object detection [30], [31], [32], standing out for its outstanding multi-scale feature fusion and contextual information aggregation. PANet incorporates a bottom-up path to extract high-resolution features and combines it with a top-down path for contextual information aggregation, showcasing its unique advantages. The introduction of PANet has played a positive role in the rapid development of the object detection field. As one of the state-of-the-art detectors known today, Yolov8 also adopts this remarkable PANet structure.

Firstly, PANet leverages the backbone structure of the Feature Pyramid Network (FPN) [33] to construct a pyramid-like feature map, enabling efficient detection of objects of different sizes through cross-scale feature fusion. Secondly, by adding bottom-up path augmentation, the network's perception of details and low-level features is further improved. Our modification simply involves adding an additional upsampling process to the FPN backbone to enhance the perception of small objects and fusing it with the C2 layer of the feature set, resulting in an additional output feature layer. This improvement is simple, effective, and easy to implement, as demonstrated in previous experiments and experience [34], [35], [36].
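As a sketch of this change, the block below upsamples the finest pyramid level once more and concatenates it with the shallow C2 backbone feature to form the extra output layer; the layer names (C2, P3), channel widths, and the simple 1×1 fusion convolution are illustrative assumptions rather than the exact Yolov8-UAV configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExtraHighResHead(nn.Module):
    """One more top-down step: upsample the finest pyramid feature and fuse
    it with the shallow C2 feature, yielding an extra high-resolution output
    layer for small objects."""

    def __init__(self, p3_channels: int, c2_channels: int, out_channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(p3_channels + c2_channels, out_channels, kernel_size=1)

    def forward(self, p3: torch.Tensor, c2: torch.Tensor) -> torch.Tensor:
        # plain nearest-neighbor upsampling as a stand-in; the paper replaces
        # this with CARAFE and follows each upsampling step with Mlt-ECA
        up = F.interpolate(p3, scale_factor=2, mode="nearest")
        return self.fuse(torch.cat([up, c2], dim=1))
```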
In contrast to previous studies, we employ a data-dependent lightweight upsampling operator called Content-Aware ReAssembly of FEatures (CARAFE) [23] instead of the traditional nearest-neighbor upsampling operation used in Yolov8. In comparison to traditional bilinear interpolation upsampling, the CARAFE method offers a significant advancement. CARAFE has the ability to dynamically generate upsampling kernels, enabling instance-specific content-aware processing. This adaptability allows CARAFE to effectively integrate a broader range of contextual information while still maintaining a lightweight design. As a result, it surpasses the limitations of bilinear interpolation upsampling when it comes to processing semantic information and expanding the perceptual range of feature maps. CARAFE's innovative approach opens new possibilities for enhancing feature maps and achieving more precise and contextually informed results in various image processing tasks.
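To make the reassembly mechanism concrete, here is a compact PyTorch sketch of CARAFE-style upsampling; the compressed-channel width, encoder kernel size, and 5×5 reassembly kernel are common defaults reported in [23] and are assumptions rather than our exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCARAFE(nn.Module):
    """Content-aware upsampling: predict a normalized k_up x k_up reassembly
    kernel for every upsampled location, then build each output pixel as a
    weighted sum of the corresponding source neighborhood."""

    def __init__(self, channels, scale=2, k_up=5, k_enc=3, c_mid=64):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(channels, c_mid, kernel_size=1)
        self.encode = nn.Conv2d(c_mid, scale ** 2 * k_up ** 2,
                                kernel_size=k_enc, padding=k_enc // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s, k = self.scale, self.k_up
        # 1) kernel prediction: one normalized k*k kernel per output location
        kernels = F.pixel_shuffle(self.encode(self.compress(x)), s)  # (b, k*k, s*h, s*w)
        kernels = F.softmax(kernels, dim=1)
        # 2) gather each source pixel's k*k neighborhood, then map it to the
        #    s*s output positions it serves (nearest assignment)
        neigh = F.unfold(x, k, padding=k // 2).view(b, c * k * k, h, w)
        neigh = F.interpolate(neigh, scale_factor=s, mode="nearest")
        neigh = neigh.view(b, c, k * k, s * h, s * w)
        # 3) content-aware reassembly: weighted sum over the neighborhood
        return (neigh * kernels.unsqueeze(1)).sum(dim=2)
```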
After each CARAFE operation, we apply Multi-Efficient Channel Attention (Mlt-ECA) [24] for weighted feature adjustment. Mlt-ECA utilizes a dimensionality-preserving local cross-channel interaction strategy and adaptively determines the size of the 1D convolution kernel as needed, achieving coverage of local cross-channel interactions. Specifically:

$$k = \psi(C) = \left|\frac{\log_2(C)}{\gamma} + \frac{\beta}{\gamma}\right|_{odd} \qquad (1)$$

where $k$ represents the size of the convolution kernel, $C$ represents the number of channels, and $odd$ indicates that $k$ is rounded to an odd number. $\gamma$ and $\beta$ are set to 2 and 1, respectively, in our experiments, to adjust the proportion between $C$ and the convolution kernel size.
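To make Eq. (1) concrete, the following is a minimal PyTorch sketch of a single ECA-style attention branch driven by the kernel-size rule; Mlt-ECA [24] extends this mechanism with multiple branches, so the module below illustrates the core idea rather than our exact implementation.

```python
import math
import torch
import torch.nn as nn

def eca_kernel_size(channels: int, gamma: int = 2, beta: int = 1) -> int:
    """Adaptive 1D kernel size from Eq. (1): k = |log2(C)/gamma + beta/gamma|_odd."""
    k = int(abs(math.log2(channels) / gamma + beta / gamma))
    return k if k % 2 == 1 else k + 1  # round up to the nearest odd value

class ECAAttention(nn.Module):
    """Dimensionality-preserving local cross-channel interaction: a 1D
    convolution over the pooled channel descriptor reweights each channel."""

    def __init__(self, channels: int, gamma: int = 2, beta: int = 1):
        super().__init__()
        k = eca_kernel_size(channels, gamma, beta)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x)                                # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))  # 1D conv across channels
        y = y.transpose(-1, -2).unsqueeze(-1)           # back to (B, C, 1, 1)
        return x * torch.sigmoid(y)                     # channel-wise reweighting
```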


The incorporation of multi-scale feature fusion, contextual information aggregation, and channel attention has enhanced the model's perception, expressive power, and adaptability. By integrating multi-scale features, the model gains a more comprehensive understanding of the input data, allowing it to capture fine-grained details and high-level contextual information simultaneously. Contextual information aggregation enhances the model's global context awareness, leading to more accurate predictions, particularly in tasks involving object detection and segmentation. The introduction of channel attention further boosts the model's expressive power by selectively emphasizing relevant and discriminative features, leading to improved feature representation and extraction. The collective impact of these enhancements is particularly advantageous in detecting small and crowded objects, making the model highly suitable for real-world scenarios that involve intricate and densely arranged objects.

Overall, the integration of multi-scale feature fusion, contextual information aggregation, and channel attention demonstrates a holistic approach to enhancing the model's capabilities. The proposed modifications contribute to its generality, making it a potent tool for tackling challenging visual tasks and paving the way for further advancements in computer vision research and applications.

C. LOSS FUNCTION
Yolov8's loss calculation includes both a classification loss and a regression loss. The purpose of the classification loss is to help the model distinguish between foreground and background, while the regression loss is used to constrain the model's learning process for predicting box positions and shapes. In particular, the classification loss is formulated as the Binary Cross-Entropy (BCE) loss [37], which can be expressed as follows:

$$L_{bce} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right] \qquad (2)$$

It is a commonly used binary classification loss function, used to measure the learning dissimilarity between positive and negative samples. In Equation (2), the target (label) value is denoted as $y_i$, the predicted result as $p_i$, and $n$ represents the batch size.

The regression loss is guided by the Complete Intersection over Union (CIoU) [38] and Distribution Focal Loss (DFL) [39] functions. In greater detail, the CIoU loss measures the matching degree between the predicted bounding box and the ground truth bounding box, while the DFL loss focuses on the matching of the distance field. It can be described as follows:

$$L_{reg} = \frac{1}{N_{pos}}\sum_{i=1}^{N_{pos}}\left(w_i \times \left[1 - CIoU(\hat{b}_i, b_i)\right] + DF(\hat{d}_i, d_i)\right) \qquad (3)$$

Here, $N_{pos}$ represents the number of positive sample boxes, $\hat{b}_i$ and $b_i$ represent the coordinate information of the predicted boxes and the ground truth boxes, $\hat{d}_i$ and $d_i$ represent the values of the predicted and ground truth distance fields, $CIoU(\hat{b}_i, b_i)$ represents the computed CIoU value, and $w_i$ represents the weight of the $i$-th positive or negative sample. $DF(\hat{d}_i, d_i)$ represents the distance field loss computed using DFL. To be specific, DFL is a distance field-based loss function used to optimize the regression task in detection, and its expression is as follows:

$$L_{df} = \frac{1}{4N_{pos}}\sum_{i=1}^{N_{pos}}\sum_{j=1}^{4}\left[p_j \log(p_j) - \sum_{k=1}^{K} w(k)\, q_{jk} \log(q_{jk})\right] \qquad (4)$$

In this equation, $p_j$ represents the $j$-th element of the ground truth distance field, $q_{jk}$ represents the probability of the $k$-th component corresponding to the $j$-th element of the predicted distance field, and $w(k)$ serves as a weight coefficient to balance the loss between different $k$ values. Finally, the total loss of Yolov8 is defined as $L = \alpha L_{cls} + \beta L_{reg}$, where $\alpha$ and $\beta$ are hyperparameters.
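As a concrete illustration of the distance-field idea, here is a minimal PyTorch sketch of the standard DFL formulation from [39], in which each box side is regressed as a discrete distribution over reg_max+1 bins; the per-sample weighting and CIoU term of Eq. (3) are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def distribution_focal_loss(pred_dist: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Minimal DFL for one box side. `pred_dist` holds logits over reg_max+1
    discrete distance bins, shape (N, reg_max+1); `target` is the continuous
    ground-truth distance in bin units, shape (N,). The loss interpolates a
    cross-entropy between the two bins bracketing the target, so the learned
    distribution peaks at the true distance."""
    reg_max = pred_dist.shape[-1] - 1
    target = target.clamp(0, reg_max - 0.01)  # keep the right bin in range
    tl = target.long()          # left (floor) bin index
    tr = tl + 1                 # right bin index
    wl = tr.float() - target    # weight toward the left bin
    wr = 1.0 - wl               # weight toward the right bin
    loss = (F.cross_entropy(pred_dist, tl, reduction="none") * wl
            + F.cross_entropy(pred_dist, tr, reduction="none") * wr)
    return loss.mean()
```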
III. EXPERIMENTS AND RESULTS
A. TRAINING DETAILS AND QUANTITATIVE METRICS
The experiments were implemented using the PyTorch deep learning framework and accelerated using CUDA. The CBDA dataset was divided into 120 images for training and 60 images for testing. The WEDU dataset consisted of 165 training images and 71 testing images. The MTDC dataset contained 186 training images and 175 testing images. The RFRB dataset included 90 training images and 24 testing images. The model was optimized for 300 epochs. It is important to note that the model parameter configuration used in this study remained consistent with the default parameters, and no adjustments were made.
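Since the models were trained with default hyperparameters, a comparable run can be reproduced along these lines with the ultralytics package [45]; the dataset YAML path below is an illustrative placeholder, not a file shipped with this paper.

```python
from ultralytics import YOLO

# Start from the lightweight Yolov8-N checkpoint (see [45]).
model = YOLO("yolov8n.pt")

# 300 epochs with default hyperparameters, as described in Section III-A.
# "plant.yaml" is a hypothetical dataset description pointing at one of the
# four plant datasets (CBDA, WEDU, MTDC, or RFRB) in Yolo format.
model.train(data="plant.yaml", epochs=300)

# Evaluate precision, recall, AP50, and AP50-95 on the held-out split.
metrics = model.val()
```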
We used the following evaluation metrics to quantify the detection performance: precision ($P_r$), recall ($R_e$), average precision at 50% IoU (AP50), and average precision at 50%-95% IoU (AP50-95). These metrics provide more accurate measures of the model's localization performance. Precision represents the proportion of correctly predicted objects among all objects predicted by the model, while recall represents the proportion of correctly predicted objects among all actual objects. AP refers to the mean area under the $P_r$-$R_e$ curve. They are calculated as follows:

$$P_r = \frac{TP}{TP + FP} \qquad (5)$$

$$R_e = \frac{TP}{TP + FN} \qquad (6)$$

$$AP = \int_0^1 P_r(R_e)\, d(R_e) \qquad (7)$$

where TP, FP, and FN represent the number of true positives, false positives, and false negatives, respectively.
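For reference, a minimal sketch of Eqs. (5)-(7); the AP integral is approximated here by trapezoidal integration over sampled points of the precision-recall curve, which is a discretization assumption rather than the exact evaluation protocol.

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int):
    """Eqs. (5)-(6): precision and recall from detection counts."""
    pr = tp / (tp + fp) if tp + fp else 0.0
    re = tp / (tp + fn) if tp + fn else 0.0
    return pr, re

def average_precision(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """Eq. (7): area under the Pr-Re curve, computed here by trapezoidal
    integration over recall-sorted samples of the curve."""
    order = np.argsort(recalls)
    return float(np.trapz(precisions[order], recalls[order]))
```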


TABLE 1. Quantitative results of CBDA dataset.

TABLE 2. Quantitative results of WEDU dataset.

TABLE 3. Quantitative results of MTDC dataset.

Besides, the evaluation metrics for the counting tasks are as follows:

$$MAE = \frac{1}{N}\sum_{n=1}^{N}\left|G_n - P_n\right| \qquad (8)$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(G_n - P_n\right)^2} \qquad (9)$$

where $N$ represents the number of images, $G_n$ represents the ground-truth count in the $n$-th image, and $P_n$ represents the predicted count in the $n$-th image. Mean Absolute Error (MAE) quantifies the accuracy of the model, while Root Mean Square Error (RMSE) quantifies the robustness of the model. The lower the values of these two metrics, the better the counting performance.
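Both counting metrics translate directly into code; a minimal sketch:

```python
import numpy as np

def counting_metrics(gt_counts, pred_counts):
    """Eqs. (8)-(9): MAE and RMSE between per-image ground-truth and
    predicted object counts."""
    g = np.asarray(gt_counts, dtype=float)
    p = np.asarray(pred_counts, dtype=float)
    mae = float(np.mean(np.abs(g - p)))
    rmse = float(np.sqrt(np.mean((g - p) ** 2)))
    return mae, rmse
```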
While these methods provide reliable results in plant count-
B. RESULTS AND DISCUSSION ing, they face a crucial limitation: the inability to provide
For the feasibility of our proposed method, we directly com- accurate plant information. This is a drawback for appli-
pared it with state-of-the-art results. Additionally, we com- cations that aim for fine-grained agricultural management.
pared three representative methods: the classic two-stage Object detection, compared to object counting, is a more


TABLE 4. Quantitative results of RFRB dataset.

Object detection, compared to object counting, is a more promising paradigm. Yolov8, as the latest detector developed by machine learning experts, surpasses most dedicated counting methods in terms of counting performance. Even on our low-cost devices, an Nvidia GTX 1650 GPU (4G) and an Intel i5-10200H CPU (8G) laptop, Yolov8 exhibits efficient task completion with an ultra-real-time throughput of 161 fps. This means that even on more affordable devices, Yolov8 can efficiently handle the task. It is also important to note that, when evaluating detection tasks, the focus lies on assessing the performance metrics of classification models, whereas in counting tasks, greater emphasis is placed on the model's accurate prediction capability for continuous variables. When evaluating and improving detection and counting tasks, the joint pursuit of outstanding classification metrics and precise prediction of continuous variables is crucial to ensure comprehensive optimization of the model across diverse tasks and achieve the highest level of performance.

C. LINEAR REGRESSION VISUALIZATION

FIGURE 3. Yolov8-N and Yolov8-UAV linear regression results.

As shown in Figure 3, the visual examination of counting errors through the linear regression graph was an essential step in our analysis. The impressive fitting ability demonstrated by our proposed method, even in the face of challenges from diverse datasets, highlights its robustness and adaptability. The interpretability advantage of the linear regression visualization proved invaluable in diagnosing underlying issues that might not be apparent through other evaluation metrics.

Notably, certain regression results displayed significant deviations, providing valuable insights into the specific challenges posed by these diverse datasets. This observation underlines the intricacies that persist in computer vision problems, particularly in the context of complex plant science environments. The visual representation of correct and incorrect detections in Figure 4 further accentuated these complexities, as both the Yolov8-N and Yolov8-UAV models exhibited some erroneous responses despite seemingly good counting levels.

Understanding these challenges prompted us to consider a delicate balance between various factors, such as network width, depth, and resolution, as emphasized in [43] and [44]. Achieving optimal performance necessitates thoughtful consideration of these dimensions. We recognize that using higher-resolution images can indeed yield substantial performance improvements, but this comes at the expense of increased computational costs. This trade-off becomes a critical consideration for real-world applications, where computational efficiency plays a significant role in deploying models effectively.

We also observed that the visual differences between Yolov8-N and Yolov8-UAV are relatively minor. In fact, Yolov8-UAV's advantage stems from slightly more accurate detections per image and its adaptability to specific UAV scenarios. This also implies that Yolov8-UAV is generally more robust. In conclusion, the linear regression visualization and the analysis of correct and incorrect detections have provided a comprehensive assessment of our method's performance. It has also shed light on the challenges and trade-offs involved in tackling complex computer vision tasks, particularly in the context of plant science.

FIGURE 4. Visualization results of the four plant datasets.

These findings contribute to a deeper understanding of the model's behavior and will guide future advancements in the field of computer vision, especially in precision agriculture and environmental monitoring applications. As we continue to refine our approach, we remain committed to addressing the complexities of real-world scenarios and enhancing the practical utility of computer vision techniques for various scientific domains.

IV. GOOD PRACTICE SUGGESTIONS
1) Deploying Yolov8-N and Yolov8-UAV on resource-constrained devices is a smart choice, as they offer optimal performance and generality.
2) Due to their ability to provide a comprehensive scene description, Yolov8-N and Yolov8-UAV exhibit strong interpretability. This enables a deep understanding of the model's decision-making process, facilitating optimization and diagnosis of specific components.
3) Capturing imaging views from lower angles is preferable, since it avoids introducing significant scale variations that could complicate recognition.
4) Optimal image acquisition conditions entail suitable lighting, minimal background interference, and accurate color representation.

V. CONCLUSION
In this study, we extensively applied the advanced Yolov8 baseline proposed by machine learning experts to a wide range of plant data. We further enhanced the model's perception of small objects through simple yet effective improvement methods that are easy to implement. Our experimental results unequivocally demonstrate the strong competitiveness of our proposed approach, even when compared to state-of-the-art counting methods. The renowned accuracy-speed balance of the Yolo series makes it highly user-friendly for practitioners.

In general scenes, opting for Yolov8-N proves to be a wise decision. Alternatively, using Yolov8-UAV at the cost of some speed can significantly improve performance in UAV scenarios, and it has sufficient generality and robustness.

Moreover, to contribute to the research community, we have released a new CBDA dataset focusing on cotton bolls and an updated WEDU dataset focusing on wheat ears. These datasets aim to attract researchers' attention and foster collaborative efforts in advancing the field of plant science through machine learning techniques.


It is crucial to point out that the CBDA dataset's richness is relatively limited, and ample training data remains essential for achieving good performance. At times, achieving this requires collaboration among researchers worldwide. Similarly, the presence of noise in the WEDU dataset is detrimental to models with poor robustness against adversarial interference. Moving forward, we will apply advanced techniques in plant science following expert guidance. Our focus is on interdisciplinary research, innovation, and impactful contributions to agriculture and sustainability. We will connect cutting-edge machine learning with practical plant science, empowering researchers for a food-secure future.

REFERENCES
[1] Q. Zhou, D. Zhao, B. Shuai, Y. Li, H. Williams, and H. Xu, "Knowledge implementation and transfer with an adaptive learning network for real-time power management of the plug-in hybrid vehicle," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 12, pp. 5298–5308, Dec. 2021.
[2] L. Wang, L. Xiang, L. Tang, and H. Jiang, "A convolutional neural network-based method for corn stand counting in the field," Sensors, vol. 21, no. 2, p. 507, Jan. 2021.
[3] Y. Wang, Z. Cao, X. Bai, Z. Yu, and Y. Li, "An automatic detection method to the field wheat based on image processing," Proc. SPIE, vol. 8918, Oct. 2015, Art. no. 89180F.
[4] Z. Yu, Z. Cao, X. Wu, X. Bai, Y. Qin, W. Zhuo, Y. Xiao, X. Zhang, and H. Xue, "Automatic image-based detection technology for two critical growth stages of maize: Emergence and three-leaf stage," Agricult. Forest Meteorol., vols. 174–175, pp. 65–84, Jun. 2013.
[5] Z. Yu, H. Zhou, and C. Li, "An image-based automatic recognition method for the flowering stage of maize," Proc. SPIE, vol. 10611, Mar. 2018, Art. no. 104200I.
[6] C.-N. Li, X.-F. Zhang, Z.-H. Yu, and X.-F. Wang, "Accuracy evaluation of summer maize coverage and leaf area index inversion based on images extraction technology," Chin. J. Agrometeorol., vol. 37, no. 4, pp. 479–491, 2016.
[7] H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang, "CvT: Introducing convolutions to vision transformers," 2021, arXiv:2103.15808.
[8] Y. Ma, Y. Cao, Y. Hong, and A. Sun, "Large language model is not a good few-shot information extractor, but a good reranker for hard samples!" 2023, arXiv:2303.08559.
[9] N. Panigrahi and B. S. Das, "Evaluation of regression algorithms for estimating leaf area index and canopy water content from water stressed rice canopy reflectance," Inf. Process. Agricult., vol. 8, no. 2, pp. 284–298, Jun. 2021.
[10] T. B. Shahi, C.-Y. Xu, A. Neupane, and W. Guo, "Recent advances in crop disease detection using UAV and deep learning techniques," Remote Sens., vol. 15, no. 9, p. 2450, May 2023.
[11] S. Xiang, S. Wang, M. Xu, W. Wang, and W. Liu, "YOLO POD: A fast and accurate multi-task model for dense soybean pod counting," Plant Methods, vol. 19, no. 1, p. 8, Jan. 2023.
[12] Y. Liu, C. Cen, Y. Che, R. Ke, Y. Ma, and Y. Ma, "Detection of maize tassels from UAV RGB imagery with faster R-CNN," Remote Sens., vol. 12, no. 2, p. 338, Jan. 2020.
[13] Y. Wang, Y. Qin, and J. Cui, "Occlusion robust wheat ear counting algorithm based on deep learning," Frontiers Plant Sci., vol. 12, Jun. 2021, Art. no. 645899.
[14] S. Yang, J. Liu, K. Xu, X. Sang, J. Ning, and Z. Zhang, "Improved CenterNet based maize tassel recognition for UAV remote sensing image," Trans. Chin. Soc. Agricult. Machinery, vol. 52, pp. 206–212, Jan. 2021.
[15] C. Miao, A. Guo, A. M. Thompson, J. Yang, Y. Ge, and J. C. Schnable, "Automation of leaf counting in maize and sorghum using deep learning," Plant Phenome J., vol. 4, no. 1, Jan. 2021, Art. no. e20022.
[16] W. Wang, E. Xie, X. Li, D.-P. Fan, K. Song, D. Liang, T. Lu, P. Luo, and L. Shao, "PVT v2: Improved baselines with pyramid vision transformer," Comput. Vis. Media, vol. 8, pp. 415–424, Sep. 2022.
[17] Z. Yu, H. Zhou, and C. Li, "Fast non-rigid image feature matching for agricultural UAV via probabilistic inference with regularization techniques," Comput. Electron. Agricult., vol. 143, pp. 79–89, Dec. 2017.
[18] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, "Accurate object localization in remote sensing images based on convolutional neural networks," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2486–2498, May 2017.
[19] H. Zhu, X. Chen, W. Dai, K. Fu, Q. Ye, and J. Jiao, "Orientation robust object detection in aerial images using deep convolutional neural network," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2015, pp. 3735–3739.
[20] X. Liang, J. Zhang, L. Zhuo, Y. Li, and Q. Tian, "Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 30, no. 6, pp. 1758–1770, Jun. 2020.
[21] Y. Wang, C. Xu, C. Liu, and Z. Li, "Context information refinement for few-shot object detection in remote sensing images," Remote Sens., vol. 14, no. 14, p. 3255, Jul. 2022.
[22] Y. Liu, F. Yang, and P. Hu, "Small-object detection in UAV-captured images via multi-branch parallel feature pyramid networks," IEEE Access, vol. 8, pp. 145740–145750, 2020.
[23] J. Wang, K. Chen, R. Xu, Z. Liu, C. C. Loy, and D. Lin, "CARAFE: Content-aware ReAssembly of FEatures," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 3007–3016.
[24] Z. Yu, J. Ye, C. Li, H. Zhou, and X. Li, "TasselLFANet: A novel lightweight multi-branch feature aggregation neural network for high-throughput image-based maize tassels detection and counting," Frontiers Plant Sci., vol. 14, Apr. 2023, Art. no. 1158940.
[25] J. Ye, Z. Yu, Y. Wang, D. Lu, and H. Zhou, "WheatLFANet: In-field detection and counting of wheat heads with high-real-time global regression network," Plant Methods, vol. 19, no. 1, p. 103, Oct. 2023.
[26] S. Madec, X. Jin, H. Lu, B. De Solan, S. Liu, F. Duyme, E. Heritier, and F. Baret, "Ear density estimation from high resolution RGB imagery using deep learning technique," Agricult. Forest Meteorol., vol. 264, pp. 225–234, Jan. 2019.
[27] H. Zou, H. Lu, Y. Li, L. Liu, and Z. Cao, "Maize tassels detection: A benchmark of the state of the art," Plant Methods, vol. 16, no. 1, p. 108, Dec. 2020.
[28] J. Li, E. Wang, J. Qiao, Y. Li, L. Li, J. Yao, and G. Liao, "Automatic rape flower cluster counting method based on low-cost labelling and UAV-RGB images," Plant Methods, vol. 19, no. 1, p. 40, Apr. 2023.
[29] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 8759–8768.
[30] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020, arXiv:2004.10934.
[31] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," 2022, arXiv:2207.02696.
[32] C.-Y. Wang, I.-H. Yeh, and H.-Y. M. Liao, "You only learn one representation: Unified network for multiple tasks," J. Inf. Sci. Eng., vol. 39, no. 2, pp. 691–709, 2021.
[33] T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2117–2125.
[34] J. Yan, J. Zhao, Y. Cai, S. Wang, X. Qiu, X. Yao, Y. Tian, Y. Zhu, W. Cao, and X. Zhang, "Improving multi-scale detection layers in the deep learning network for wheat spike detection based on interpretive analysis," Plant Methods, vol. 19, no. 1, p. 46, May 2023.
[35] J. Chen, H. Liu, Y. Zhang, D. Zhang, H. Ouyang, and X. Chen, "A multi-scale lightweight and efficient model based on YOLOv7: Applied to citrus orchard," Plants, vol. 11, no. 23, p. 3260, Nov. 2022.
[36] W. Liu, K. Quijano, and M. M. Crawford, "YOLOv5-tassel: Detecting tassels in RGB UAV imagery with improved YOLOv5 based on transfer learning," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 8085–8094, 2022.
[37] M. Yeung, E. Sala, C.-B. Schönlieb, and L. Rundo, "Unified focal loss: Generalising dice and cross entropy-based losses to handle class imbalanced medical image segmentation," Computerized Med. Imag. Graph., vol. 95, Jan. 2022, Art. no. 102026.
[38] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU loss: Faster and better learning for bounding box regression," in Proc. AAAI Conf. Artif. Intell., vol. 34, no. 7, 2020, pp. 12993–13000.
[39] X. Li, W. Wang, L. Wu, S. Chen, X. Hu, J. Li, J. Tang, and J. Yang, "Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection," in Proc. NeurIPS, 2020.
[40] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[41] H. Lu, L. Liu, Y.-N. Li, X.-M. Zhao, X.-Q. Wang, and Z.-G. Cao, "TasselNetV3: Explainable plant counting with guided upsampling and background suppression," IEEE Trans. Geosci. Remote Sens., vol. 60, 2022, Art. no. 4700515.
[42] X. Bai, S. Gu, P. Liu, A. Yang, Z. Cai, J. Wang, and J. Yao, "RPNet: Rice plant counting after tillering stage based on plant attention and multiple supervision network," Crop J., vol. 11, no. 5, pp. 1586–1594, Oct. 2023.
[43] M. Tan and Q. Le, "EfficientNet: Rethinking model scaling for convolutional neural networks," in Proc. Int. Conf. Mach. Learn., 2019, pp. 6105–6114.
[44] P. Dollár, H. Touvron, M. Sandler, A. Howard, and S. Zagoruyko, "Fast and accurate model scaling," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 924–932.
[45] J. Glenn. (2023). YOLOv8. [Online]. Available: https://github.com/ultralytics/ultralytics
[46] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian, "CenterNet: Keypoint triplets for object detection," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 6568–6577.

DUNLU LU received the B.S. degree in electronic engineering from the Hefei University of Technology, in 1996, and the M.S. degree in communication and information systems from the South China University of Technology, in 1999. He is currently an Associate Professor with the College of Robotics, Guangdong Polytechnic of Science and Technology, a member of the Chinese Institute of Electronics, and a leading professional in higher vocational education in Guangdong Province. His research interests include image recognition, intelligent robotics, and mobile communications.

JIANXIONG YE is currently pursuing the degree with the College of Robotics, Guangdong Polytechnic of Science and Technology, Zhuhai, China. He is also preparing to pursue the engineering degree with Wuyi University. His research interests include computer vision, intelligent robotics, and agricultural automation, with a specific focus on object detection and object counting problems. Notably, his latest research project attained the First Prize in the prestigious Chinese Robotics and Artificial Intelligence Competition (CRAIC).

YANGXU WANG is currently pursuing the degree with the College of Robotics, Guangdong Polytechnic of Science and Technology, Zhuhai, China. He is also preparing to pursue the degree in computer management with the Software Engineering Institute of Guangzhou (SEIG). His research interests include intelligent robotics and agricultural automation. He has a strong passion for the field of intelligent robotics and aims to leverage the power of robots and automation to optimize traditional agriculture.

ZHENGHONG YU received the B.S. and M.S. degrees in computer science from the Wuhan Institute of Technology, Wuhan, China, in 2005 and 2008, respectively, and the Ph.D. degree in control science and engineering from the Huazhong University of Science and Technology, Wuhan, in 2014. He is currently an Associate Professor with the College of Robotics, Guangdong Polytechnic of Science and Technology, Zhuhai, China. Meanwhile, he has been invited as a Guest Professor with the Hubei Provincial Laboratory of Intelligent Robot and a Distinguished Research Fellow with Fujian Agriculture and Forestry University. His research interests include computer vision, intelligent robots, and agriculture automation.
