Algorithm for Crop Disease Detection Based on Channel Attention Mechanism and Lightweight Up-Sampling Operator
ABSTRACT Crop diseases and pests cause significant economic losses to agriculture every year, making
accurate identification crucial. Traditional pest and disease detection relies on farm experts, which is often
time-consuming. Computer vision technology and artificial intelligence can provide automated disease
detection, enabling real-time precise control of crop diseases and timely prevention measures. To accurately
identify plant diseases under complex natural conditions, we developed an improved crop pest and disease
recognition model based on the original YOLOv5 network. First, we integrated the Squeeze-and-Excitation
(SE) module into YOLOv5, allowing our proposed model to better distinguish leaf features of different
crops and accurately identify disease types. Second, to enhance the model’s feature extraction capability for
diseased areas and reduce the loss of disease feature information, we replaced the original Up-sample module
in YOLOv5 with a lightweight up-sampling operator, the CARAFE module. Third, we improved the original
loss function using the EIoU loss function to increase the model’s detection accuracy. Lastly, to reduce model
complexity and meet real-time detection requirements, we introduced the Ghost Convolution module into
the backbone network. During the experimental phase, to validate the model’s effectiveness, we randomly
divided sample images from the constructed crop pest and disease database into training, validation, and test
sets. Experimental results showed that the improved YOLOv5 model achieved a precision of 90.0%, a recall
rate of 91.4%, mAP@0.5 of 92.1%, and mAP@0.5:.95 of 64%. The parameter count and computational
load were reduced by 23.9% and 31.2%, respectively, outperforming popular methods including YOLOv5,
YOLOv7, and YOLOv8. The improved model can accurately identify crop pests and diseases under natural
conditions and is suitable for deployment in real-world applications, providing a technical reference for crop
pest and disease management.
INDEX TERMS Crop diseases and insect pests, YOLOv5, CARAFE, SE, EIoU loss function, ghost
convolution.
Therefore, developing an effective crop pest and disease recognition system to assist farmers in swiftly identifying diseases and implementing timely preventive measures is critically important.
In recent years, with the rise of computer vision and artificial intelligence, computer vision technology has been extensively applied in complex scenarios such as industry and agriculture [2]. Miranda et al. [8] utilized various image processing techniques for pest detection and extraction, developing an automated detection and extraction system to estimate pest density in rice fields. Barbedo et al. [9] proposed a method based on color transformation, color histograms, and paired-based classification systems to identify plant diseases. However, its accuracy fluctuates significantly, ranging from 40% to 80%. Zhang et al. [10] introduced a cucumber leaf disease identification method based on K-means clustering and sparse representation classification. Shrivastava and Pradhan [11] presented a classification approach for rice diseases, primarily using color feature extraction and support vector machines for disease category classification. Kianat et al. [12] developed a cucumber disease classification system based on feature fusion and selection techniques. Sugiarti et al. [13] aimed to enhance apple disease classification accuracy by combining naive Bayes classification with gray-level co-occurrence matrix extraction functions. Mukhopadhyay et al. [14] used a non-dominated sorting genetic algorithm to detect disease areas on tea leaves, achieving an average accuracy of 83%. However, machine learning-based methods for identifying crop pests and diseases often require manual extraction of leaf disease features, which may not capture the essential characteristics of disease areas accurately. Consequently, this can significantly impact the diagnosis of crop pests and diseases.
With the rapid development of deep learning technology, an increasing number of researchers are applying deep learning methods to agricultural research. This approach allows for the automatic learning of deep features from images, offering detection speeds and accuracies far superior to traditional algorithms [15]. Chen et al. [19] introduced LeafNet, a CNN model capable of automatically extracting tea plant disease features from images. Jiang et al. [20] used CNNs to extract features of rice leaf diseases and employed SVMs for disease classification and prediction. Abbas et al. [21] proposed a deep learning-based method for tomato disease detection, employing adversarial networks to generate synthetic images of tomato leaves. Xiang et al. [22] developed a lightweight convolutional neural network model achieving 97.9% accuracy on the PlantVillage dataset. Tan et al. [23] compared the effectiveness of deep learning networks with machine learning algorithms for tomato leaf disease recognition, finding that deep learning networks outperformed in all metrics, with ResNet34 achieving the best results. While these methods automate the extraction of crop disease features from images, they are primarily focused on image recognition and classification tasks.
Due to the advent of the industrial era, the global environment has been greatly impacted, making crops increasingly susceptible to pest infestations. Some diseases have extremely similar characteristics, and some leaves may exhibit multiple disease features, making it difficult for classification models to differentiate them. Deep learning-based detection networks can locate and detect disease areas on crops. Current image detection networks can be divided into two-stage (e.g., Faster-RCNN) and one-stage (e.g., YOLO series) networks [24]. Compared to two-stage detection networks, one-stage networks are faster and more suitable for real-time monitoring of crop pests and diseases. Meanwhile, the YOLO series algorithms have good generalization ability and robustness for object detection tasks in complex situations, making them highly valuable for detecting crop pests and diseases. For example, Ma et al. [25] proposed a lightweight detector, Light-YOLOv4, through a series of improvements to the YOLOv4 model. This detector maintains high detection accuracy while significantly reducing the number of parameters, computational load, and model size. In experiments by Son et al. [26], a new attention-enhanced YOLO model was proposed to identify and detect plant leaf diseases. Li et al. [27] improved the CSP, Feature Pyramid Network (FPN), and Non-Maximum Suppression (NMS) modules in YOLOv5 to detect five types of vegetable diseases, achieving a 93.1% mAP and effectively reducing missed detections and false positives caused by complex backgrounds. Liu et al. [28] introduced a new end-to-end pest detection algorithm based on YOLOv5s, replacing part of the C3 module with the C2f module to obtain richer gradient information. Zhang et al. [29] improved YOLOv5s to enhance the model's detection accuracy; the improved dragon fruit detection model achieved an AP value of 97.4%, while also reducing the model's complexity, making it easier to deploy on embedded devices. Xiao et al. [30] proposed using the ShuffleNet module to lighten the YOLOv5 algorithm and enhance network feature fusion capability using the CBAM attention module. This approach was used for detecting blueberries and recognizing their ripeness, with the improved model achieving an average detection accuracy of 91.5% and significantly reducing the model size, computational load, and the number of parameters. Yang et al. [31] proposed a new crop pest detection model, YOLOv5s-pest. Firstly, the authors designed a new convolutional attention module (NCBAM). Secondly, they introduced recursive gated convolution ($g^n$Conv) into the neck network. Finally, they replaced non-maximum suppression (NMS) with Soft-NMS. Experimental results showed that mAP@0.5:0.95 reached 72.6%.
Although the aforementioned methods have achieved significant improvements in crop detection, most of them are designed to detect a single type of crop and are not suitable for simultaneous detection of multiple crop diseases and pests.
are normalised to the interval from 0 to 1 by the Sigmoid activation function. Finally, the Scaling part is used to multiply the output of the second part with the original input to return an output of the original size.
The expression in the Squeeze section is shown below:
$$Z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j) \tag{1}$$
where $H$ and $W$ are the height and width of the input feature map, respectively, $u_c(i, j)$ indicates a single element value on a channel dimension, and $Z_c$ is the compressed feature map.
The expression in the Excitation section is shown below:
$$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \delta(W_1 z)) \tag{2}$$
where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$, $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$, $r$ is the channel reduction ratio, $\delta$ is the ReLU activation function and $\sigma$ is the sigmoid activation function.
The final weighted formula is shown below:
$$\tilde{X}_c = F_{scale}(u_c, s_c) = s_c \times u_c \tag{3}$$
D. CARAFE
In networks with feature fusion, the up-sampling operation in YOLOv5 uses nearest neighbour interpolation, which mainly focuses on the spatial information of the input feature maps but ignores their semantic information, resulting in lower quality of the up-sampled feature maps. To solve this problem, this paper proposes to introduce the Content-Aware Reassembly of Features (CARAFE) up-sampling operator to replace the original up-sampling method in YOLOv5. The CARAFE up-sampling operator can obtain more accurate up-sampled feature maps and enhance the model's feature extraction of disease sites, so as to enable it to correctly distinguish the disease types among crops. The general framework of the CARAFE up-sampling operator is shown in FIGURE 7.
The CARAFE up-sampling operator consists of two main modules, the feature recombination module and the up-sampling kernel prediction module. Assuming that the dimension of the input feature map is given as $C \times W \times H$ and the up-sampling rate is $\sigma$, first the up-sampling kernel prediction module analyses and encodes the input feature map, and then the feature recombination module uses the predicted up-sampling kernel for up-sampling to obtain the feature map of dimension $C \times \sigma W \times \sigma H$.
The up-sampling kernel prediction module consists of three sub-modules: channel compressor, content encoder and kernel normaliser. Among them, the channel compressor is a $1 \times 1$ convolution, which is mainly used to reduce the number of channels in the input feature map, thus reducing the computational effort. The content encoder generates up-sampling kernels of size $k_{up} \times k_{up}$, whose size determines the size of the receptive field, i.e., a larger up-sampling kernel indicates a larger receptive field. The kernel normaliser applies SoftMax to each predicted up-sampling kernel for normalisation. The feature recombination module maps each position of the output feature map back to the input feature map, and then performs a dot-product operation of a $k_{up} \times k_{up}$ region centred on that position with the predicted up-sampling kernel to obtain the output feature map. The feature recombination module enables more attention to be paid to the relevant feature information in the local
109890 VOLUME 12, 2024
W. Chen et al.: Algorithm for Crop Disease Detection
region, and the restructured feature map has richer semantic information.
E. EIOU LOSS FUNCTION
The loss function of the target detection model mainly calculates three parts, which are the bounding box loss, the classification loss and the object confidence loss. The YOLOv5 model mainly uses the CIoU loss to calculate the loss of the bounding box. The CIoU loss takes into account the overlapping area, the centroid distance, and the height-to-width ratio in bounding box regression. However, the $\nu$ in its formula reflects only the difference in aspect ratio, rather than the true differences in width and height respectively and their confidence. The formula for the CIoU loss function is shown in (4). $b$ and $b^{gt}$ are the centroids of the predicted frame and the true frame, respectively (in (5) they denote the frames themselves); $\rho$ denotes the Euclidean distance between the 2 centroids; $d$ is the diagonal length of the smallest rectangle enclosing both frames; $\alpha$ is the weight function; $w$ is the width of the predicted frame; $h$ is the height of the predicted frame; $w^{gt}$ and $h^{gt}$ are the width and height of the target frame, respectively.
$$Loss_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{d^2} + \alpha\nu \tag{4}$$
$$IoU = \frac{|b \cap b^{gt}|}{|b \cup b^{gt}|} \tag{5}$$
$$\alpha = \frac{\nu}{(1 - IoU) + \nu} \tag{6}$$
$$\nu = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \tag{7}$$
Since the CIoU loss function captures only the aspect-ratio difference, and not the separate width and height differences of the bounding box during the regression process, in this study, we propose to introduce the EIoU loss function to calculate the loss of the bounding box. This loss function can provide a stable gradient, and the EIoU algorithm can decompose the aspect ratio between the predicted and actual bounding box, enabling the independent calculation of height and width. This allows the model to converge faster and at the same time effectively improves the performance of the model. The formula for the EIoU loss function is shown below:
$$Loss_{EIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{d^2} + \frac{\rho^2(w, w^{gt})}{d_w^2} + \frac{\rho^2(h, h^{gt})}{d_h^2} \tag{8}$$
where $d_w$ and $d_h$ are the width and height of the minimum outer bounding rectangle enclosing the predicted bounding box and the real bounding box.
F. GHOST CONVOLUTION MODULES
To reduce the model size, computational load, and number of parameters, making the model more suitable for industrial applications, this study introduces the Ghost convolution module. FIGURE 8 shows a comparison between the results of regular convolution and Ghost convolution. The Ghost convolution module, derived from GhostNet [33], is divided into three steps: first, using regular convolution to reduce the number of channels in the input feature map; second, applying linear operations to perform layer-by-layer convolution on the feature map obtained in the previous step to generate redundant feature maps; and finally, stacking and connecting the feature maps obtained in the first two steps to get the output result.
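The three Ghost convolution steps can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a 1 × 1 kernel for the regular convolution, a 3 × 3 depthwise kernel with zero padding for the linear operation, one redundant map per intrinsic map (s = 2), and no batch normalisation or activation. The names `pointwise_conv`, `depthwise_conv3x3`, and `ghost_conv` are hypothetical.

```python
from typing import List

# A feature map is stored as nested lists indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def pointwise_conv(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """Step 1 (assumed 1x1 kernel): mix input channels into fewer intrinsic maps.

    weights[o][i] is the contribution of input channel i to output channel o.
    """
    rows, cols = len(x[0]), len(x[0][0])
    return [
        [[sum(w_o[i] * x[i][r][c] for i in range(len(x))) for c in range(cols)]
         for r in range(rows)]
        for w_o in weights
    ]


def depthwise_conv3x3(x: FeatureMap, kernels: List[List[List[float]]]) -> FeatureMap:
    """Step 2: cheap linear operation, one 3x3 kernel per channel, zero padding."""
    rows, cols = len(x[0]), len(x[0][0])
    out = []
    for plane, k in zip(x, kernels):
        new_plane = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                acc = 0.0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols:
                            acc += k[dr + 1][dc + 1] * plane[rr][cc]
                new_plane[r][c] = acc
        out.append(new_plane)
    return out


def ghost_conv(x: FeatureMap,
               primary_weights: List[List[float]],
               cheap_kernels: List[List[List[float]]]) -> FeatureMap:
    """Ghost convolution with one redundant map per intrinsic map (s = 2)."""
    intrinsic = pointwise_conv(x, primary_weights)        # step 1: regular conv
    ghosts = depthwise_conv3x3(intrinsic, cheap_kernels)  # step 2: cheap linear ops
    return intrinsic + ghosts                             # step 3: channel concat


# Tiny usage example: 4 input channels on a 3x3 grid -> 2 intrinsic + 2 ghost maps.
x = [[[float(ch + 1)] * 3 for _ in range(3)] for ch in range(4)]
primary_weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
box = [[1.0] * 3 for _ in range(3)]  # sums each 3x3 neighbourhood
y = ghost_conv(x, primary_weights, [box, box])
print(len(y), y[0][1][1], y[2][1][1], y[2][0][0])
```

In this sketch only half of the output channels come from the channel-mixing convolution. The other half are produced by a per-channel filter that never mixes channels, which is where the saving in parameters and computation comes from.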
FIGURE 8. Diagram comparing the structure of Conv and Ghost conv.
Assuming the input feature map size is $c \times w \times h$, after the Ghost convolution operation the output feature map size is $n \times w' \times h'$. Since Ghost convolution is composed of an identity mapping part and a linear operation part, its computational load $F_{Ghost}$ is given by the following formula:
$$F_{Ghost} = \frac{n}{s} \cdot h \cdot w \cdot c \cdot k \cdot k + (s - 1) \cdot \frac{n}{s} \cdot h' \cdot w' \cdot d \cdot d \tag{9}$$
In the formula, $w$ and $h$ represent the width and height of the input feature map, respectively, $c$ denotes the number of channels in the input feature map, $s$ represents the number of redundant feature maps, $k \times k$ is the kernel size of the identity mapping part, $h'$ and $w'$ represent the height and width of the output feature map, respectively, $n$ is the number of channels in the output feature map, and $d \times d$ is the kernel size of the linear operation part.
If the input feature map of the same size is processed by regular convolution, the computational load $F_{Conv}$ is given by the following formula:
$$F_{Conv} = n \cdot h \cdot w \cdot c \cdot k \cdot k \tag{10}$$
From the above formulas, the acceleration ratio $r_s$ between regular convolution and Ghost convolution can be derived as follows:
$$r_s = \frac{c \cdot k \cdot k}{\frac{1}{s} \cdot c \cdot k \cdot k + \frac{s - 1}{s} \cdot d \cdot d} \approx \frac{s \cdot c}{s + c - 1} \approx s \tag{11}$$
From the above formula, it can be seen that the computational load required for Ghost convolution to extract features is approximately $1/s$ of that of regular convolution. Therefore, the Ghost convolution module can significantly reduce the model size and computational load.
Therefore, to reduce model complexity and better achieve crop disease detection on embedded devices, this paper replaces all regular convolutions in the backbone network,
G. IMPROVED MODEL
The improved model framework based on YOLOv5 is shown in FIGURE 10. This model is suitable for detecting various common crop diseases and pests. First, SE attention mechanism modules are added to the 3rd, 6th, 9th, and 12th layers of the backbone network. These modules assign different weights to different positions of the image in the channel domain through a weight matrix, thereby extracting more important feature information. Considering the similarity between crop leaves, we introduced the SE attention mechanism modules to enhance the model's ability to distinguish between different crop leaves. This not only improves the model's ability to extract leaf features but also increases the model's accuracy. Secondly, the high similarity of certain disease areas in different crops can lead to incorrect model outputs. Therefore, we propose introducing the Content-Aware Reassembly of Features (CARAFE) up-sampling operator to replace the original up-sampling method in YOLOv5. This method can obtain more accurate up-sampled feature maps, thereby enhancing the model's feature extraction of disease areas and correctly distinguishing between different types of crop diseases. Next, we replace the original YOLOv5 loss function with the Efficient Intersection over Union (EIoU) loss function to further improve the model's performance. The EIoU loss function better considers the aspect ratio differences of the bounding boxes, optimizing the bounding box regression process and enhancing the model's accuracy and stability. Finally, to reduce the model's complexity and make it more lightweight, we introduce the Ghost convolution module into the backbone network. All convolution operations in the backbone network, except for the first regular convolution, are replaced with Ghost convolutions. Additionally, the C3 module is replaced with the C3Ghost module. This reduces the model's complexity and achieves a lightweight model.
III. RESULTS AND DISCUSSION
A. TRAINING
The experimental setup of the improved crop pest and disease detection model based on the YOLOv5s model in this study is shown in TABLE 2, and some of the main parameters for the training of this detection model are shown in TABLE 3. Before starting the training, the number of epochs was set to 300, the batch size was set to 16, the input image size was 640 × 640, SGD was selected as the optimiser for the neural network and the initial learning rate was set to 0.01.
B. EVALUATION INDICATORS
In this study, we use widely used model evaluation metrics to evaluate and analyse the trained completed models, which
the SE attention mechanism module results in the most significant performance improvements. Although the recall (R) decreased by 0.2%, the precision (P), mAP@0.5, and mAP@0.5:.95 increased by 0.8%, 0.2%, and 0.8%, respectively. Therefore, adding the SE attention mechanism module in the backbone network can better enhance the detection performance of crop diseases.
D. COMPARATIVE EXPERIMENT ON LIGHTWEIGHT BACKBONE NETWORKS
This study introduces the Ghost Convolution module into the backbone network to reduce the model's complexity and achieve lightweight processing. To verify the effectiveness of this method, we conducted comparative experiments and analyses on the original YOLOv5 backbone network after lightweight modifications based on the Ghost Convolution module, ShuffleNetv2, and MobileNetv3. The detection performance of the models was tested, and the specific experimental results are shown in TABLE 5.
By analyzing TABLE 5, it is evident that the complexity of the model is significantly reduced after applying three different lightweight improvements to the backbone network. However, the detection performance of the model also declines when the network is improved based on ShuffleNetv2 and MobileNetv3. For example, compared to YOLOv5, the model's accuracy decreases by 12.3%, recall drops by 14.5%, and mAP@0.5:.95 is reduced by 18.4 percentage points when the original YOLOv5 backbone network is improved using ShuffleNetv2. In contrast, using the Ghost Convolution module for innovative improvements in the backbone network results in a model size and parameter count that are slightly higher than the other two improvement methods. However, the model's performance is superior to the original YOLOv5 model, with a reduced model size and parameter count. Therefore, to reduce model complexity while maintaining detection accuracy, this study adopts the Ghost Convolution module for lightweight processing of the backbone network.
E. ABLATION EXPERIMENTS
To validate the effectiveness of the improved model in this study, we conducted ablation experiments. The specific results of the ablation experiments are shown in TABLE 6. By comparing the experimental data with the baseline model YOLOv5s, it is evident that the performance of the improved model has increased. The original YOLOv5 model had a precision of 87.5%, a recall of 91.4%, an mAP@0.5 of 91.1%, and an mAP@0.5:.95 of 62.5%. First, we performed single improvements on the model. When the SE attention mechanism module was added to the backbone network, the precision increased by 0.8 percentage points, mAP@0.5 increased by 0.2 percentage points, and mAP@0.5:0.95 increased by 0.8 percentage points. After replacing the original up-sampling operator with the Content-Aware Reassembly of Features (CARAFE) up-sampling operator, the precision increased by 0.9 percentage points, the recall increased by 0.5 percentage points, mAP@0.5 increased by 0.6 percentage points, and mAP@0.5:0.95 increased by 1.2 percentage points. When the CIoU loss function was replaced with the EIoU loss function, the precision reached 88.2%, the recall increased by 0.1 percentage points, and mAP@0.5 reached 91.5%. When we introduced the Ghost convolution module into the backbone network, the precision, mAP@0.5, and mAP@0.5:0.95 all increased by 0.7 percentage points, and the number of parameters and computational load decreased by 26.8% and 32.1%, respectively. From the comprehensive analysis above, it is clear that most performance metrics improved after each individual modification to the YOLOv5 model, which demonstrates the effectiveness of our improvement methods.
Secondly, when we added the SE attention mechanism module and used the Content-Aware Reassembly of Features (CARAFE) up-sampling operator in the backbone network, compared to the original model, the precision increased by 2.2%, recall improved by 0.4 percentage points, mAP@0.5 increased by 0.8 percentage points, and mAP@0.5:0.95 increased by 1.1%. Building
180 epochs. After 180 epochs, the improved model exceeds the baseline model, achieving a mAP@0.5:0.95 of 64%. From the comprehensive comparison above, it is evident that the performance of the improved model is better than the baseline model YOLOv5s.
As shown in FIGURE 11, the heatmap of the improved model demonstrates better coverage of disease areas after each improvement. By comparing the baseline model YOLOv5s with the improved model, it is evident that YOLOv5s, while focusing on disease areas, also pays attention to background regions. In contrast, the improved model proposed in this paper more effectively concentrates on crop disease areas, significantly reducing attention to the background. Therefore, the crop disease detection model proposed in this paper can achieve higher detection accuracy.
F. MODEL PERFORMANCE COMPARISON
TABLE 7 provides the performance comparison of different algorithms. FIGURE 13 illustrates the comparison of computational load and model weight size. In this study, we compared mainstream one-stage detection models such as YOLOv5, YOLOv7, and YOLOv8. All algorithms used default parameters with the number of epochs set to 300 and batch size set to 16. From the comparison, it was found that our improved model achieved a precision of 90.0% and an mAP@0.5:.95 of 64%. Compared to the YOLOv5s model, our improved model showed increases of 2.5% and 1.5%, respectively, in these metrics, while reducing computational load and model weight size by 31.5% and 23.9%, respectively. Although the recall rate did not improve, the precision increased by 2.5%, and other performance indicators also rose by 1% to 3%. Compared to YOLOv5m, YOLOv7, and YOLOv8s, the precision of the improved model increased by 2.7, 2.2, and 2.8 percentage points, respectively, and mAP@0.5:.95 increased by 0.9, 1.9, and 0.4 percentage points, respectively. The computational load decreased by 77.2%, 89.5%, and 60.9%, respectively, and the model weight size decreased by 74.3%, 85.3%, and 50.9%, respectively. Compared to YOLOv9-T, the proposed model showed 0.6 and 1.6 percentage points higher precision and recall, respectively, while reducing the computational load and model weight size by 2.6% and 53.7%. Although YOLOv9-S has higher detection precision, its larger model weight and computational load make it unsuitable for deployment on mobile terminal devices and cannot meet the requirements for real-time detection. Our improved model reduced the computational load and model weight size by 72.0% and 86.2%, respectively, compared to YOLOv9-S. Considering precision, recall, computational load, and model weight size, the model proposed in this study is superior to the other algorithms and can meet the needs of crop pest and disease monitoring in the agricultural field.
FIGURE 14 presents a visual comparison of the detection results of YOLOv5s and the improved model. In FIGURE 14a, there are three apple leaves with powdery mildew. However, YOLOv5s only detected two large leaves and failed to detect the smaller leaf, while the improved model was able to detect powdery mildew on
all leaves. In FIGURE 14b, the improved model correctly [6] C. Wang, Y. Tang, X. Zou, L. Luo, and X. Chen, ‘‘Recognition and
identified a case of strawberry gray mold, whereas YOLOv5s matching of clustered mature litchi fruits using binocular charge-coupled
device (CCD) color cameras,’’ Sensors, vol. 17, no. 11, p. 2564, Nov. 2017.
misidentified a black plastic film as strawberry gray mold [7] L. Luo, W. Liu, Q. Lu, J. Wang, W. Wen, D. Yan, and Y. Tang, ‘‘Grape
along with the actual disease. In FIGURE 14c, there are two berry detection and size measurement based on edge image processing and
cases of leaf spot disease on taro leaves. YOLOv5s identified geometric morphology,’’ Machines, vol. 9, no. 10, p. 233, Oct. 2021.
the leaf spot disease but also mistakenly identified a healthy [8] J. L. Miranda, B. D. Gerardo, and B. T. Tanguilig III, ‘‘Pest detection and
extraction using image processing techniques,’’ Int. J. Comput. Commun.
part as taro leaf being eaten by insects. In FIGURE 14d, Eng., vol. 3, no. 3, pp. 189–192, 2014.
there are three soybean leaves with insect damage. However, [9] J. G. A. Barbedo, L. V. Koenigkan, and T. T. Santos, ‘‘Identifying multiple
YOLOv5s only detected one leaf, while the improved model plant diseases using digital image processing,’’ Biosyst. Eng., vol. 147,
pp. 104–116, Jul. 2016.
detected two leaves. Overall, the analysis indicates that
[10] S. Zhang, X. Wu, Z. You, and L. Zhang, ‘‘Leaf image based cucumber
the performance of the model improved using the methods disease recognition using sparse representation classification,’’ Comput.
proposed in this study is superior to the baseline model. Electron. Agricult., vol. 134, pp. 135–141, Mar. 2017.
[11] V. K. Shrivastava and M. K. Pradhan, ‘‘Rice plant disease classification
IV. CONCLUSION using color features: A machine learning paradigm,’’ J. Plant Pathol.,
vol. 103, no. 1, pp. 17–26, Feb. 2021.
To achieve accurate detection of common crop diseases, this
[12] J. Kianat, M. A. Khan, M. Sharif, T. Akram, A. Rehman, and T. Saba,
study proposes an effective detection algorithm based on an ‘‘A joint framework of feature reduction and robust feature selection
improved YOLOv5s. First, SE attention mechanism modules for cucumber leaf diseases recognition,’’ Optik, vol. 240, Aug. 2021,
were added to the 3rd, 6th, 9th, and 12th layers of the back- Art. no. 166566.
[13] Sumanto, Y. Sugiarti, A. Supriyatna, I. Carolina, R. Amin, and A. Yani,
bone network, allowing the model to fully extract and learn ‘‘Model Naïve Bayes classifiers for detection apple diseases,’’ in Proc.
leaf features. Second, we replaced the original up-sampling 9th Int. Conf. Cyber IT Service Manage. (CITSM), Sep. 2021, pp. 1–4.
operator in the YOLOv5s model with the Content-Aware [14] S. Mukhopadhyay, M. Paul, R. Pal, and D. De, ‘‘Tea leaf disease
Reassembly of Features (CARAFE) up-sampling operator, detection using multi-objective image segmentation,’’ Multimedia Tools
Appl., vol. 80, no. 1, pp. 753–771, Jan. 2021.
which can obtain more accurate up-sampled feature maps, [15] M. Chen, Y. Tang, X. Zou, K. Huang, Z. Huang, H. Zhou, C. Wang,
thereby enhancing the model’s feature extraction in diseased and G. Lian, ‘‘Three-dimensional perception of orchard banana central
areas and correctly distinguishing between different types stock enhanced by adaptive multi-vision technology,’’ Comput. Electron.
Agricult., vol. 174, Jul. 2020, Art. no. 105508.
of crop diseases.Next, we replaced the original YOLOv5
[16] Q. Li, W. Jia, M. Sun, S. Hou, and Y. Zheng, ‘‘A novel green
loss function with the EIoU loss function, further improving apple segmentation algorithm based on ensemble U-Net under complex
the model’s performance. Finally, to reduce the model’s orchard environment,’’ Comput. Electron. Agricult., vol. 180, Jan. 2021,
complexity and meet the deployment needs of edge devices, Art. no. 105900.
we introduced the Ghost convolution module into the backbone network.
In experimental comparisons, our model achieved strong detection
performance, surpassing mainstream models such as YOLOv5s, YOLOv5m,
YOLOv7, and YOLOv8. The improved model also performed better in complex
environments and identified crop diseases with good accuracy. The model
is also substantially lighter, which helps farmers detect and prevent
diseases accurately and promptly, thereby improving crop yield and
income. However, there is still significant room for improvement in the
detection accuracy of the proposed model. In future work, we will
collect more image data and expand the dataset to further improve the
detection accuracy for crop diseases and pests.

REFERENCES
[1] G. Lin, Y. Tang, X. Zou, J. Xiong, and Y. Fang, ‘‘Color-, depth-, and
shape-based 3D fruit detection,’’ Precis. Agricult., vol. 21, no. 1, pp. 1–17,
Feb. 2020.
[2] S. Liu, D. Liu, G. Srivastava, D. Połap, and M. Woźniak, ‘‘Overview and
methods of correlation filter algorithms in object tracking,’’ Complex Intell.
Syst., vol. 7, pp. 1895–1917, Jun. 2020.
[3] Y. Tang, M. Chen, C. Wang, L. Luo, J. Li, G. Lian, and X. Zou,
‘‘Recognition and localization methods for vision-based fruit picking robots: A
review,’’ Frontiers Plant Sci., vol. 11, May 2020, Art. no. 520170.
[4] J. Li, Y. Tang, X. Zou, G. Lin, and H. Wang, ‘‘Detection of fruit-bearing
branches and localization of litchi clusters for vision-based harvesting
robots,’’ IEEE Access, vol. 8, pp. 117746–117758, 2020.
[5] F. Wu and J. Duan, ‘‘Multi-target recognition of bananas and automatic
positioning for the inflorescence axis cutting point,’’ Frontiers Plant Sci.,
vol. 12, Nov. 2021, Art. no. 705021.
[17] X. Cao, H. Yan, Z. Huang, S. Ai, Y. Xu, R. Fu, and X. Zou, ‘‘A
multi-objective particle swarm optimization for trajectory planning of fruit
picking manipulator,’’ Agronomy, vol. 11, no. 11, p. 2286, Nov. 2021.
[18] A. Anagnostis, A. C. Tagarakis, G. Asiminari, E. Papageorgiou, D.
Kateris, D. Moshou, and D. Bochtis, ‘‘A deep learning approach for
anthracnose infected trees classification in walnut orchards,’’ Comput.
Electron. Agricult., vol. 182, Mar. 2021, Art. no. 105998.
[19] J. Chen, Q. Liu, and L. Gao, ‘‘Visual tea leaf disease recognition using
a convolutional neural network model,’’ Symmetry, vol. 11, no. 3, p. 343,
Mar. 2019.
[20] F. Jiang, Y. Lu, Y. Chen, D. Cai, and G. Li, ‘‘Image recognition of four
rice leaf diseases based on deep learning and support vector machine,’’
Comput. Electron. Agricult., vol. 179, Dec. 2020, Art. no. 105824.
[21] A. Abbas, S. Jain, M. Gour, and S. Vankudothu, ‘‘Tomato plant disease
detection using transfer learning with C-GAN synthetic images,’’ Comput.
Electron. Agricult., vol. 187, Aug. 2021, Art. no. 106279.
[22] S. Xiang, Q. Liang, W. Sun, D. Zhang, and Y. Wang, ‘‘L-CSMS: Novel
lightweight network for plant disease severity recognition,’’ J. Plant
Diseases Protection, vol. 128, no. 2, pp. 557–569, Apr. 2021.
[23] L. Tan, J. Lu, and H. Jiang, ‘‘Tomato leaf diseases classification based on
leaf images: A comparison between classical machine learning and deep
learning methods,’’ AgriEngineering, vol. 3, no. 3, pp. 542–558, 2021.
[24] L. Jiao, F. Zhang, F. Liu, S. Yang, L. Li, Z. Feng, and R. Qu,
‘‘A survey of deep learning-based object detection,’’ IEEE Access, vol. 7,
pp. 128837–128868, 2019.
[25] X. Ma, K. Ji, B. Xiong, L. Zhang, S. Feng, and G. Kuang,
‘‘Light-YOLOv4: An edge-device oriented target detection method for remote
sensing images,’’ IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens.,
vol. 14, pp. 10808–10820, 2021.
[26] C. H. Son, ‘‘Leaf spot attention networks based on spot feature encoding
for leaf disease identification and detection,’’ Appl. Sci., vol. 11, no. 17,
p. 7960, 2021.
[27] J. Li, Y. Qiao, S. Liu, J. Zhang, Z. Yang, and M. Wang, ‘‘An improved
YOLOv5-based vegetable disease detection method,’’ Comput. Electron.
Agricult., vol. 202, Nov. 2022, Art. no. 107345.
[28] D. Liu, F. Lv, J. Guo, H. Zhang, and L. Zhu, ‘‘Detection of forestry pests
based on improved YOLOv5 and transfer learning,’’ Forests, vol. 14, no. 7,
p. 1484, Jul. 2023.
[29] B. Zhang, R. Wang, H. Zhang, C. Yin, Y. Xia, M. Fu, and W. Fu, ‘‘Dragon
fruit detection in natural orchard environment by integrating lightweight
network and attention mechanism,’’ Frontiers Plant Sci., vol. 13, Oct. 2022,
Art. no. 1040923.
[30] F. Xiao, H. Wang, Y. Xu, and Z. Shi, ‘‘A lightweight detection method
for blueberry fruit maturity based on an improved YOLOv5 algorithm,’’
Agriculture, vol. 14, no. 1, p. 36, Dec. 2023.
[31] W. Yang and X. Qiu, ‘‘A novel crop pest detection model based on
YOLOv5,’’ Agriculture, vol. 14, no. 2, p. 275, Feb. 2024.
[32] J. Hu, L. Shen, and G. Sun, ‘‘Squeeze-and-excitation networks,’’ in
Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018,
pp. 7132–7141.
[33] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, ‘‘GhostNet: More
features from cheap operations,’’ in Proc. IEEE/CVF Conf. Comput. Vis.
Pattern Recognit. (CVPR), Jun. 2020, pp. 1577–1586.
[34] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, ‘‘CBAM: Convolutional
block attention module,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018,
pp. 3–19.
[35] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, ‘‘ECA-Net:
Efficient channel attention for deep convolutional neural networks,’’ in
Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020,
pp. 11531–11539.
[36] L. Yang, R.-Y. Zhang, L. Li, and X. Xi, ‘‘SimAM: A simple, parameter-free
attention module for convolutional neural networks,’’ in Proc. Int. Conf.
Mach. Learn., 2021, pp. 11863–11874.
[37] Y. Liu, Z. Shao, Y. Teng, and N. Hoffmann, ‘‘NAM: Normalization-based
attention module,’’ 2021, arXiv:2111.12419.

WEI CHEN received the bachelor’s degree in electrical engineering from
Hangzhou Dianzi University, in 2022. He is currently pursuing the
master’s degree in electronic information with Zhejiang Normal
University, China. His research interest includes object detection.

LIJUAN ZHENG received the B.S. degree in transportation planning and
management from Tongji University, in 2008. She is currently a Lecturer
in mechanical design, manufacturing and automation with Zhejiang Normal
University, China. Her research interests include intelligent
transportation, intelligent control, mechanical and electrical control,
deep learning, and networking.

JIPING XIONG received the B.S. degree in electronics and communication
engineering and the Ph.D. degree in communication and information system
from the University of Science and Technology of China, in 2001 and 2006,
respectively. From 2012 to 2013, he was a Visiting Scholar with the
Department of Computer Science and Engineering, University of Minnesota,
Minneapolis, MN, USA. He is currently an Associate Professor of computer
engineering with Zhejiang Normal University, China. His research
interests include deep learning, compressive sensing, computer vision,
networking, information security, and signal processing.