2. Early real-time detection algorithm of tomato diseases and pests in the natural environment
Abstract
Background: Research on early object detection of crop diseases and pests in the natural environment has been an important direction in the fields of computer vision, complex image processing and machine learning. Because of the complexity of early images of tomato diseases and pests in the natural environment, traditional methods cannot achieve real-time and accurate detection.
Results: Aiming at the complex background of early-stage tomato disease and pest image objects in the natural environment, an improved object detection algorithm based on YOLOv3 for early real-time detection of tomato diseases and pests was proposed. Firstly, to cope with the complex background of tomato disease and pest images under natural conditions, dilated convolution layers are used to replace convolution layers in the backbone network, maintaining high resolution and a large receptive field and improving the detection of small objects. Secondly, in the detection network, according to the intersection over union (IoU) of candidate boxes predicted by multiple grids and a linearly attenuated confidence score, obscured tomato disease and pest objects are retained, which solves the problem of detecting mutually occluded objects. Thirdly, to reduce the model volume and the number of model parameters, the network is made lightweight using the idea of convolution factorization. Finally, by introducing a balance factor, the small object weight in the loss function is optimized. The test results for nine common tomato diseases and pests under six different background conditions were statistically analyzed. The proposed method achieves an F1 score of 94.77%, an AP value of 91.81%, a false detection rate of only 2.1%, and a detection time of only 55 ms. The test results show that the method is suitable for early detection of tomato diseases and pests using large-scale video images collected by the agricultural Internet of Things.
Conclusions: At present, most computer vision based object detection of diseases and pests needs to be carried out in a specific environment (such as picking diseased leaves and placing them under supplementary lighting equipment to achieve the best imaging conditions). For images taken by Internet of Things monitoring cameras in the field, factors such as light intensity and weather changes make the images vary greatly, and existing methods cannot work reliably. The proposed method has been applied to actual tomato production scenarios, showing good detection performance. The experimental results show that
the method in this study improves the detection of small objects and occluded leaves, and the recognition effect under different background conditions is better than that of existing object detection algorithms. The results show that the method is feasible for detecting tomato diseases and pests in the natural environment.
*Correspondence: [email protected]
1 Shandong Provincial University Laboratory for Protected Horticulture, Blockchain Laboratory of Agricultural Vegetables, Weifang University of Science and Technology, Weifang 262700, Shandong, China
Full list of author information is available at the end of the article
Wang et al. Plant Methods (2021) 17:43
Keywords: Real-time detection algorithm, Deep learning, Dilated convolution, NMS, YOLOv3, Tomato diseases and
pests, Natural environment
Background
Agriculture is an important foundation for economic development. At present, agricultural output is largely limited by the disasters caused by plant growth disorders, which are characterized by great variety, large impact and easy transmission, and cause significant losses to agricultural production. Modern agricultural production methods are gradually moving towards the automation of unmanned machines, such as automatic irrigation technology in farmland [1], combined harvesting [2] and agricultural robots [3], to improve the efficiency and output of production operations. With the development of science and technology and the improvement of living standards, artificial intelligence has gradually entered human life, and applications based on machine vision algorithms have been widely used, such as vehicle detection [4, 5], pedestrian detection [6, 7], safety production [8, 9] and fruit detection [10, 11]. They have replaced traditional manual operation and greatly improved production efficiency. Non-destructive detection and early identification of plant growth disorders are key to the development of precision agriculture and ecological agriculture. Early detection and prevention can effectively slow down the spread of plant growth disorders. Thus, appropriate algorithms need to be adopted for accurate detection.

Among many agricultural products, tomato is the only vegetable that can be eaten as fruit, and its nutritional value is much higher than that of fruit [12]. Its yield is high, and its planting area is growing, especially in greenhouses where the planting area increases rapidly [13]. However, tomato is susceptible to diseases and pests during its growth, which seriously affect its yield and quality and cause enormous economic losses to farmers [14]. Outbreaks are uncertain due to the diversity of tomato diseases and pests [15], the types of chemical sprays are diverse, and the cost of treatment is often higher than that of prevention. According to an actual investigation of growers at the tomato planting base in Shouguang, Shandong Province, during the production period of one season of tomato, up to 5 kinds of agricultural chemicals are sprayed, the number of sprayings is up to 10, and the chemical pesticide used for disease and pest control is up to 1000 tons [16]. Pesticide abuse not only destroys the ecological balance of farmland, but also increases the resistance of pests and the cost of control, resulting in serious negative effects of pesticide control [17, 18]. Therefore, if real-time detection, early recognition and early warning of diseases and pests can be carried out for tomatoes, the occurrence of tomato plant growth disorders can be identified timely and accurately, producers can be guided accurately to carry out plant protection and control, the plant growth disorders can be controlled in the early stage of occurrence, the production goal of improving tomato yield and quality can be achieved, and pesticide spraying can be reduced. Consequently, it is necessary to study the characteristics of tomato diseases and pests, and to recognize and judge the disasters as early as possible.

Traditional identification and early warning judgment of tomato diseases and pests are mainly based on field surveys by plant protection experts and experienced tomato producers according to the growth status of tomato [19]. Relying only on human resources to identify plant growth disorders is labour-intensive, time-consuming and slow, and some deviations may occur, thus hindering the timely treatment of diseases and pests [20]. Therefore, today's agricultural production urgently needs a new system to liberate producers from inefficient, complex disease and pest identification processes.

In recent years, due to the rapid development of deep learning theories and the improvement of computational ability, deep convolutional networks have achieved great success in computer vision. In object detection, the accuracy of deep learning-based methods greatly exceeds that of traditional methods based on manually designed features such as HOG and SIFT [21]. Object detection is to draw a range of objects of interest in an image, select the target with a rectangular box and label it with a category. Deep learning-based object detection mainly includes two types: one is the convolutional network structure based on region generation, with representative networks such as R-CNN [22] and Faster R-CNN [23]; the other treats the detection of the object location as a regression problem, directly using a CNN to process the entire image and simultaneously predict the category and location of the object, with representative networks such as SSD [24], YOLO [25], YOLO9000 [26] and YOLOv3 [27].
At present, most studies focus on image classification of crop diseases and pests based on deep learning; there are few studies on object detection of crop diseases and pests based on deep learning [28–31]. The existing deep learning-based object detection methods for tomato diseases and pests mostly use region-proposal-based algorithms [28, 32–35]. Their accuracy is greatly improved compared with traditional methods, but the object detection process takes a long time, and it is difficult to detect and locate tomato diseases and pests in real-time under natural conditions.

It is now possible to collect images of diseases and pests in real-time using the Internet of Things and video camera equipment in tomato greenhouses, transmit them to remote computers through the Internet, and then use computers to automatically identify the types of diseases and pests. However, the recognition result is closely related to the quality of the acquired images. Image acquisition is affected by the quality of the camera, light, shooting level, etc. If the lesion is too small, or the image is blurred or contains spots or shadows, correct recognition is seriously affected. Due to the complex diversity of tomato disease and pest images under real natural conditions, and especially because the images collected by Internet of Things video equipment are massive, highly redundant and noisy, the feature extraction ability of existing methods is not suitable for tomato disease and pest detection under natural conditions. Previous studies have shown high accuracy under controlled laboratory conditions, but under complex light and complex backgrounds, the object detection results are not ideal and face many challenges [36]. Our previous work achieved good results on the detection of a common gray leaf spot disease of tomato under natural conditions [37]. Chen et al. [38] collected 8616 images containing five kinds of tomato diseases on the spot. The images were denoised and enhanced by combining the binary wavelet transform with Retinex (BWTR), and the two-channel residual attention network model (B-ARNet) was used to identify the images with an accuracy of about 89%. Pattnaik et al. [39] proposed a pre-trained deep CNN framework with transfer learning for pest classification in tomato plants, and achieved the highest classification accuracy of 88.83% using the DenseNet169 model. However, in actual production, tomato may suffer from a variety of diseases or pests at the same time.

To solve the above problems, this study takes 9 common diseases and pests of tomato as the research object, uses a machine vision object detection method for disease and pest detection, and, based on the deep learning YOLOv3 object detection algorithm, improves YOLOv3 to obtain a better detection network model, which achieves early detection of tomato diseases and pests under natural conditions. Firstly, in view of the complex background of tomato disease and pest images under natural conditions, a dilated convolution layer [40] was used to replace the convolution layer in the backbone network, so that it can maintain high resolution and a large receptive field, and improve the detection of small objects. Secondly, in the detection network, the non-maximum suppression (NMS) algorithm [41] is applied according to the intersection over union (IoU) of candidate boxes predicted by multiple grids and a linearly attenuated confidence score: predicted boxes with larger IoU and lower confidence scores are removed, and the prediction box with the higher confidence score is retained as the object detection box. Obscured tomato disease and pest objects are thereby retained, solving the detection problem of mutually occluded objects. Thirdly, to reduce the model volume and the number of model parameters, the network is made lightweight using the idea of convolution factorization. Finally, aiming at the missed detection of small tomato disease and pest objects in the detection process, a loss function improvement based on a balance factor is proposed to balance the difficulty of samples and obtain better results.

Materials and methods
Dataset used in the research
To verify the validity of the object detection method for tomato diseases and pests proposed in this study and to ensure timely detection in the early stage of disease and pest occurrence, the growth status of tomato was monitored in real-time using the video monitoring system in a tomato greenhouse in Shouguang City, Shandong Province. The greenhouse dimensions are 8 m north–south span and 80 m east–west length. The video monitoring system is shown in Fig. 1 (the website of the system is: http://139.224.3.180/farming/loginController.do?login).

Hardware environment
The video monitoring system consists of three network cameras, two switches, two wireless sending and receiving network bridges, one network video recorder, one streaming server, one wireless router, one central management server (containing an intelligent analysis module), one video storage server, several network lines, and one mobile phone for testing.

Software environment
Real-time video monitoring module. This module is primarily designed to provide users with live video information, where users can view real-time live video and historical video data. Video acquisition module. This module is the core module of the system, which provides real-time intelligent acquisition of monitoring video data. The video monitoring camera can perform 23× optical zoom. The growth status of a single leaf and single fruit can be observed within 500 m, which is convenient for the manager to observe tomato growth remotely. To reduce the workload of video acquisition, we try to collect during high-attack periods of disease, take videos at multiple angles, and the data can be queried and downloaded at any time.

From March to May of 2019 and 2020, tomato disease and pest images were collected by surveillance video. The resolution was 1960 × 1080 pixels. The focus was on tomato leaf disease and pest images. The monitoring covers tomato growth status in different periods, different locations and different weather. At each video monitoring point, when the computer judged that there were suspected lesions in the tomato growth process, the videos were detected and tracked every day, and the images of the best leaf postures were captured and saved in JPEG format. To determine the best leaf postures, the leaves with the grossly good postures were manually identified first, the dimensions of the bounding boxes of this class of leaves were manually scaled, and K-means clustering was performed on the aspect ratios of the bounding boxes, with the cluster centre taken as the optimal length and width of the tomato leaf. The cluster centre was found to be 1:0.79 by experiment, so we finally determined the best leaf postures when the length-to-width ratio of the detected leaf bounding box was equal to or close to 1:0.79. Compared with the traditional manual photography data acquisition method, this acquisition method easily collects tomato disease and pest images rich in complex natural environment information, including various background interference such as leaves, weeds and soil, which is suitable for mobile-end extended application and can be deployed in tomato greenhouses.

In most of the images initially collected, the object to be studied, i.e. the lesion part of the image, only accounts for a small part of the whole image. To reduce the amount of data in post-processing, improve the processing efficiency and eliminate the interference caused by non-subject parts as far as possible, the redundant parts were removed by image clipping, retaining only the main part of the study. For image clipping, the tool used was Adobe Photoshop CS6 × 64. To make the size of all tomato disease and pest images in the dataset consistent, the function cv2.resize() in OpenCV was called as a normalization operation, and the size of the images was unified to 256 × 256. Because the length–width ratios of the disease images are close, the actual shapes of lesions or pests are not altered by the uniform resizing.

Similar sample images were filtered out by manual screening, and some invalid data was removed. The early images of 7 common diseases and 2 pests of tomato were selected, and a total of 10,696 images were captured. The images are randomly divided into training, validation and test datasets in proportions of 70%, 20% and 10%, so that the early detection experiments of diseases and pests can be carried out.

The images containing the characteristics of tomato diseases and pests were annotated with labelImg. The labelImg software is an image annotation tool for deep learning datasets based on the Python language, which is mainly used to record the category name and position information of the object, and to store the information in an Extensible Markup Language (XML) format file.
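The 70%/20%/10% split described above can be sketched as follows. This is a minimal illustration in plain Python; the file names and random seed are illustrative assumptions, not details taken from the paper.

```python
import random

def split_dataset(image_paths, seed=42):
    """Shuffle and split image paths into train/val/test at 70%/20%/10%,
    as described for the 10,696-image tomato dataset."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train = int(n * 0.7)
    n_val = int(n * 0.2)
    return {
        "train": paths[:n_train],
        "val": paths[n_train:n_train + n_val],
        "test": paths[n_train + n_val:],  # remaining ~10%
    }

# Illustrative usage with dummy file names
images = [f"img_{i:05d}.jpg" for i in range(10696)]
splits = split_dataset(images)
print(len(splits["train"]), len(splits["val"]), len(splits["test"]))
```

With 10,696 images this yields 7487 training, 2139 validation and 1070 test images; each image lands in exactly one subset.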
it is necessary to focus on the detailed information and multi-scale features of tomato disease and pest objects.

Principle of the improved YOLOv3 model
The features of tomato disease and pest objects in the image are extracted using a deep convolutional neural network, and the detection and localization of the objects are achieved by regression. The main flow of the algorithm is shown in Fig. 3. Firstly, the feature extraction network is constructed from residual modules to obtain image object feature pyramids; then, features of different depths are fused by a feature fusion mechanism, the location information of the object is predicted by regression on the fused feature map, and the confidence score is predicted by the Sigmoid function; finally, the output is filtered by NMS (non-maximum suppression).

Problems of the YOLOv3 model
The YOLO series is the most representative network structure among one-stage object detection networks. YOLOv3 is the latest improved network of the YOLO series; because its detection accuracy is comparable to that of two-stage detection networks and it can achieve real-time detection speed, it has become one of the most popular object detection algorithms. Considering that the object detection of tomato diseases and pests needs to take into account both accuracy and speed in practical application, this study takes YOLOv3 as the main body and improves the algorithm according to the application scenario of tomato disease and pest object detection to complete the location and class identification of tomato diseases and pests.

The YOLOv3 network, which has been improved many times, has achieved a good balance between detection accuracy and detection speed and has become the preferred algorithm for many object detection tasks because of its simple implementation. However, as a one-stage object detection network, it still has the problems of large positioning error and an imbalance between foreground and background complexity.

To pursue detection speed, the YOLOv3 algorithm integrates object location and classification into one convolutional neural network, and simultaneously predicts the location coordinates and class information of the object. However, the deep feature maps in the convolutional neural network contain more advanced and abstract feature information, which is suitable for object classification, but, because more spatial information is lost, they perform poorly for object localization. Shallow feature maps are more specific and contain more spatial information, which is suitable for coordinate positioning but not ideal for object classification. Although YOLOv3 tries to use the concatenation of deep and shallow feature maps to fuse different levels of feature information, object location is still inaccurate compared with two-stage object detection algorithms.

The improved backbone network
YOLOv3 uses Darknet-53 as a feature extraction network and achieves good object detection results on common datasets. Compared with the objects in common datasets, tomato diseases and pests are smaller objects, and multi-object aggregation often occurs. The background in the natural environment is complex and seriously affected by light conditions, so it is difficult to extract significant features from the images.

The residual network used by Darknet-53 solves the problem of gradient disappearance during propagation. However, each residual unit contains only two convolution layers, which limits the capacity of the unit to some extent. Simply increasing the width or depth of each unit saturates the network performance [42]. According to Szegedy et al. [43], the improvement of network performance is related to the diversity of the network structure, not only to increasing its depth or width. In this study, from the perspective of structural diversity, the backbone network was redesigned according to the characteristics of tomato disease and pest objects. For the detection of small objects, on the one hand, high-resolution feature maps are needed to detect the object information in a small area; on the other hand, a wider receptive field or more global information is needed to accurately determine the location and semantic features of the objects. To improve the detection of small objects in tomato disease and pest images, a backbone network with high resolution and large receptive fields is proposed for feature extraction by combining dilated convolution and the FPN (feature pyramid network) structure.

Dilated convolution enlarges the receptive field of the convolution kernel by changing the internal spacing of the kernel. Figure 4 shows three kinds of dilated convolution kernels with different intervals, where rrate represents the interval in the convolution kernels. Figure 4a shows a receptive field of 3 × 3 with rrate = 1, Fig. 4b a receptive field of 7 × 7 with rrate = 2, and Fig. 4c a receptive field of 15 × 15 with rrate = 3. In this way, the convolutional neural network can extract feature information over a larger receptive field.

Fig. 4 Dilated convolution kernels with different intervals: a rrate = 1; b rrate = 2; c rrate = 3

The backbone network of the improved YOLOv3 is shown in Fig. 5. The resolution of features directly affects the detection of small objects and the overall performance indicators. Low resolution leads to serious loss of the semantic features of small objects, while high-resolution features cause a large amount of computation and memory storage. Therefore, while the overall performance of the backbone network is not reduced, compared with the original YOLOv3 backbone network, the improved YOLOv3 uses the FPN structure to reduce the loss of semantic features of small-scale objects in the deep network, which is conducive to identifying smaller objects.

The dilated convolution bottleneck layer is introduced as shown in Fig. 6a, and the dilated convolution bottleneck with 1 × 1 Conv projection is shown in Fig. 6b. In these two kinds of dilated convolution residual structures with lower complexity, Conv is the convolutional layer, Add is the addition operation and ReLU is the activation function. The receptive field of the dilated convolution kernel is 3 × 3 with rrate = 2. Therefore, the receptive field and feature expression ability of the backbone network are increased as a whole. Meanwhile, the dilated convolution residual structure still has the advantages of residual units: fewer network parameters and lower computational complexity. Figure 6b uses a 1 × 1 Conv to achieve cross-channel feature fusion, which integrates feature information well.

Finally, the improved YOLOv3 can maintain a higher resolution and a larger receptive field of the feature map in the deep convolutional neural network, enhancing the receptive field and detection ability of the YOLOv3 algorithm for small objects.

Optimized linear attenuation NMS
The occlusion problem is one of the main factors that restrict the improvement of detection accuracy. Because of the close distance between the objects to be detected, missed detection or false detection occurs easily. During testing, the detection algorithm generates a set of candidate boxes around each suspected object. If occlusion is absent, NMS is performed on this set of candidate boxes; redundant candidate boxes can be effectively filtered out, resulting in the final highest-scoring predicted box. However, when two or more objects occlude each other, the final sets of candidate boxes are fused into one group, at which point the algorithm cannot tell whether a candidate box comes from the same object or from several different objects, which leads to missed detection or false detection. To improve the precision of detection under occlusion, we should try to make candidate boxes generated by different objects distinguishable and screen them individually.

As shown in Fig. 7, a and b are two images from the tomato disease and pest dataset. The confidence score of the A-leaf prediction box is 0.8, that of the B-leaf prediction box is 0.6, and the A-leaf obscures the B-leaf. The intersection over union of the prediction boxes of the A and B leaves is IoU > 0.5. The NMS algorithm is used to process redundant prediction boxes. When IoU > 0.5, because the set threshold of YOLOv3 is 0.5, the A-leaf with the higher confidence score is retained, the confidence score of the prediction box of the B-leaf is set to 0, and thus the B-leaf cannot be detected.

In this study, a linear attenuation NMS algorithm is used to solve the problem that occluded leaves cannot be accurately detected by the NMS in the original YOLOv3. When the IoU is higher than the suppression threshold, the confidence score in the NMS is linearly smoothed. The optimized NMS algorithm is expressed as

$$S^{*}_{conf_i} = \begin{cases} S_{conf_i}, & IoU(M, b_i) \le N_t \\ S_{conf_i}\left[1 - IoU(M, b_i)\right], & IoU(M, b_i) > N_t \end{cases} \tag{1}$$

In the above formula, $S^{*}_{conf_i}$ is the confidence score after linear smoothing, $S_{conf_i}$ is the confidence score of the original NMS, $M$ is the prediction box with the higher confidence score, $b_i$ is the object prediction box to be compared, $IoU(M, b_i)$ is the intersection over union of $M$ and $b_i$, and $N_t$ is the suppression threshold.
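A minimal sketch of the linear attenuation rule in formula (1), in plain Python. The box format [x1, y1, x2, y2], the score threshold used to drop near-zero boxes, and the example coordinates are illustrative assumptions; only the decay rule itself comes from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def linear_attenuation_nms(boxes, scores, nt=0.5, score_thresh=0.001):
    """NMS with formula (1): instead of zeroing the score of a box whose
    IoU with the current best box M exceeds Nt, decay it by (1 - IoU)."""
    scores = list(scores)
    idxs = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while idxs:
        m = idxs.pop(0)          # box with the highest remaining score
        keep.append(m)
        survivors = []
        for i in idxs:
            ov = iou(boxes[m], boxes[i])
            if ov > nt:          # linear smoothing instead of hard suppression
                scores[i] *= (1.0 - ov)
            if scores[i] > score_thresh:
                survivors.append(i)
        # re-sort survivors by their (possibly decayed) scores
        idxs = sorted(survivors, key=lambda i: scores[i], reverse=True)
    return keep, scores

# Occluded A/B leaves as in Fig. 7: heavy overlap, scores 0.8 and 0.6
boxes = [[0, 0, 10, 10], [2, 0, 12, 10]]
keep, new_scores = linear_attenuation_nms(boxes, [0.8, 0.6])
print(keep)  # both boxes survive; the B box keeps a decayed score
```

With classic NMS the B box would be discarded outright (score set to 0); here its score is only attenuated in proportion to the overlap, so the occluded leaf remains detectable.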
Fig. 6 Structure of dilated convolution residuals. a Dilated convolution bottleneck; b dilated convolution bottleneck with 1 × 1 Conv projection
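The receptive-field figures quoted for the dilated kernels above (3 × 3, 7 × 7, 15 × 15) are consistent with stacking 3 × 3 dilated convolutions, where each stride-1 layer adds (kernel − 1) × dilation to the cumulative field. The sketch below assumes dilation rates of 1, 2 and 4 for the three configurations; this stacking interpretation is our assumption, since the paper indexes them as rrate = 1, 2, 3.

```python
def stacked_receptive_field(kernel_size, dilations):
    """Cumulative receptive field of stacked stride-1 dilated convolutions.
    Each layer enlarges the field by (kernel_size - 1) * dilation."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# 3x3 kernels: dilations 1, 2, 4 reproduce the 3/7/15 fields quoted for Fig. 4
print(stacked_receptive_field(3, [1]))        # 3
print(stacked_receptive_field(3, [1, 2]))     # 7
print(stacked_receptive_field(3, [1, 2, 4]))  # 15
```

This illustrates why dilated layers grow the receptive field exponentially with depth while keeping the parameter count of an ordinary 3 × 3 convolution.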
The flow chart of the optimized NMS algorithm is Lightweight processing of the model
shown in Fig. 8. The improved network increases the diversity of struc-
The specific steps are as follows: tures and the parameters in the network become more
numerous. When the dataset is small, too many network
1) According to the size of the confidence score, the N parameters will lead to over-fitting problems, which can
prediction boxes generated by regression are sequen- not make the model have good generalization ability but
tially sorted; also increase the computational difficulty and make the
2) Select the prediction box with the largest confidence network difficult to train. To solve the above problems,
score and calculate the IoU values with other predic- the model is lightweight processed in this study. The
tion boxes; idea of convolution factorization is introduced into the
Wang et al. Plant Methods (2021) 17:43 Page 10 of 17
Original network model 4.05 × 108 In the above formula, c is the category of the detected
The new network model 7.92 × 10 8 object, pi (c) is when the i grid detects an object, the pre-
The new network model with convolution factori- 4.36 × 108 diction probability of the object belonging to the category
The improved loss function
The two sub-tasks of object detection are bounding box prediction and category prediction. To accomplish these two sub-tasks, the loss function of the original YOLOv3 algorithm comprises three parts, namely coordinate prediction, confidence prediction and category prediction.
The loss function of the YOLOv3 object detection network is shown in Formula (3):

$$Loss = Loss_{coord} + Loss_{obj} + Loss_{class} \tag{3}$$

In the above formula, $Loss_{coord}$ is the coordinate prediction loss, $Loss_{obj}$ is the confidence prediction loss and $Loss_{class}$ is the category prediction loss.

$$\begin{aligned}
Loss_{coord} ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} l_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]\\
&+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} l_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right]
\end{aligned} \tag{4}$$

In the above formula, $\lambda_{coord}$ is a weight coefficient, $S^2$ is the number of grids of the input image and $B$ is the number of bounding boxes predicted by a single grid; $l_{ij}^{obj}$ is set to 1 when grid $i$ predicts bounding box $j$ and an object is detected, and to 0 otherwise. $x_i$ and $y_i$ are the abscissa and ordinate of the centre point of the predicted bounding box, and $\hat{x}_i$ and $\hat{y}_i$ those of the actual bounding box; $w_i$ and $h_i$ are the width and height of the predicted bounding box, and $\hat{w}_i$ and $\hat{h}_i$ those of the actual bounding box.

$$Loss_{obj} = \sum_{i=0}^{S^2}\sum_{j=0}^{B} l_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} l_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \tag{5}$$

In the above formula, $C_i$ is the predicted confidence score of the object and $\hat{C}_i$ is the actual confidence score of the object; $l_{ij}^{noobj}$ is the complement of $l_{ij}^{obj}$, and $\lambda_{noobj}$ weights the grid cells that contain no object.

$$Loss_{class} = \sum_{i=0}^{S^2} l_{ij}^{obj}\sum_{c\in class}\left(p_i(c)-\hat{p}_i(c)\right)^2 \tag{6}$$

In the above formula, $\hat{p}_i(c)$ is the predicted probability of the object belonging to categorization $c$, and $p_i(c)$ is the actual probability, when grid $i$ detects an object, that the object belongs to category $c$.

The loss functions of coordinate prediction and confidence prediction ensure the accuracy of bounding box regression. After the adaptive dimension clustering of the anchor bounding boxes, the accuracy of bounding box regression has improved correspondingly, so the other sub-task, category prediction, becomes more important.

It is found that the average accuracies on different tomato disease and pest objects differ. Because of the variety of pest forms and the small size of the early objects, pests take on many attitudes when they gather and overlap each other. Therefore, unlike diseases, the characteristics of pests are difficult to learn, and small-scale objects are easily missed. At the same time, different classes of diseases and pests have different lesion sizes. To narrow the gap between them, this study adds a balance factor to each category to weigh the difficulty of samples among different categories. The modified loss function of category prediction is as follows:

$$Loss'_{class} = \sum_{i=0}^{S^2} l_{ij}^{obj}\sum_{c\in class}\alpha_c\left(p_i(c)-\hat{p}_i(c)\right)^2 \tag{7}$$

In the above formula, by adjusting the balance factor $\alpha_c$, the model finds the best trade-off between bounding box prediction and category prediction, which gives the algorithm the best detection effect.
The final improved loss function is as follows:

$$\begin{aligned}
Loss' ={}& Loss_{coord} + Loss_{obj} + Loss'_{class}\\
={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} l_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]\\
&+ \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B} l_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right]\\
&+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} l_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2 + \lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B} l_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2\\
&+ \sum_{i=0}^{S^2} l_{ij}^{obj}\sum_{c\in class}\alpha_c\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned} \tag{8}$$
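As a minimal sketch of the balanced category-prediction term of Formula (7), assuming a flattened grid of S² cells and illustrative balance factors (the paper does not publish its actual α values):

```python
def balanced_class_loss(p_true, p_pred, obj_mask, alpha):
    """Sketch of the balanced category loss of Formula (7).

    p_true, p_pred -- per-cell lists of actual/predicted class probabilities
    obj_mask       -- 1 where the grid cell is responsible for an object, else 0
    alpha          -- per-class balance factors (illustrative values only)
    """
    loss = 0.0
    for cell, has_obj in enumerate(obj_mask):
        if not has_obj:
            continue  # cells without objects do not contribute to the class loss
        loss += sum(a * (t - p) ** 2
                    for a, t, p in zip(alpha, p_true[cell], p_pred[cell]))
    return loss
```

Raising the balance factor of a hard class (for example, small pest objects) scales up its squared error, so training spends more capacity on that class.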
Wang et al. Plant Methods (2021) 17:43 Page 12 of 17
$$R = \frac{TP}{TP + FN} \tag{10}$$

$$F1 = \frac{2PR}{P + R} \tag{11}$$

$$AP = \int_0^1 P(R)\,dR \tag{12}$$

In the above formulas, $P$ is the precision rate and $R$ is the recall rate; $TP$ is the number of true-positive samples, $FP$ is the number of false-positive samples and $FN$ is the number of false-negative samples.

Model training
The original YOLOv3 and the improved YOLOv3 are trained separately. The initial learning rate is set to 0.001 and the attenuation coefficient to 0.0005 in the training phase. When the number of training iterations reaches 2000 and 25,000, the learning rate is reduced to 0.0001 and 0.00001, respectively, which further converges the loss function. The convergence curve of the loss value during training of the improved YOLOv3 network is shown in Fig. 10a, and the Avg IOU curve between the predicted bounding boxes and the ground truth in Fig. 10b. After about 30,000 iterations the parameters stabilize: the final loss value drops to about 0.2, and the Avg IOU gradually approaches 1, finally stabilizing at about 0.85. From the convergence of these parameters, the training results of the improved YOLOv3 network are ideal.
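The evaluation metrics of Formulas (10)–(12) can be sketched in plain Python. Note that the AP integral is approximated here by the trapezoidal rule over a sampled P–R curve; the sampling is an illustrative assumption, not the paper's exact interpolation scheme:

```python
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)          # Formula (10)

def f1_score(p, r):
    return 2 * p * r / (p + r)     # Formula (11)

def average_precision(recalls, precisions):
    """Approximate AP = integral of P(R) dR (Formula (12)) by the
    trapezoidal rule; recalls must be sorted in ascending order."""
    ap = 0.0
    for k in range(1, len(recalls)):
        ap += (recalls[k] - recalls[k - 1]) * \
              (precisions[k] + precisions[k - 1]) / 2
    return ap
```

For example, a detector with 90 true positives, 10 false positives and 10 false negatives has precision and recall of 0.9 each, giving an F1 score of 0.9.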
experiment and verify the model effect on different datasets.
Table 4 compares the test performance of the proposed method and several other detection methods on the self-established tomato diseases and pests dataset. The F1 score and average precision of the proposed method are 94.77% and 91.81% respectively, which are 30.32% and 31.26% higher than the traditional HOG + SVM detection method; the reason is that, benefiting from deep learning, the method can extract deep features. Compared with Faster-RCNN, the detection accuracy of this method is also improved by 5.32%, mainly because the method uses the anchor mechanism and FPN structure, which improve the object detection accuracy of the network. The detection accuracy is improved by 6.3% and 4.19% compared with SSD and YOLOv3, respectively, indicating that the backbone network of this method can maintain high resolution and an enlarged receptive field, which is conducive to improving the accuracy of tomato diseases and pests detection.

Table 4 Comparison of different detection methods

Detection methods     F1 score (%)   Average precision (%)   Missing rate (%)   Time (ms)
HOG + SVM             64.45          60.55                   31.7               7497
Faster-RCNN           89.04          86.49                   14.5               4459
SSD                   88.45          85.51                   16.9               447
YOLOv3                91.43          87.62                   7.7                54
The proposed method   94.77          91.81                   2.1                55

In terms of missing detection rate, the HOG + SVM method locates tomato disease and pest objects accurately at normal scale but misses small-scale objects. The feature maps that Faster-RCNN and SSD extract from shallow layers are not expressive enough to cope with multi-scale detection, so both have a high missing detection rate in multi-scale positioning. Although YOLOv3 can detect multi-scale objects, it still misses small objects, with a missing detection rate of 7.7%. In this study, the algorithm extracts features at each level and can accurately detect tomato disease and pest objects at different scales, with a missing detection rate of only 2.1%.

In terms of detection time, the method in this study greatly improves on the traditional HOG + SVM detection method and Faster-RCNN, because the YOLO and SSD series of algorithms treat object detection as a regression problem, which improves detection speed. The detection speed of this method is broadly consistent with that of YOLOv3. Although the dilated convolution this method uses is more time-consuming than ordinary convolution and increases the computation of the detection network, the lightweight processing of the model preserves detection speed and meets the real-time requirements of tomato diseases and pests detection.

Comparison of different backgrounds of objects
The background of an object can greatly affect the detection accuracy of the model, so different object backgrounds are taken as a control variable in this study. The improved YOLOv3 model is used as the network model, and test datasets with different backgrounds verify the results, as shown in Table 6.

When recognizing disease under sufficient light without leaf occlusion, the F1 score of the model reaches 95.22%, the AP value 92.67% and the average IoU 90.89%. Table 5 shows that the detection accuracy is slightly lower when recognizing disease under insufficient light with leaf occlusion, or in shadow with leaf occlusion. The reason is that, in the actual application scenario, these backgrounds contain elements that mimic certain disease characteristics; the network may learn them, which influences the recognition effect. The P–R curve of the whole test set is shown in Fig. 11.

Comparison of different classes of tomato diseases and pests
The detection effect of each class of tomato diseases and pests can be analysed from Table 6.
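The dilated-convolution trade-off described above can be made concrete with standard receptive-field arithmetic: dilation enlarges the effective kernel, and with matching padding the output resolution is preserved. The kernel sizes, dilation rates and input sizes below are illustrative, not the paper's actual network configuration:

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

def conv_output_size(n, k, d, stride=1, padding=0):
    """Output spatial size of a dilated k x k convolution on an n x n input."""
    return (n + 2 * padding - effective_kernel(k, d)) // stride + 1
```

For instance, a 3 x 3 kernel with dilation 2 covers a 5 x 5 area, and with padding 2 and stride 1 a 13 x 13 feature map stays 13 x 13: the receptive field grows while resolution is maintained, at the cost of more computation per output pixel.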
Table 7 Statistical result of the proposed model on the unedited videos

Unedited videos   Precision (%)   Recall (%)
Video1            89.95           81.08
Video2            89.77           81.77
Video3            90.03           82.13
Video4            90.69           82.44
Video5            90.02           81.98
Average           90.09           81.88

difficult to achieve satisfactory results in the natural environment. To overcome the influence on detection accuracy of the diversity of diseases and pests, changes in light and leaf occlusion, this study proposes an improved YOLOv3 detection algorithm, which improves the backbone network, the NMS algorithm and the loss function, thereby improving the recognition of diseases and pests. The results show that the average recognition accuracy of this method is 91.81%. The algorithm also runs successfully on unedited videos of real natural scenes.
Future directions
This study mainly addressed the early detection of tomato diseases and pests under natural conditions, and on the whole it meets the accuracy and speed requirements of tomato diseases and pests detection. However, some problems still need to be solved urgently.

1) Based on tomato diseases and pests, it is necessary to extend to other kinds of crops. At present, almost all crops may be affected by diseases and pests, and the resulting loss of yield is serious, so it is of great significance to identify the diseases and pests of each crop intelligently.
2) Increase the division of diseases and pests severity, and study the early warning of diseases and pests. The occurrence of diseases and pests is related to the parasitic process of bacteria, and it is a process from local leaves to the whole leaves. Although there are obvious differences in different diseases and pests,

Funding
This study was supported by the Facility Horticulture Laboratory of Universities in Shandong with project numbers 2019YY003, 2018YY016, 2018YY043 and 2018YY044; school-level High-level Talents Project 2018RC002; Youth Fund Project of Philosophy and Social Sciences of Weifang College of Science and Technology with project numbers 2018WKRQZ008 and 2018WKRQZ008-3; Key Research and Development Plan of Shandong Province with project numbers 2020RKA07036, 2019RKA07012 and 2019GNC106034; Research and Development Plan of Applied Technology in Shouguang with project number 2018JH12; 2018 Innovation Fund of the Science and Technology Development Centre of the China Ministry of Education with project number 2018A02013; 2019 Basic Capacity Construction Project of private colleges and universities in Shandong Province; Weifang Science and Technology Development Programme with project numbers 2019GX071, 2019GX081 and 2019GX082; and the Special Project of Ideological and Political Education of Weifang University of Science and Technology (W19SZ70Z01).

Availability of data and materials
For relevant data and codes, please contact the corresponding author of this manuscript.

Declarations

Competing interests
The authors declare no competing interests.