Wspanialy 2020
Keywords: Greenhouse tomato; Severity estimation; Disease detection; Machine learning

The management of plant disease is a significant economic and environmental factor in the production of greenhouse tomato plants. Human expertise for assessing the presence and extent of disease is important in creating and implementing management plans, but it is difficult and expensive to acquire. In this paper, we present a new computer vision system to automatically recognize several diseases, detect previously unseen disease, and estimate per-leaf severity. Training and testing of models used several modified versions of the nine types of tomato disease of the PlantVillage tomato dataset and showed how different leaf properties impact disease detection.
1. Introduction

Tomato disease management is a challenging process, requiring continual attention throughout the crop cycle and accounting for a significant fraction of total production costs (Peet and Welles, 2005). Earlier detection can help reduce the cost of treatment, lower the environmental impact of chemical inputs, and mitigate risks of yield loss. Current disease detection techniques are limited by the time required for expert laborers to manually locate and assess disease, which is complicated by the volume of plants found in commercial greenhouses and the small size of disease symptoms at their earliest stages. The expense and time required typically limit disease scouting to an infrequent schedule or sparse sampling, which can miss early localized symptoms and have a significant impact on the severity of an outbreak.

Investigations into automated detection methods have included molecular analysis, spectroscopy, and analysis of volatile organic compounds, but these are expensive and impractical to apply at commercial operating scales (Martinelli et al., 2015). Studies using visible features imaged with conventional RGB cameras have shown the ability of machine learning systems to recognize the presence of known plant disease using deep convolutional neural network models (Mohanty et al., 2016; Alfarisy et al., 2018). Deep learning models typically require thousands of data points to accurately generalize predictions, while only small plant disease datasets were publicly available until the introduction of the PlantVillage dataset (Hughes and Salathe, 2015), which cataloged thousands of plant leaf disease images from twelve crop species, including nine tomato diseases. Previous deep learning models trained on the PlantVillage dataset achieved high degrees of accuracy when recognizing previously seen symptoms of a particular disease (Mohanty et al., 2016; Brahimi et al., 2018). Studies using a reduced number of PlantVillage images and other smaller datasets also showed similarly high accuracy rates (Barbedo, 2018; Fuentes et al., 2017; Alfarisy et al., 2018). However, for datasets with classes of similar visual appearance, such as those of different tomato leaf diseases, smaller datasets are expected to result in lower-performing models. Since the performance of models trained on progressively smaller datasets remains high, this may suggest that their images contained spuriously correlated features, extraneous to disease. In this paper, we propose a new method for detecting diseases in a tomato greenhouse. The proposed method is focused on recognizing generic features that are associated with diseases regardless of disease type. This enables detection of new instances that might not have been seen before.

Quantifying the level of disease is another important challenge in integrated disease management. Since it is often impossible to completely eradicate disease once an epidemic begins, treatment to decrease its spread can mitigate its impact on yield. Without a quantifiable measure, it is difficult to gauge the effectiveness of management practices or to predict and prepare for future epidemics. Measuring disease incidence and measuring disease severity are two common methods for quantifying its degree. Incidence, the simpler of the two methods, is a measurement
⁎
Corresponding author.
E-mail addresses: [email protected] (P. Wspanialy), [email protected] (M. Moussa).
https://doi.org/10.1016/j.compag.2020.105701
Received 31 January 2020; Received in revised form 21 April 2020; Accepted 5 August 2020
0168-1699/ © 2020 Published by Elsevier B.V.
P. Wspanialy and M. Moussa Computers and Electronics in Agriculture 178 (2020) 105701
of disease frequency, usually at the level of individual plants. Severity, in contrast, is a more precise value, measuring the ratio of diseased area based on leaf coverage. Fig. 1 illustrates how the two measurements differ. Both techniques can be affected by errors in sampling, but severity introduces additional errors when estimating leaf coverage (Bock et al., 2010). Severity is often compared against standard area diagrams showing prototypical examples of symptomatic leaves (Barratt and Horsfall, 1945; Sconyers et al., 2006).

Fig. 1. Illustrated difference between disease incidence and severity. The top example shows three plants, two of which are exhibiting low levels of disease. The bottom example shows three plants, again with two exhibiting disease but at higher degrees. The severity measurement reflects this change, while incidence does not.

The surveying aids traditionally used by scouts to assess severity, for example Fig. 2, do not produce results precise enough to discriminate between significant differences in diseased area (Parker et al., 1995). Improvements to comparative diagrams were developed as a digital assistant but required users to manually select the color ranges associated with disease in images for the symptomatic areas to be calculated (Pethybridge and Nelson, 2015). While interactive labeling is suitable for smaller samples, it still requires significant manual input and is impractical when applied to large samples of commercially sized greenhouses. Another approach to estimating disease progress was investigated using a machine learning model to automatically classify leaves into discrete early, middle, and end-stage classes of disease (Wang et al., 2017). This technique provides more information than just binary detection, but the resolution is not fine-grained enough for tracking small changes in disease progression. Current methods are reliant on user input or provide a coarse account of severity, which cannot be used for monitoring large areas. In this paper, we propose a new method of estimating severity by detecting and measuring the proportion of symptomatic leaf area automatically.

1.1. Paper contribution

There are two primary contributions in this paper.

1. We propose a disease detection model that learns to detect instances of disease based on generic features. The algorithm utilizes the ResNet deep learning architecture. Performance shows that it can detect new instances of diseases not seen before.
2. We propose a new model for estimating disease severity. The model utilizes a modified U-Net deep learning architecture to estimate the severity of disease based on a hybrid deep learning model.

The two methods presented can help reduce associated labor costs while increasing the accuracy and precision of disease scouting. Additionally, they can enable larger automated scouting operations, which can help reduce sample size bias.

2. Dataset

The primary source of data used in this paper was the set of unmodified color images of tomato leaves in the PlantVillage dataset (Hughes and Salathe, 2015). The dataset consists of images of single leaves removed from their plants with naturally occurring or inoculated disease. Images were taken with natural lighting against a grey background near the sampling site. The subset of tomato images contains 16415 diseased leaf images exhibiting nine different conditions and 1590 healthy leaves, with their conditions classified by experienced observers. The original PlantVillage dataset was captured using a twenty-megapixel camera from four to seven different orientations to compensate for directional lighting variation. Each image was later rotated to have the leaf tips pointed upwards. The version of the dataset used in this study had been scaled down to 256 × 256 px and rotations of the same leaf were removed. Example images from the ten classes of tomato leaves are shown in Fig. 3.

The original study of this dataset (Mohanty et al., 2016) investigated the impact of color and background information on the performance of classification models, showing that color was important in improving accuracy, but that backgrounds did not have an effect. When removing backgrounds, the authors simultaneously applied color correction to the leaf area, which may have introduced bias if adjustments were not uniform across every class. Fig. 4 shows an example of the differences between the original color image and the segmented version used in Mohanty et al. (2016).

3. Disease detection

In order to explore the contribution of image properties to disease detection and to investigate detection methods that are agnostic to the type of disease, we tested several variations of the PlantVillage dataset. The original PlantVillage dataset was used to generate variations to investigate the contribution of leaf surface features, leaf shape, and background content to the accuracy of disease classification. In order to investigate the ability of a system to recognize previously unseen disease, the dataset was split into nine different disease-healthy binary subsets, each one leaving out one of the nine diseases. The corresponding test sets for each leave-one-out dataset contained only samples from the left-out disease. Each dataset variant was nominally split into 60% training, 5% validation, and 25% test subsets. The test subsets were split evenly between classes, with excess samples moved to the training dataset. A description of the dataset variations is shown in Table 1, and example images are shown in Fig. 5.

In order to avoid overfitting, which can impact performance due to poor generalization beyond training samples, we used several methods for data augmentation. Data augmentation is a technique used to artificially increase variations in images to reduce model overfitting. Datasets were augmented during the training stage of each model to reduce the effect of rotation, translation, size, and illumination. The following modifications were randomly applied to images: vertical and horizontal flipping, scaling ±20%, translation ±20%, rotation ±45 degrees, shear ±20 degrees, modification of hue and saturation by ±20%, and a Gaussian blur with a standard deviation between 0 and 3. Since augmentations were applied dynamically during training, the number of dataset samples increased proportionally to training length. This results in an effectively infinitely sized dataset, but with progressively lower variation between samples as it grows.
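
The dynamic augmentation described above can be illustrated by drawing a fresh set of transform parameters every time an image is loaded. The following is only a minimal sketch using the ranges listed in the text; the function name `sample_augmentation`, the parameter names, the 50% flip probability, and the use of uniform distributions are assumptions made for illustration and are not taken from the paper.

```python
import random


def sample_augmentation(rng: random.Random) -> dict:
    """Draw one random set of augmentation parameters.

    Ranges follow the text; the flip probability (0.5) and the uniform
    sampling are illustrative assumptions, not the authors' settings.
    """
    return {
        "hflip": rng.random() < 0.5,                 # horizontal flip
        "vflip": rng.random() < 0.5,                 # vertical flip
        "scale": rng.uniform(0.8, 1.2),              # scaling +/- 20%
        "translate_x": rng.uniform(-0.2, 0.2),       # fraction of image width
        "translate_y": rng.uniform(-0.2, 0.2),       # fraction of image height
        "rotation_deg": rng.uniform(-45.0, 45.0),    # rotation +/- 45 degrees
        "shear_deg": rng.uniform(-20.0, 20.0),       # shear +/- 20 degrees
        "hue_factor": rng.uniform(0.8, 1.2),         # hue change +/- 20%
        "saturation_factor": rng.uniform(0.8, 1.2),  # saturation change +/- 20%
        "blur_sigma": rng.uniform(0.0, 3.0),         # Gaussian blur std. dev.
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # The same image receives different parameters in every epoch, so the
    # number of distinct training samples grows with training length.
    for epoch in range(3):
        print(epoch, sample_augmentation(rng))
```

Because parameters are re-sampled on every pass, no augmented image needs to be stored; an image library of choice would apply the sampled values to each loaded image.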
Fig. 3. Examples of images from the ten classes found in the PlantVillage tomato dataset.
Table 1
PlantVillage dataset variations used in this study.

| Dataset variation | Description |
|---|---|
| Color | The original 256 × 256 color images used in Mohanty et al. (2016). Nine classes of disease and one healthy class. |
| Color-masked | The color version masked using the segmentations from Mohanty et al. (2016). No color correction performed. |
| Silhouette | The segmentation masks from Mohanty et al. (2016). |
| Background-only | Segmentation masks from Mohanty et al. (2016), heavily dilated and blurred, thresholded, inverted, and applied to color images. |
| Color-masked-binary | The color-masked dataset with only two classes, diseased and healthy. |
| Leave-one-out (9 variations) | The color-masked-binary dataset but with one of the nine diseases left out. Nine variations, each with a different disease left out. |
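
The leave-one-out variations in Table 1 can be sketched as follows: for each of the nine diseases, the training pool keeps the healthy leaves and the eight remaining diseases, relabeled as a binary healthy/diseased problem, while the test set holds only the left-out disease. This is a minimal illustration; the function name `leave_one_out`, the sample representation, and the label encoding (0 = healthy, 1 = diseased) are hypothetical, and the training/validation/test proportions are not modeled here.

```python
from typing import Dict, List, Tuple

# The nine tomato conditions named in the paper, plus the healthy class.
DISEASES = [
    "Early Blight", "Late Blight", "Leaf Mold", "Septoria Leaf Spot",
    "Target Spot", "Bacterial Spot", "Mosaic Virus",
    "Yellow Leaf Curl Virus", "Spider Mites",
]
HEALTHY = "Healthy"

Sample = Tuple[str, str]    # (image id, class name)
Labeled = Tuple[str, int]   # (image id, 0 = healthy / 1 = diseased)


def leave_one_out(samples: List[Sample]) -> Dict[str, Dict[str, List[Labeled]]]:
    """Build one binary dataset variation per left-out disease."""
    variants: Dict[str, Dict[str, List[Labeled]]] = {}
    for left_out in DISEASES:
        train: List[Labeled] = []
        test: List[Labeled] = []
        for image_id, cls in samples:
            if cls == left_out:
                # Unseen disease: used only for testing.
                test.append((image_id, 1))
            else:
                train.append((image_id, 0 if cls == HEALTHY else 1))
        variants[left_out] = {"train": train, "test": test}
    return variants


if __name__ == "__main__":
    toy = [(f"img_{i}", name) for i, name in enumerate(DISEASES)]
    toy += [("img_h1", HEALTHY), ("img_h2", HEALTHY)]
    for name, split in leave_one_out(toy).items():
        print(name, len(split["train"]), len(split["test"]))
```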
tection. The model’s output layer was resized from the 1000 classes found in ImageNet to match the number of classes in each PlantVillage dataset variation, and its weights were randomly assigned. For example, when training the model to differentiate between each of the nine different diseases and healthy leaves, the output size was 10 (9 + 1). When trained to distinguish between the binary option of healthy or diseased, the output size was 2. Models were trained using a

Fig. 5. Example images from dataset variations.

Fig. 6. Data-flow of ResNet model used to classify between one of the color-masked binary datasets. The structure is similar for models with more than two classes, with only an expansion of the final output layer.
Fig. 7. Confusion matrix for the color dataset.
Fig. 8. Confusion matrix for the color-masked dataset.
Fig. 9. Confusion matrix for the background-only dataset.
Fig. 10. Confusion matrix for the silhouette dataset.
Table 3
Leave-one-out dataset classification accuracies.
Left-out-disease dataset variation Classification accuracy
Fig. 13. Confusion matrices for disease recognition on the Google “tomato leaf”
dataset 13.
Fig. 14. Annotation example of a leaf afflicted with late blight disease.
shown in Table 3. Their corresponding confusion matrices are shown in Figs. 11 and 12.

In order to determine how well the models generalized, an external dataset was created by collecting the first 100 images found using a Google image search for “tomato leaf”. Leaves were manually segmented (see Fig. 14) and evaluated for health status using the model trained on the color-masked dataset. The Google dataset had an 80% bias towards diseased leaves, which mirrored the resulting confusion matrix shown in Fig. 13, indicating poor generalization.

4. Severity estimation

The goal of disease severity estimation is to precisely and accurately measure the proportion of leaf area showing disease symptoms, or

\[ \text{severity} = \frac{\text{diseased area}}{\text{total leaf area}} \tag{1} \]

The PlantVillage dataset contains several thousand samples classified by disease but without annotation of diseased area. The process of segmenting images is labor-intensive, and so only 1800 images were annotated, 200 from each of the nine classes. Each disease class dataset was split into subsets for training, validation, and testing, at proportions of 70%, 10%, and 20%, respectively. Image augmentation was expanded to create background invariance by randomly replacing backgrounds with images from the Describable Textures Dataset (Describable Textures Dataset). Annotations were made using a tool developed for annotating plant parts (https://github.com/uoguelph-ri/coco-annotator), producing results compatible with the common objects in context (COCO) (Lin et al., 2014) data format. Image pixels were annotated into one of three classes: healthy, diseased, and background. Table 4 shows the mean severity of each of the nine conditions. A separate dataset was created for each disease class to evaluate the effects of their symptom expression on model performance independently.

Table 4
Mean and standard deviation (σ) of disease severity for each tomato disease dataset.

| Disease | Mean severity (σ) |
|---|---|
| Early Blight (Alternaria solani) | 0.118 (0.079) |
| Late Blight (Phytophthora infestans) | 0.395 (0.229) |
| Leaf Mold (Passalora fulva) | 0.157 (0.121) |
| Septoria Leaf Spot (Septoria lycopersici) | 0.119 (0.072) |
| Target Spot (Corynespora cassiicola) | 0.038 (0.037) |
| Bacterial Spot (Xanthomonas campestris pv. vesicatoria) | 0.167 (0.141) |
| Mosaic Virus | 0.233 (0.109) |
| Yellow Leaf Curl Virus | 0.258 (0.111) |
| Spider Mites (Tetranychus urticae) | 0.315 (0.153) |

A U-Net architecture (Fig. 15), originally used to segment a small dataset of microscopic images of cells, was used to segment diseased leaf area. U-Net’s characteristic “U” shape architecture is built with a traditional convolutional neural network in its downsampling path and is followed by a mirrored upsampling step. At the end of upsampling, the original resolution of the input is restored, assigning class labels to each pixel of the image (Ronneberger et al., 2015). This fine-grained class prediction is known as semantic segmentation. The model’s original downsampling step was replaced with the VGG16 (Simonyan and Zisserman, 2014) architecture, functioning as an image feature extractor. Weights in the VGG16 portion were pre-trained on the ImageNet dataset and weights in the upsampling path were randomly set with a Xavier initializer (Glorot and Bengio, 2014). Models for each disease dataset were trained for a total of 30 epochs. The first ten epochs only adjusted the weights in the upsampling path, since gradient updates from the random weights could disrupt the pre-trained ImageNet weights of the downsampling path. The final 20 epochs were trained using the complete set of weights. Training loss was weighted in
Fig. 16. Example outputs of the disease severity estimation model for early and late blight, leaf mold, and Septoria leaf spot. The input images are in the first column,
followed by ground truth annotations, and model predictions. Numbers in parentheses are the ground truth and predicted severity estimations.
the same manner as that for disease detection to compensate for the unequal amounts of pixels belonging to the diseased, healthy, and background classes.

The performance of disease segmentation was calculated using the Jaccard index, also known as the intersection over union (IoU) measure. The index is defined as the ratio between the intersection of the model prediction and the truth, and their union, and is used to measure their similarity, Eq. (2). The predicted area is generated by the computer model after being trained, while the truth area is labeled manually by a human.

\[ J(\text{predicted}, \text{ground truth}) = \frac{|\text{predicted} \cap \text{truth}|}{|\text{predicted} \cup \text{truth}|} \tag{2} \]

4.1. Results

Severity semantic segmentation models were trained on each of the individual disease’s training datasets and tested on their corresponding test sets. Figs. 16 and 17 show example predictions for each of the nine diseases. Disease severity estimation performance was measured using the absolute difference between ground truth and predicted values of severity, i.e. severity error. This measure only considers how accurate the severity estimation was and does not consider if the specific areas were marked correctly, unlike the Jaccard index. Mean performances of each model, measured using the Jaccard index and severity error, are shown in Table 5.
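
To make the relationship between Eq. (1), Eq. (2), and the severity error concrete, the following sketch computes all three from small per-pixel label masks. It is only an illustration under stated assumptions: the integer label encoding (0 = background, 1 = healthy, 2 = diseased), the function names, the treatment of total leaf area as healthy plus diseased pixels, and the conventions for empty masks are hypothetical and are not taken from the paper.

```python
from typing import List

Mask = List[List[int]]

# Assumed label encoding (hypothetical, for illustration only).
BACKGROUND, HEALTHY, DISEASED = 0, 1, 2


def severity(mask: Mask) -> float:
    """Eq. (1): diseased area / total leaf area (healthy + diseased pixels)."""
    diseased = sum(row.count(DISEASED) for row in mask)
    healthy = sum(row.count(HEALTHY) for row in mask)
    leaf = diseased + healthy
    return diseased / leaf if leaf else 0.0  # assumed: no leaf -> severity 0


def jaccard(pred: Mask, truth: Mask, label: int = DISEASED) -> float:
    """Eq. (2): intersection over union of the pixels carrying `label`."""
    p = {(r, c) for r, row in enumerate(pred) for c, v in enumerate(row) if v == label}
    t = {(r, c) for r, row in enumerate(truth) for c, v in enumerate(row) if v == label}
    union = p | t
    return len(p & t) / len(union) if union else 1.0  # assumed: both empty -> 1


def severity_error(pred: Mask, truth: Mask) -> float:
    """Absolute difference between predicted and ground-truth severity."""
    return abs(severity(pred) - severity(truth))


if __name__ == "__main__":
    truth = [[0, 1, 2],
             [0, 2, 2]]
    pred = [[0, 1, 1],
            [0, 2, 2]]
    print(severity(truth), severity(pred))  # 0.75 0.5
    print(jaccard(pred, truth))             # 2 of 3 diseased pixels overlap
    print(severity_error(pred, truth))      # 0.25
```

The example also shows why the two measures differ: severity error compares only the proportions, so a prediction can mark the wrong pixels and still score well, whereas the Jaccard index penalizes misplaced regions.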
5. Discussion
ability to recognize diseases not trained on before.

Additionally, this paper introduced a new annotated proportional disease severity dataset and associated estimation model. Fungal and bacterial diseases are most suitable for disease severity estimation using proportional area measures, while ordinal categories are more suitable for systemic diseases like those caused by viruses and insects. The results of this study can be applied practically by integrating the models into an automated surveying system, reducing costs and measurement bias, and increasing precision and greenhouse coverage.

CRediT authorship contribution statement

Patrick Wspanialy: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data curation, Writing - original draft, Writing - review & editing, Visualization. Medhat Moussa: Conceptualization, Resources, Writing - review & editing, Supervision, Project administration, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.compag.2020.105701.

References

Alfarisy, A.A., Chen, Q., Guo, M., 2018. Deep Learning based classification for paddy pests & diseases recognition. In: Proceedings of 2018 International Conference on Mathematics and Artificial Intelligence, ICMAI ’18, ACM, New York, NY, USA, 2018, pp. 21–25. doi:10.1145/3208788.3208795.
Barbedo, J.G.A., 2018. Impact of dataset size and variety on the effectiveness of deep learning and transfer learning for plant disease classification. Comput. Electron. Agric. 153 (Complete), 46–53. https://doi.org/10.1016/j.compag.2018.08.013.
Barratt, R., Horsfall, J., 1945. An improved grading system for measuring plant disease. Phytopathology 35, 655.
Bock, C.H., Poole, G.H., Parker, P.E., Gottwald, T.R., 2010. Plant disease severity estimated visually, by digital photography and image analysis, and by hyperspectral imaging. Crit. Rev. Plant Sci. 29 (2), 59–107. https://doi.org/10.1080/07352681003617285.
Brahimi, M., Arsenovic, M., Laraba, S., Sladojevic, S., Boukhalfa, K., Moussaoui, A., 2018. Deep learning for plant diseases: detection and saliency map visualisation. In: Zhou, J., Chen, F. (Eds.), Human and Machine Learning: Visible, Explainable, Trustworthy and Transparent, Human-Computer Interaction Series, Springer International Publishing, Cham, 2018, pp. 93–117. doi:10.1007/978-3-319-90403-0_6.
Deng, J., Dong, W., Socher, R., Li, L., Li, K., Fei-Fei, L., 2009. ImageNet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. https://doi.org/10.1109/CVPR.2009.5206848.
Describable Textures Dataset, https://www.robots.ox.ac.uk/~vgg/data/dtd/.
Fuentes, A., Yoon, S., Kim, S.C., Park, D.S., 2017. A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors (Basel, Switzerland) 17(9). doi:10.3390/s17092022.
Glorot, X., Bengio, Y., 2014. Understanding the difficulty of training deep feedforward neural networks 8.
He, K., Zhang, X., Ren, S., Sun, J., 2015. Deep Residual Learning for Image Recognition, arXiv:1512.03385 [cs].
Hughes, D.P., Salathe, M., 2015. An open access repository of images on plant health to enable the development of mobile disease diagnostics, arXiv:1511.08060 [cs].
Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., Dollár, P., 2014. Microsoft COCO: Common Objects in Context, arXiv:1405.0312 [cs].
Martinelli, F., Scalenghe, R., Davino, S., Panno, S., Scuderi, G., Ruisi, P., Villa, P., Stroppiana, D., Boschetti, M., Goulart, L.R., Davis, C.E., Dandekar, A.M., 2015. Advanced methods of plant disease detection. A review. Agronomy Sustain. Develop. 35 (1), 1–25. https://doi.org/10.1007/s13593-014-0246-1.
Mohanty, S.P., Hughes, D.P., Salathé, M., 2016. Using deep learning for image-based plant disease detection. Front. Plant Sci. 7. doi:10.3389/fpls.2016.01419.
Nutter, F.W., Esker, P.D., 2006. The role of psychophysics in phytopathology: the Weber-Fechner law revisited. Eur. J. Plant Pathol. 114 (2), 199–213. https://doi.org/10.1007/s10658-005-4732-9.
Parker, S.R., Shaw, M.W., Royle, D.J., 1995. The reliability of visual estimates of disease severity on cereal leaves. Plant. Pathol. 44 (5), 856–864. https://doi.org/10.1111/j.1365-3059.1995.tb02745.x.
Peet, M.M., Welles, G., 2005. Greenhouse Tomato Production 48.
Pethybridge, S.J., Nelson, S.C., 2015. Leaf Doctor: a new portable application for quantifying plant disease severity. Plant Dis. 99 (10), 1310–1316. https://doi.org/10.1094/PDIS-03-15-0319-RE.
Ronneberger, O., Fischer, P., Brox, T., 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation, arXiv:1505.04597 [cs].
Sconyers, L.E., Kemerait, R.C., Brock, J., Phillips, D.V., Jost, P.H., Sikora, E.J., Gutierrez-Estrada, A., Mueller, J.D., Marois, J.J., Wright, D.L., et al., 2006. Asian soybean rust development in 2005: A perspective from the Southeastern United States. APSnet Features.
Simonyan, K., Zisserman, A., 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv:1409.1556 [cs].
Wang, G., Sun, Y., Wang, J., 2017. Automatic Image-Based Plant Disease Severity Estimation Using Deep Learning, https://www.hindawi.com/journals/cin/2017/2917536/abs/ (2017). doi:10.1155/2017/2917536.