Plant Disease Detection Using Machine Learning
I hereby declare that the work presented in this report entitled “Plant Disease
Detection using Machine Learning” in partial fulfillment of the requirements
for the award of the degree of Bachelor of Technology in Computer Science
and Engineering/Information Technology submitted in the department of
(Student Signature)
Vaibhav Sharma(191545)
This is to certify that the above statement made by the candidate is true to the
best of my knowledge.
(Supervisor Signature)
Dr. Ruchi Verma
Associate Professor (SG)
Department of CSE & IT
PLAGIARISM REPORT
ACKNOWLEDGEMENT
At the outset, I express my heartfelt thanks and gratitude to God for the
divine blessing that made it possible to complete this project work
successfully and on time. I am humbled to have carried out this project under
my respected professor, and I wish to record my profound indebtedness to my
supervisor, Dr. Ruchi Verma, Assistant Professor, Department of CSE & IT,
Jaypee University of Information Technology (JUIT), Waknaghat. Her deep
knowledge of the subject and her keen interest in guiding me in the field of
"Machine Learning" helped me carry out this project. Her endless patience,
scholarly guidance, continual encouragement, constant and energetic
supervision, and valuable advice on many earlier drafts have made it possible
for me to complete this project.
I would like to express my deep gratitude from the bottom of my heart to Dr.
Ruchi Verma, Department of CSE, for her generous help in finishing my project.
I would also like to acknowledge everyone who has helped me directly or
indirectly in making this project possible. At this juncture, I would like to
thank the teaching and non-teaching staff, who extended their timely help and
facilitated my undertaking.
Vaibhav Sharma(191545)
TABLE OF CONTENT
LIST OF FIGURES
ABSTRACT
Chapter-1 INTRODUCTION
1.1 Introduction
Plant disease detection is a cutting-edge and informative system that helps
users learn about diseases, trainings, and other events happening nearby. The
system helps the local population stay informed about activities in and around
their town, region, or locale. It requires both machine learning and image
processing in order to function. A user may add only diseases related to his
own town, although diseases can be added for any town. When a user adds an
unsuitable, fictitious, or misused entry, the administrator reviews it and
takes appropriate action. A user has at most 500 words to describe a disease.
The system lets the user swipe to the next or previous disease with transition
effects, and the look and feel of the disease view is designed to be engaging.
Farmers in rural areas may find it difficult to identify the diseases that
affect their crops, and they cannot easily visit an agricultural office to
find out what the infection might be. Our main goal is to identify the disease
present in a plant by observing its shape using image processing and machine
learning.
Pests and diseases harm crops or plant parts, reducing food production
and worsening food insecurity. Infections, inadequate disease management,
and significant climatic change are some of the major contributors to
decreasing food production. Numerous technologies have emerged to reduce
post-harvest processing, improve agricultural sustainability, and boost
production. Laboratory-based techniques such as polymerase chain reaction, gas
chromatography, mass spectrometry, thermography, and hyperspectral approaches
have been employed to identify diseases, but these techniques are
time-consuming and costly. Recently, server-based and mobile-based methods
have been used to identify diseases. The high-resolution camera, among other
things, makes it possible to identify diseases automatically. Modern
techniques such as machine learning and deep learning algorithms have increased
the accuracy of the outcomes. Numerous studies have used machine learning
techniques, including random forests, artificial neural networks, support
vector machines (SVM), fuzzy logic, the K-means method, and convolutional
neural networks, for the detection and diagnosis of plant diseases. In
general, a random forest is a learning method that builds a forest of decision
trees during the training phase and applies it to problems such as
classification and regression.
Three image features are used to describe a leaf:
1. Hu moments
2. Haralick texture
3. Color histogram
Hu moments are used to extract the shape of the leaves. Haralick texture is
used to capture the texture of the leaves, and the colour of the leaves is
obtained from a colour histogram, a graph that shows how the colours in an
image are distributed.
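As an illustration of the colour histogram idea, the following is a minimal sketch in plain Python. The function name, the choice of 4 bins per channel, and the tiny pixel list are assumptions made for this example and are not taken from the project's code, which may well rely on an image library instead.

```python
from collections import Counter


def colour_histogram(pixels, bins_per_channel=4):
    """Count pixels per quantised (R, G, B) bin and normalise so the bins sum to 1."""
    step = 256 // bins_per_channel  # width of one bin on the 0..255 scale
    counts = Counter((r // step, g // step, b // step) for (r, g, b) in pixels)
    total = len(pixels)
    return {bin_index: n / total for bin_index, n in counts.items()}


# Hypothetical 2x2 leaf image flattened to a list of RGB pixels:
# three green pixels and one brown (possibly diseased) pixel.
leaf_pixels = [(30, 200, 40), (35, 210, 50), (20, 195, 45), (150, 90, 30)]
hist = colour_histogram(leaf_pixels)
```

The resulting dictionary maps each occupied colour bin to the fraction of pixels that fall into it, so a leaf with discoloured patches produces mass in bins that a uniformly green leaf leaves empty.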
Pests and diseases destroy crops or parts of plants, which lowers food output
and increases food insecurity. Many less developed nations also have a lower
level of expertise in the prevention and control of diseases and pests.
In the recent past, disease identification has been done using server-based
and mobile-based approaches. Automatic disease recognition is made possible by
a number of elements, including the high-resolution camera, high-performance
processing, and numerous built-in accessories. A portable system for detecting
plant diseases operates in an automated environment built with the Java
language. The app is intended to operate quickly and smoothly. Because Google
Material Design is used, the app looks more elegant and provides a positive
user experience. With the admin and user roles combined, it manages news,
different categories, notifications, and many other things on demand. Good
design and good programming are priorities. Using this app saves time and
money, and also allows us to construct our own distinct types of apps based on
our needs. Plant disease detection is a cutting-edge and informative system
that helps users learn about diseases, trainings, and other events happening
nearby. It helps the local community stay informed about events taking place
in and around their town, area, or location. Both machine learning and image
processing are necessary for this method to work. A user may view only
diseases related to his own town, although diseases can be added for any town.
When a user adds an unsuitable, fictitious, or misused entry, the
administrator reviews it and takes appropriate action. Android Studio is used
for the front end, and SQL Server is the back end. To use this app, the user
must register with the system and may also update his information.
1.2 Problem Statement
The problem statement for plant disease detection using machine learning is to
create a reliable and accurate system that can automatically detect and
diagnose plant diseases in a timely manner. This system should be able to
generalise to new locations and plant species while processing large amounts
of data rapidly and accurately.
The present methods for identifying plant diseases, such as visual inspection
or laboratory analysis, take time, are subjective, and frequently need
specialised knowledge. Furthermore, these techniques might not be able to
identify diseases in the earliest stages, when treatments are most effective.
Machine learning-based methods have the potential to overcome these
limitations and offer an automated, objective solution for plant disease
diagnosis. However, the creation of such systems requires the availability of
sizable and varied datasets, the choice of suitable machine learning
algorithms, and hyperparameter optimisation for the particular problem at
hand.
1.3 Objectives
1) Early Detection: Preventing the spread of plant diseases requires early
detection. By assisting in the early detection of plant illnesses, machine
learning algorithms enable farmers to respond quickly to limit additional
harm.
1.4 Methodology
The following models were trained for the detection of plant diseases, using
datasets taken from Kaggle.com. The models with the highest accuracy were then
selected and combined into an ensemble model.
trees, neural networks, support vector machines, or any other machine learning
algorithm. Each base model in the ensemble is trained
independently on a subset of the training data or using a different algorithm variation
to introduce diversity.
There are several common techniques for combining the predictions of the base
models in an ensemble. The two most popular methods are:
Ensemble learning offers several advantages over using a single model. It helps in
reducing overfitting by introducing model diversity and capturing different aspects of
the data. Ensemble models tend to have better generalization capabilities and can
handle complex patterns in the data. They are also more robust to noise and outliers
in the training data.
Overall, ensemble learning is a powerful technique that can significantly improve the
performance and accuracy of machine learning models by leveraging the collective
knowledge of multiple models.
Ensemble methods can help reduce the bias introduced by individual models. If a
base model is biased in some way, combining it with other models that have different
biases can lead to a more balanced and unbiased overall prediction.
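To make the idea of combining base-model predictions concrete, here is a minimal sketch of two widely used combination schemes, majority voting over predicted labels and averaging of predicted class probabilities. These two schemes, the function names, and the sample predictions are assumptions chosen for illustration; they are not necessarily the methods used in this project.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by most base models (ties go to the first seen)."""
    return Counter(labels).most_common(1)[0][0]


def average_probabilities(prob_lists):
    """Average the per-class probabilities predicted by each base model."""
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


# Hypothetical outputs of three base models for a single leaf image.
hard_votes = ["unhealthy", "healthy", "unhealthy"]
soft_probs = [[0.30, 0.70], [0.60, 0.40], [0.20, 0.80]]  # [healthy, unhealthy]

voted = majority_vote(hard_votes)
averaged = average_probabilities(soft_probs)
```

With averaging, a confident model can outweigh two hesitant ones, whereas majority voting treats every base model's vote equally.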
The dataset used is the Banana Leaf Dataset from Kaggle.com. It contains 2000
images, each 256x256 pixels, across 2 classes: healthy and unhealthy leaves.
1.4.1 Data Augmentation
In machine learning, data augmentation is utilized to expand the size and
variety of a training dataset by generating synthetic data from existing data.
This method is particularly advantageous for detecting plant diseases, as there
may be a restricted number of images accessible for training.
There are several ways to perform data augmentation in the detection of plant
diseases using machine learning, including:
By applying these techniques, a larger and more diverse training dataset can be
created, which can improve the accuracy of the machine learning model in
detecting plant diseases. It's worth noting, however, that data augmentation
alone cannot guarantee good performance; other factors such as the choice of
algorithm, hyperparameters, and the quality of the original dataset also play
important roles.
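As a sketch of how synthetic training images can be generated from an existing one, the example below applies three simple geometric transformations to an image stored as a row-major list of rows. The specific transformations (horizontal flip, vertical flip, 90-degree rotation) and all function names are assumptions for illustration; the report does not state which augmentations were actually applied.

```python
def flip_horizontal(image):
    """Mirror each row of a row-major image (left becomes right)."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the order of the rows (top becomes bottom)."""
    return image[::-1]


def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reverse the rows, then transpose."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image together with three transformed copies."""
    return [image, flip_horizontal(image), flip_vertical(image),
            rotate_90_clockwise(image)]


# Hypothetical 2x2 single-channel image.
tiny = [[1, 2],
        [3, 4]]
variants = augment(tiny)
```

Each call to `augment` turns one labelled image into four, and the label is unchanged because a flipped or rotated leaf is still healthy or unhealthy.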
Fig 1.1 Infected leaf
1.4.2 Data Acquisition
1.5 Organization
Chapter-2 LITERATURE SURVEY
4) Godliver Owomugisha et al. [4], “Automated Vision-Based Diagnosis
of Banana Bacterial Wilt Disease and Black Sigatoka Disease”: Color
histograms are extracted and transformed from RGB to HSV. Peak
components are used to create a max tree, five shape attributes are used,
and area under the curve analysis is used for classification. They used
nearest neighbors, decision tree, random forest, extremely randomized
trees, naïve Bayes and SV classifiers. Among the seven classifiers,
extremely randomized trees yielded a very high score, providing real-time
information and flexibility to the application.
5) Chunjiang Zhao et al. [5], “SVM-based Multiple Classifier System
for Recognition of Wheat Leaf Diseases”: Color features are
converted from RGB to HSI, and, using GLCM, seven invariant moments
are taken as shape parameters. They used an SVM-based multiple classifier
system (MCS) for detecting disease in wheat plants offline.
Chapter-3 SYSTEM DEVELOPMENT
3.1 Analysis/Design/Development/Algorithm
representing healthy leaves. Additionally, a class for background images was
added to improve classification accuracy.
To distinguish the leaves from their surroundings, a deep neural network
was trained on the dataset, including the background images taken from the
Stanford background dataset. In systems engineering and requirements
engineering, non-functional requirements refer to the criteria that can be used
to judge the operation of a system, rather than specific behaviors, in contrast to
functional requirements that define specific behavior or functions. The plan for
implementing functional requirements is detailed in the system design, while
the plan for implementing non-functional requirements is detailed in the
system architecture. Other terms for non-functional requirements include
"constraints", "quality attributes", "quality goals", "quality of service
requirements", and "non-behavioral requirements".
3.3 INPUT REQUIREMENTS
Techniques used:
Image processing: The user has to register with the system by providing his own information.
HOG: The registered user has to log in to the app to add or see diseases; the user stays
logged in until he logs out, which saves him from logging in every time.
Machine learning: The user can add a disease, choosing a town or area to refer to,
and can also add a related image.
CNN: The user can see the diseases based on the zone the user selects.
Artificial Intelligence: The user can also view the list of diseases added by him and
any action taken by the admin.
Several disease: If the user finds that a disease entry is not real or is offensive, he can
report it to the admin, who will take action.
Random forest: The admin should log in to the app to check diseases or reports
submitted by users.
Fig. 3.1 Description of CNN
project, it is accessible to anyone who has Python installed on their
system.
the application's codebase. In the case of mobile devices, the AI/ML
functionality often takes up the entire screen and is responsible for
managing the user interface and interactions with the device's screen.
● SVM AND HOG: In Android, GUI elements are called Views, such
as a TextView displaying text or a Button that users can click on. View
Groups are containers for Views. A ViewGroup can contain a
collection of Views together, and Views and View Groups can be
nested within each other. Fragments and Activities can use XML files
to define their layout and content. The layout XML files define the
GUI elements included in an Activity or Fragment, as well as the
layout of those GUI elements (size, margins, padding, etc.).
of plant disease detection, these widgets are used to display and
visualize diseases as part of a detection request. In some cases, views
of plant diseases may also be referred to as widgets.
1) PYTHON
2) CLUSTERING
A cluster is a set of data objects that share similar
characteristics, while dissimilar data objects belong to different
clusters. Clustering is the process of organizing abstract objects
into groups of similar objects. A cluster of data objects is treated as a
single group. During cluster analysis, data is divided into groups based on
similarity, and the groups are then given labels. Unlike classification,
clustering is adaptive and can identify features that distinguish different
groups. The k-means algorithm will be used to cluster the data in the
proposed system.
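The grouping step performed by k-means can be sketched as follows. This is a minimal plain-Python version of the standard algorithm (alternating an assignment step and a centroid update step) on 2-D points; the initialisation from the first k points, the fixed iteration count, and the sample points are assumptions made for this example and do not describe the project's actual configuration.

```python
def kmeans(points, k, iterations=10):
    """Cluster 2-D points; centroids start at the first k points (an assumption)."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters


# Hypothetical feature vectors forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

No labels are supplied at any point; the two groups emerge purely from the distances between the points, which is what distinguishes clustering from classification.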
3) CLASSIFICATION
Fig 3.3 Classification versus Regression
4) DECISION TREE
This method plays a crucial part in machine learning because it is easy
for people to understand. The decision tree starts at a root node that asks
a simple question. Because that question has several possible
answers, each answer may lead to a further question. As a result, the
decision tree's nodes continue to grow, until a leaf is reached where a
final decision is made.
Advantages:
1) More efficient.
4) It reduces the time complexity of the system.
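The way a decision tree moves from a root question to a final decision can be sketched with the classic "play tennis" example. The splits below follow the commonly cited textbook version of that tree and are an assumption; they are not read off this report's figure, and a trained tree would learn its questions from data rather than have them written by hand.

```python
def play_tennis(outlook, humidity, wind):
    """Walk the tree from the root question (outlook) down to a yes/no leaf."""
    if outlook == "overcast":
        return True                   # leaf: always play
    if outlook == "sunny":
        return humidity == "normal"   # second question: humidity
    if outlook == "rain":
        return wind == "weak"         # second question: wind
    raise ValueError(f"unknown outlook: {outlook!r}")


decision = play_tennis("sunny", humidity="high", wind="weak")
```

Each `if` corresponds to one internal node of the tree, and each `return` to a leaf, which is why the resulting model is easy for people to read.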
JDK
The JDK is necessary to create and run Java programmes and applets. JDK tools
are grouped into five categories: basic tools, Remote Method Invocation (RMI)
tools, internationalisation tools, security tools, and Java IDL tools.
JDK Basic Tools
The Java Development Kit is built on these tools.
javac
javac is the Java programming language's compiler and is used to compile
.java files. It generates a class file that can then be run with the java
command.
java
The java command is used to execute a Java programme after a class file has
been created. Both commands are run from the command prompt.
.java
This is the extension for text files that contain Java source code. After the
code has been written and saved, the javac compiler is used to produce .class
files. Once the .class files are ready, the java command can be used to launch
the Java programme.
javadoc
appletviewer
The Eclipse Project offers a range of resources and can be downloaded from its
downloads page. Its sixteenth annual release provides several new features for
the Platform and Equinox, Java developers, and plug-in developers. Eclipse is
an IDE used for computer programming, particularly as a Java IDE, and
includes a customizable plug-in system within its base workspace.
Additionally, the K means clustering algorithm is a simple unsupervised
learning algorithm used for cluster analysis and is illustrated in a flowchart.
Fig. 3.4 Decision Tree for Playing Tennis
Support vector machines have remained a go-to approach for an algorithm with
strong performance and little tuning since they were developed in the 1990s.
They are a class of supervised learning models, with corresponding learning
algorithms, used in machine learning to analyse data for classification and
regression analysis. Supervised learning is not possible with unlabeled data.
An SVM constructs a hyperplane, or a set of hyperplanes, in a high-dimensional
or infinite-dimensional space, which can be used for classification,
regression, or other tasks such as outlier detection.
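How a hyperplane separates two classes can be sketched for the linear case: a point is classified by the sign of w . x + b. In the example below the weights and bias are made-up values chosen by hand, not the result of training, and the function names are hypothetical; finding the maximum-margin hyperplane is the part that an SVM training algorithm would perform.

```python
def decision_value(weights, bias, x):
    """Signed score w . x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(weights, bias, x):
    """Return +1 for points on or above the hyperplane and -1 for points below it."""
    return 1 if decision_value(weights, bias, x) >= 0 else -1


# Hypothetical hyperplane x1 + x2 - 3 = 0 (illustrative values, not trained).
w, b = [1.0, 1.0], -3.0
label_a = classify(w, b, [2.5, 2.0])
label_b = classify(w, b, [0.5, 1.0])
```

The magnitude of `decision_value` grows with the distance from the hyperplane, which is why it is often used as a confidence score.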
Advantages
4)Versatile.
5) A support vector machine (SVM) evaluates the data and can be used for
both classification and regression.
6) The two primary criteria for diagnosing the condition are accuracy
and detection time. Support vector machines (SVMs) boost recognition
rates.
neurons during training, neural networks can learn to recognize patterns in
data and make predictions or classifications.
3.11 KEY ISSUES AND CHALLENGES
Detecting wheat leaf diseases is important for crop health and productivity, but
current machine learning methods face challenges. Digital images of wheat
leaves are captured to measure their shape, size, and texture through feature
extraction. Yellow rust disease, caused by Puccinia striiformis f. sp. tritici,
produces yellow spores on leaves during winter and spring, and new protection
methods such as BTH aim to induce the plant's inherent disease resistance
mechanisms. Seed-borne diseases can reduce the yield and quality of grains.
Machine vision uses color and geometric features for identification, but KNN
classification is not well suited to large applications because of the cost of
its distance calculations. Histogram equalization
enhances image contrast for better human perception and is used in wheat and
plant disease applications. Hyperspectral data is collected for healthy and
diseased wheat to build SVR models for disease index inversion and Random
Forest classification of wheat leaf diseases.
Although the use of machine learning for plant disease detection shows
promising results, there are still some key issues and challenges that need to be
addressed.
3) Variability in Environmental Conditions: Environmental factors such as
lighting, humidity, and temperature can have a significant impact on
plant health and appearance. Variability in these factors can make it
challenging to develop accurate machine learning models that can
detect diseases across different environments.
Model Development
• Analytical
• Computational
• Experimental
• Mathematical
Chapter 4- PERFORMANCE ANALYSIS
Applying the different models for prediction produced results
in several forms, such as a confusion matrix, a comparison graph, and a
bar graph comparing the Accuracy, Precision, F1-Score and Recall of the
different models. These values were computed using the formulae
given below.
4.1 Formulae
5) A = (Tp + Tn)/ (Tp + Tn + Fp + Fn)
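The accuracy formula above, together with the standard definitions of Precision, Recall and F1-Score, can be computed directly from the four confusion-matrix counts. The sketch below assumes those standard definitions; the function name and the counts passed to it are hypothetical and are not results from this report.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
```

Precision penalises false positives and Recall penalises false negatives, so the two can differ considerably on the same confusion matrix; the F1-Score summarises both in a single number.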
4.2 Result Analysis
The dataset is the Banana Leaves dataset, taken from Kaggle.com, with
2000 different images in two classes, Healthy and Unhealthy
leaves. The following models were applied to the dataset, and the
results are as follows.
1)VGG-16 Model
necessary. As a result, the model won't fit a new data set properly. On
the training data set (sample data), this model achieves great
accuracy; however, on the test data set, it does not. In other words, by
overfitting the training set, the model loses its capacity to generalize.
The confusion matrix shows the True Positive, True Negative,
False Positive and False Negative values across the two axes of
Predicted values and Actual values. The True Positive value in this
case is 5.0, meaning that in 5 cases the model predicted the positive
class and the actual value was also positive. The True Negative value is
53.0, meaning that in 53 cases the model predicted negative and the
actual value was negative. The False Positive value is 0.0, meaning that
no case was predicted to be positive while actually being negative. The
False Negative value is 4.0, meaning that 4 cases were predicted to
be negative but the actual value was positive.
2) DenseNet201
Fig 4.5 The results for the DenseNet Model with an accuracy of 92%
The accuracy of the DenseNet model is 92%, with the Recall being
1.00, meaning that the False Negative value was 0: there was no case
where the model predicted False and the actual value was True. The
Precision is 0.91, which means that the False Positive value was
greater than 0: at least one False case was predicted to be True. The
F1 Score is 0.95, which is the harmonic mean of Precision and Recall;
a value this high means that the model is working well.
3) InceptionV3 Model
Fig 4.7 The epochs running for InceptionV3 Model
The InceptionV3 model ran all the epochs without stopping early.
Early stopping halts training when the monitored validation metric
stops improving, in order to prevent overfitting, so running all the way
through suggests that the model kept improving throughout training.
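Early stopping is commonly implemented by monitoring a validation metric and halting once it has failed to improve for a set number of consecutive epochs, called the patience. The following is a minimal sketch of that logic, assuming that validation loss is the monitored metric and that the patience is 3; neither value is stated in this report, and the loss sequences are invented for the example.

```python
def epochs_run(val_losses, patience=3):
    """Return how many epochs run before early stopping halts training."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0  # progress made, reset the counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # stopped early
    return len(val_losses)  # every scheduled epoch ran


# Hypothetical validation-loss curves.
improving = [0.9, 0.7, 0.6, 0.5, 0.45, 0.4]
plateau = [0.9, 0.7, 0.71, 0.72, 0.73, 0.6]

ran_improving = epochs_run(improving)
ran_plateau = epochs_run(plateau)
```

In the first curve the loss improves at every epoch, so all six epochs run; in the second the loss fails to improve for three consecutive epochs and training stops before the final epoch is reached.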
The True Positive value for this model is 4.0, which means that there
were 4 cases that were predicted to be True and whose actual value was
True. There are 53.0 cases of True Negative, meaning they were
predicted to be False and the actual value was False. The False
Negative value is 1.0, meaning that one case was predicted to be False
but turned out to be True.
The accuracy for the InceptionV3 model is 90%, with the Precision
being 91% and the Recall being 98%, which means that neither the False
Positive nor the False Negative value was 0; there were therefore a few
cases where the model predicted the wrong value. The F1-Score is
95%, which is the harmonic mean of the Precision and
Recall, meaning that the model predicts the majority of cases correctly.
Model Precision Recall F1-Score Accuracy
Table 1. Comparison of results between different models with the Banana Leaf Dataset
The above graph compares the Precision, F1-Score and Recall of
all three models. The Precision and the Recall are highest for the VGG-16
model, the InceptionV3 model has the highest F1-Score, and overall
DenseNet201 has the highest accuracy.
Applying Ensemble Learning
We took the two models with the highest accuracy, DenseNet201 and
VGG-16, and applied ensemble learning to them with the Banana Leaf
Dataset.
The macro average calculates the Precision, F1-Score and Recall for each
class separately and then takes the unweighted mean over both classes.
This gives equal weight to each class, regardless of how many samples
it contains. The weighted average Precision is calculated by taking the
Precision of each class, weighting it by the number of actual samples in
that class, and averaging over both classes, which reflects the class
balance of the whole dataset. The same applies to the weighted average
Recall and F1-Score. The accuracy of this combined model is 85%; it
removes the redundancies of the previous models and gives a more precise
view of the situation.
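The difference between the two averages can be shown with a small sketch. It assumes the conventional definitions, in which the macro average is the unweighted mean of the per-class scores and the weighted average weights each class score by its support (the number of actual samples in that class). The per-class precision values and supports below are hypothetical, not results from this report.

```python
def macro_average(scores):
    """Unweighted mean of per-class scores: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean of per-class scores weighted by the number of samples in each class."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


# Hypothetical per-class precision and support for [healthy, unhealthy].
precision_per_class = [0.90, 0.60]
support_per_class = [80, 20]

macro = macro_average(precision_per_class)
weighted = weighted_average(precision_per_class, support_per_class)
```

When the classes are imbalanced, as in this made-up case, the weighted average is pulled towards the score of the larger class, while the macro average exposes weak performance on the smaller class.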
Fig. 4.11 The Confusion Matrix for the Ensemble Learning Model
value. The True Negative value is 53.0, meaning that the model
predicted the value to be negative and the actual value was negative.
The False Positive value is 0.0, meaning that no case was
predicted to be positive while actually being negative. The False Negative
value is 4.0, meaning that 4 cases were predicted to be negative but the
actual value was positive.
Chapter- 5 CONCLUSIONS
4) Model diversity: Ensemble learning relies on the diversity of
the base models to improve performance. Analyzing the base
models' diversity, such as their differences in training
algorithms, hyperparameters, or feature subsets, can provide
insights into the effectiveness of ensemble learning. If the base
models are diverse and yet contribute positively to the
ensemble, it indicates the successful utilization of ensemble
techniques.
For the future of this project, the best way forward
would be to use more machine learning models and
combine their results using ensemble learning, or to use
deep learning models, which could greatly improve the
accuracy and prediction rate of this project. The use
of different datasets, covering the leaves of different plant
species, can also help. The use of drones for
photographing leaves could change the game for the detection of
plant diseases. Drones offer several advantages in this context,
including the ability to capture high-resolution aerial imagery,
cover large areas efficiently, and provide real-time data
collection. When combined with machine learning algorithms,
these advantages can significantly enhance disease detection
and management efforts.
Drones can cover large areas of agricultural land quickly and
efficiently, allowing for the collection of real-time data on
plant health. This rapid data acquisition enables early detection
and timely intervention, reducing the spread and impact of
diseases.
REFERENCES
APPENDICES
Here are some additional details that could be included in an appendix for the detection of plant
diseases using machine learning:
1) Types of machine learning algorithms: There are several types of machine learning algorithms
that can be used for plant disease detection, including supervised learning, unsupervised learning,
and reinforcement learning. Each algorithm has its own strengths and weaknesses and is suited to
different types of data and problems.
2) Performance metrics: There are several performance metrics that can be used to evaluate the
performance of a machine learning model for plant disease detection, including accuracy,
precision, recall, F1-score, and area under the curve (AUC). These metrics can help determine the
effectiveness of the model and identify areas for improvement.
4) Transfer learning: Transfer learning is a technique that involves using a pre-trained machine
learning model as a starting point for training a new model. This can be particularly useful for
plant disease detection, as pre-trained models may have already learned features that are relevant
to the problem at hand.
5) Deployment considerations: Once a machine learning model has been trained for plant disease
detection, it must be deployed in a real-world setting. This may require integrating the model with
existing software systems, developing a user interface, and ensuring that the model is robust and
reliable.