Final Report
INSTITUTE OF ENGINEERING
HIMALAYA COLLEGE OF ENGINEERING
PROJECT MEMBERS:
Nischal Maharjan (HCE076BEI007)
Rupak Gautam (HCE076BEI011)
Samarpan Ghimire (HCE076BEI012)
March, 2023
APPLE LEAF DISEASE DETECTION USING IMAGE
PROCESSING
SUPERVISOR
SUBMITTED TO
SUBMITTED BY
ABSTRACT
The project provides a baseline approach for detecting diseases in apple leaves using image processing and machine learning, built around a convolutional neural network (CNN). The main aim of the work is to classify apple leaves into three disease classes (plus healthy leaves) using a simple approach that requires minimal computing resources, while aiming for better results than traditional models. A dataset of apple leaf images containing healthy and diseased leaves was used to train the model, and the CNN achieved an accuracy of over 90% in detecting disease in apple leaves. The resulting web application is intended to help farmers improve apple yield and, in turn, their financial situation.
ACKNOWLEDGEMENT
We would like to express our sincere appreciation to the Institute of Engineering (IOE)
and Himalaya College of Engineering for providing us with the opportunity to work on
this project.
We would like to take this opportunity to express our heartfelt gratitude to Er. Narayan
Adhikari Chhetri, our project supervisor, for his invaluable guidance and support
throughout our project. His extensive knowledge, insightful comments, and
unwavering encouragement have been instrumental in the successful completion of
this project.
We are extremely thankful to our Project Coordinator Er. Ramesh Tamang, for sharing
his expertise, providing constructive feedback, and motivating us to push our
boundaries and achieve our goals.
We would also like to acknowledge and thank Er. Ashok GM, Head of Department
(HOD), and Er. Devendra Kathayat, Deputy HOD, of the Department of Electronics
and Computer Engineering, for their support and encouragement throughout the
project. Their leadership, guidance, and mentorship have been essential in our success.
We would like to thank our friends and family for their unwavering support,
encouragement, and motivation throughout this project. Their love, support, and
understanding have been invaluable to us, and we could not have completed this
project without them.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
1. INTRODUCTION
1.1 Background
1.2 Objective
2. LITERATURE REVIEW
3. FUNCTIONAL REQUIREMENT
4. SYSTEM REQUIREMENT
5. METHODOLOGY
5.1.2 Dataset
6. WORK COMPLETED
6.1 Firebase
6.1.1 Database
6.1.2 Authentication
6.2 CNN Model
6.2.1 Plots
6.2.2 Accuracy
6.3 Streamlit
7. CONCLUSION
7.1 Challenges
8. REFERENCES
APPENDIX
LIST OF FIGURES
Figure 1: Use case diagram
Figure 2: Data flow diagram level 0
Figure 3: Data flow diagram level 1
Figure 4: Activity diagram
Figure 5: Block diagram of image pre-processing
Figure 6: CNN layers
Figure 7: Proposed workflow diagram
Figure 8: Sample Images of Apple Leaves
Figure 9: Deployment of database
Figure 10: Authentication in database
Figure 11: Loss Plot between Train and validation set
Figure 12: Accuracy Plot between Train and validation set
Figure 13: Accuracy of the model
Figure 14: Login page
Figure 15: Sign up page
Figure 16: Homepage
1. INTRODUCTION
1.1 Background
Plant diseases are responsible for major economic losses in the agricultural industry
worldwide. Monitoring plant health and detecting pathogens early are essential to
reduce disease spread and facilitate effective management practices. Plant diseases
are not only a threat to food security at a global scale but can also have disastrous
consequences for smallholder farmers whose livelihoods depend on healthy crops.
Food losses due to crop infections from pathogens such as bacteria, viruses, and fungi have been persistent issues in agriculture for centuries across the globe. The apple industry is one of the most important fruit industries in China. However, frequent outbreaks of apple leaf diseases may seriously restrict the healthy and stable development of the apple industry. At present, disease recognition in many industrialized apple orchards relies mainly on human vision and therefore depends heavily on disease experts. The identification workload is large, and visual inspection by fruit farmers or experts is prone to misjudgment due to subjective perception and visual fatigue, which makes it difficult to meet the demand for high-precision identification in intelligent orchards.
them. To test the accuracy of the proposed algorithm, manually segmented images were compared with those segmented automatically.
We planned to create a web application using Python and several of its frameworks and libraries. To minimize disease-induced damage to crops, we introduced “Apple Leaf Disease Detection” using image processing.
1.2 Objective
The main objectives of this project are:
● To create a web application that detects diseases in apple leaves using a CNN
● To let people check which diseases have occurred in their apple plants
● To help increase mass production of apples by supporting the growth of healthy plants
● To give researchers a tool for identifying diseases in apple plants
farming. This process of detecting plant diseases is costly because farmers need to consult experts, who must detect the diseases, classify their types, and monitor the crops continuously, which is time-consuming. Since some farmers may not be able to consult experts regularly, the risk of crop contamination is very high. In addition, chemicals and pesticides are used extensively and in arbitrary amounts, especially since a given product often works on only one type of crop; this affects the production process and has a negative impact on the environment.
2. LITERATURE REVIEW
The first step in the software development process is the literature survey, because the economic feasibility and time requirements must be established before creating software for the problem at hand. Image processing is widely used for the detection of plant diseases. Along with different image processing approaches, we found several applications similar to our project, described below.
Apples are among the most valuable export products of Iran. According to FAO (Food and Agriculture Organization of the United Nations) statistics, Iran is the eighth-largest producer and exporter of apples in the world. The price and marketability of apples depend highly on quality. Plant diseases cause a significant decline in the quality and quantity of agricultural products and thus strongly influence the economy of countries that rely solely on agriculture. In some cases, a disease can be prevented or managed if the symptoms are identified in the early stages. Visual detection by experts is the main approach used in practice; however, it requires continuous monitoring by plant pathologists, which is expensive, particularly on large farms [1]. In another approach, the segmented binary image is inverted, holes in the leaf region are filled using a region-filling technique, and finally the leaf area is calculated from the area of a known reference object [2].
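The region-filling and area steps described for [2] can be sketched in Python as follows. This is an illustration, not the method of [2]: it assumes the inverted segmentation is already available as a binary mask in which 1 marks leaf pixels, fills holes with a border-seeded flood fill (one common way to implement region filling), and converts the pixel count to physical area using a reference object whose true area and pixel count are known. The names `fill_holes` and `leaf_area` are hypothetical.

```python
from collections import deque


def fill_holes(mask):
    """Fill enclosed background regions (holes) in a binary leaf mask.

    mask: list of lists of 0/1 values, where 1 marks leaf pixels (i.e. the
    inverted segmentation). Background pixels reachable from the image
    border are true background; any other 0-pixel is a hole in the leaf.
    """
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with every background pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))
    # 4-connected flood fill through background pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not outside[nr][nc] and mask[nr][nc] == 0):
                outside[nr][nc] = True
                queue.append((nr, nc))
    # Every pixel not connected to the border belongs to the (filled) leaf.
    return [[0 if outside[r][c] else 1 for c in range(cols)]
            for r in range(rows)]


def leaf_area(leaf_mask, ref_pixels, ref_area_cm2):
    """Scale the leaf's pixel count by the known area of a reference object."""
    leaf_pixels = sum(sum(row) for row in leaf_mask)
    return leaf_pixels * ref_area_cm2 / ref_pixels


# Toy example: a 3x3 leaf ring with a one-pixel hole in the middle.
ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_holes(ring)
area = leaf_area(filled, ref_pixels=3, ref_area_cm2=1.0)  # 9 px * (1/3) cm2/px
```

In the toy mask the central hole is filled, giving 9 leaf pixels; with a reference object of 1 cm² covering 3 pixels, the leaf area comes out as 3 cm².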
obtained by manually cropping each acquired digital disease image. The sub-images were then segmented using twelve lesion segmentation methods integrated with clustering algorithms [3]. In [4], the images were captured in the fields of the Central Institute of Apple Research, Nagpur, and in the apple fields of Buldana and Wardha districts. An active contour model is used for image segmentation, and Hu's moments are extracted as features for training an adaptive neuro-fuzzy inference system. The classification accuracy is found to be 85 percent [4].
Deep learning models such as VGG and ResNet have previously been used for leaf disease identification, and newer deep learning approaches have recently been introduced. One such work proposes a new deep learning architecture that incorporates a leaf spot attention mechanism. The primary idea is that leaf disease symptoms appear in the leaf area, whereas the background region contains no useful information about leaf diseases. To realize this, two subnetworks are designed. The first is a feature segmentation subnetwork that provides more discriminative features for the separated background, leaf areas, and spot areas in the feature map. The second is a spot-aware classification subnetwork that increases classification accuracy. To train the proposed leaf spot attention network, the feature segmentation subnetwork is first trained on a new image set in which the background, leaf area, and spot area are annotated. The spot-aware classification subnetwork is then connected to the feature segmentation subnetwork and trained through early and late fusion to produce semantic-level spot feature information [5].
separable convolution is applied in the convolution module to reduce the number of parameters, and the h-swish activation function is introduced to make the computation fast and easy to quantize. In addition, 5,170 images were collected in the field at the apple planting base of Northwest A&F University, and 3,000 images were acquired from the public PlantVillage dataset [6]. In [7], various image augmentation techniques are used to increase the dataset size and, consequently, the model's accuracy. The model proposed in [7] achieves an accuracy of 96.25% on the validation dataset and can identify leaves with multiple diseases with 90% accuracy [7].
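The parameter saving from depthwise separable convolution and the h-swish activation mentioned for [6] are easy to check numerically. The sketch below is a minimal illustration that ignores bias terms; h-swish is taken in its standard form h-swish(x) = x · ReLU6(x + 3) / 6. The function names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution: one k*k*c_in kernel per output channel."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depthwise (one k x k kernel per input channel) + pointwise 1x1 mixing."""
    return k * k * c_in + c_in * c_out


def relu6(x):
    return min(max(x, 0.0), 6.0)


def h_swish(x):
    """Hard-swish: a piecewise-linear approximation of x * sigmoid(x)."""
    return x * relu6(x + 3.0) / 6.0


standard = standard_conv_params(3, 64, 128)    # 73,728 weights
separable = separable_conv_params(3, 64, 128)  # 576 + 8,192 = 8,768 weights
```

For a 3×3 layer mapping 64 to 128 channels, the separable version needs roughly 8.4 times fewer weights. Because h-swish uses only a clip, an add, and a multiply instead of an exponential, it is cheap to compute and straightforward to quantize.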
3. FUNCTIONAL REQUIREMENT
Functional requirements are the specific functions that a system must have in order to
allow users to achieve their goals. These requirements typically describe how the
system will behave in certain circumstances. For example, our system includes image
processing, the extraction of leaf features, and the use of a CNN model to predict
diseases.
Figure 1, the use case diagram, describes the apple leaf disease detection system. First, the apple leaf image is loaded. The image is then preprocessed and segmented, and the leaf features are extracted from the image matrix. The aim of feature extraction is to find and extract features that can be used to determine the meaning of a given sample, whereas pre-processing improves the image data by suppressing unwanted distortions or enhancing image features that are important for further processing.
Finally, a classifier is trained and tested on the dataset; in this project, the CNN model serves as the classifier that assigns each image to a disease class.
4. SYSTEM REQUIREMENT
The system requirement stage is where the theoretical design is converted into a working system. The new system may be entirely new, replacing an existing manual or automated system, or it may be a major modification to an existing system. The system is implemented using Visual Studio Code and the dataset. The data flow diagrams of our system are shown below.
Figure 3: Data flow diagram level 1
Figures 2 and 3 above are the data flow diagrams (DFDs) of the apple leaf disease detection system, shown at two levels: level 0 and level 1. The level 0 DFD, also known as the context diagram, shows the system at the surface level, where the user simply gives an image to the system and the system returns the result, i.e., the predicted disease.
The level 1 DFD shows the main sub-processes that together form the complete apple leaf disease detection system. There are four sub-processes: upload image, image adjustment, feature extraction, and comparison with model data. In the upload image step, the user's image is uploaded to the system and stored in the web cache. In the image adjustment step, the image from the web cache is adjusted; this generally includes level, contrast, gamma, hue, saturation, and brightness modification, and can also fix an overexposed image, correct the color, and improve the brightness. Features that help detect the diseased part of the leaf are then extracted from the image and stored in a data matrix. After that, the image is compared with the model data from the dataset to predict the disease. Finally, the result is shown to the user.
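The image adjustment step can be illustrated with NumPy. This is a minimal sketch, not the project's code: it covers only contrast, brightness, and gamma (not level, hue, or saturation), assumes an 8-bit RGB array, and uses the convention output = input^(1/gamma), so gamma > 1 brightens mid-tones. `adjust_image` is a hypothetical name.

```python
import numpy as np


def adjust_image(img, brightness=0.0, contrast=1.0, gamma=1.0):
    """Apply contrast, brightness, and gamma adjustment to an 8-bit RGB image.

    img:        uint8 array of shape (H, W, 3).
    contrast:   scale factor around mid-grey (1.0 = unchanged).
    brightness: offset in [0, 1] intensity units (0.0 = unchanged).
    gamma:      output = input ** (1 / gamma); gamma > 1 brightens mid-tones.
    """
    x = img.astype(np.float64) / 255.0               # work in [0, 1]
    x = (x - 0.5) * contrast + 0.5 + brightness      # linear contrast/brightness
    x = np.clip(x, 0.0, 1.0)                         # keep values displayable
    x = x ** (1.0 / gamma)                           # gamma correction
    return np.round(x * 255.0).astype(np.uint8)      # back to 8-bit


leaf = np.full((2, 2, 3), 64, dtype=np.uint8)  # a dark 2x2 test image
brighter = adjust_image(leaf, gamma=2.0)       # mid-tones lifted
```

Clipping before the gamma step keeps the power function in its valid [0, 1] range, which is why an overexposed pixel saturates at 255 instead of wrapping around.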
4.2.1 Register process
4.2.2 Login process
5. METHODOLOGY
Figure 5: Block diagram of image pre-processing
We use a convolutional neural network (CNN) algorithm for image classification. A CNN (or ConvNet) is a class of deep neural network, and deep learning is a subset of machine learning.
A ConvNet architecture has three kinds of layers: convolutional layers, pooling layers, and fully-connected layers.
A fully-connected layer uses the extracted features to make the prediction.
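To make the three layer types concrete, the following NumPy sketch runs one forward pass through a tiny network: a single 3×3 convolution with ReLU, a 2×2 max pooling layer, and a fully-connected layer with softmax. It is a toy illustration with random weights and a single-channel input, not the model used in this project; like most deep learning frameworks, it implements "convolution" as cross-correlation.

```python
import numpy as np


def conv2d(x, k):
    """'Valid' 2-D cross-correlation of a single-channel image x with kernel k
    (deep learning frameworks implement 'convolution' this way)."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling; an odd last row/column is dropped."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def forward(image, kernel, weights, bias):
    """Convolution -> ReLU -> pooling (feature extraction), then a
    fully-connected layer with softmax (prediction)."""
    features = max_pool2x2(relu(conv2d(image, kernel)))
    return softmax(weights @ features.ravel() + bias)


rng = np.random.default_rng(0)
image = rng.random((6, 6))             # 6x6 toy "image"
kernel = rng.standard_normal((3, 3))   # conv output 4x4 -> pooled 2x2 -> 4 features
weights = rng.standard_normal((4, 4))  # 4 classes x 4 features
probs = forward(image, kernel, weights, np.zeros(4))
```

The output is a vector of four class probabilities that sums to 1, which mirrors the 4-unit softmax output of the project's model.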
In summary, the user selects an image and the result is displayed to them. The system receives the input image, preprocesses it, extracts the features, saves the trained model, performs testing, and then predicts and displays the result [9].
5.1.2 Dataset
We collected the dataset, 135 MB in size, from Kaggle. For our project, the dataset consists of four classes: three diseased classes and one healthy class. The dataset was already sorted into its individual classes and divided into train and test sets. The three diseased classes are apple_scab, black_rot, and cedar_apple_rust. The model was trained using Google Colab. The test folder contains four subclasses: apple_scab with 504 images, black_rot with 497, cedar_apple_rust with 440, and healthy with 502. The train folder also contains four subclasses: apple_scab with 2016 images, black_rot with 1987, cedar_apple_rust with 1760, and healthy with 2008. In total, there are 7771 training images and 1943 test images. Figure 8 shows sample images from the dataset [10].
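The reported counts can be checked with a few lines of Python. Summing the per-class figures reproduces the totals of 7771 training and 1943 test images, and shows that each class is split roughly 80:20 between the train and test folders.

```python
# Image counts per class, as listed in this section.
train = {"apple_scab": 2016, "black_rot": 1987,
         "cedar_apple_rust": 1760, "healthy": 2008}
test = {"apple_scab": 504, "black_rot": 497,
        "cedar_apple_rust": 440, "healthy": 502}

n_train = sum(train.values())  # 7771
n_test = sum(test.values())    # 1943
total = n_train + n_test       # 9714

# Fraction of each class that sits in the train folder.
train_share = {name: train[name] / (train[name] + test[name]) for name in train}
print(f"train {n_train}, test {n_test}, train share {n_train / total:.1%}")
```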
To address this issue, we used a technique called dataset balancing. Balancing the
dataset involves adjusting the number of examples in each class so that the classes are
represented equally in the dataset. There are several ways to balance the dataset,
including undersampling, oversampling, and a combination of both.
In our project, we used oversampling, which involves creating new examples of the
minority class by duplicating existing examples or creating new synthetic examples.
We started by duplicating the existing examples of the minority classes until they had
the same number of images as the majority class. This ensured that the dataset had a
balanced distribution of classes, and that our model would learn from a representative
set of examples.
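Duplication-based oversampling can be sketched as below. This is an illustrative version, not the project's code: it assumes that the images to duplicate are drawn at random with replacement (the text only says existing examples were duplicated), and it is meant to be applied to the training split only. `oversample` is a hypothetical helper operating on lists of file paths keyed by class name.

```python
import random


def oversample(samples_by_class, seed=0):
    """Duplicate randomly chosen examples of each minority class until every
    class has as many examples as the largest (majority) class."""
    rng = random.Random(seed)
    target = max(len(items) for items in samples_by_class.values())
    balanced = {}
    for label, items in samples_by_class.items():
        # Sample with replacement from the existing examples of this class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced[label] = list(items) + extra
    return balanced


# Training-set counts from Section 5.1.2, with placeholder file names.
counts = {"apple_scab": 2016, "black_rot": 1987,
          "cedar_apple_rust": 1760, "healthy": 2008}
train = {label: [f"{label}_{i}.jpg" for i in range(n)]
         for label, n in counts.items()}
balanced = oversample(train)
```

Every class ends up with 2016 entries, matching apple_scab, the largest class; the added entries are copies of existing images, so no new visual information is introduced.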
Overall, balancing the dataset helped us train a more accurate and reliable model by ensuring a balanced distribution of examples from each class. This improved our model's ability to detect the minority classes. Because duplication adds no new images, it balances the class counts but does not make the training data more diverse.
5.1.4 Visualizing dataset
Figure: Bar graph of dataset after oversampling
Figure: Pie chart of dataset before oversampling
Figure: Pie chart of dataset after oversampling
6. WORK COMPLETED
6.1 Firebase
In this project, Firebase is used as a database to store the results of the disease detection process. Specifically, the Firebase Realtime Database stores the predictions together with the associated user IDs. When a user uploads an image for disease detection, the app runs the image through the trained deep learning model and generates a prediction. The prediction is then pushed to the Firebase Realtime Database with the user ID, which allows the disease detection process to be tracked and analyzed.
6.1.1 Database
We decided to use Firebase for the database because it is cloud-hosted, which makes it scalable and accessible from anywhere in the world. We used the Firebase Realtime Database to store user information such as username, email, and password. The data is stored in JSON format, which makes it easy for the application to access. The project's JSON configuration can be obtained from the Project Settings page of the Firebase console. Firebase provides high availability and low latency for the application's users.
6.1.2 Authentication
Authentication is handled by Firebase Authentication, a secure and user-friendly authentication system provided by Firebase. It supports various authentication methods, including email and password, phone numbers, and social logins. In this project, we use email and password authentication to allow users to sign up and log in to the application. Users can either register through the application's login/signup options or be added directly using the Add user option in the Firebase console.
Figure 10: Authentication in database
6.2 CNN Model
Figure: CNN model architecture (input (256, 256, 3); five convolutional layers with ReLU; Dropout; Softmax output)
The first layer is a Conv2D layer, which applies a 2D convolution operation to the input image. The layer has 32 filters, each with a kernel size of 3x3. The activation function is ReLU, and the padding is set to 'same', which means the input is padded with zeros so that the output has the same height and width as the input. The input shape is (256, 256, 3), which means the input image has a height and width of 256 pixels and 3 color channels.
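As a worked check of the 'same' padding and of the size of this first layer, the standard output-size formula for a convolution with input size n, kernel size k, padding p, and stride s can be evaluated with k = 3, p = 1 (what 'same' padding amounts to for a 3×3 kernel at stride 1), and s = 1 (assumed, since no stride is stated). The parameter count uses F = 32 filters, C_in = 3 input channels, and one bias per filter.

```latex
n_{\text{out}} = \left\lfloor \frac{n_{\text{in}} + 2p - k}{s} \right\rfloor + 1
               = \frac{256 + 2 \cdot 1 - 3}{1} + 1 = 256,
\qquad
P_{\text{conv1}} = (k \cdot k \cdot C_{\text{in}} + 1)\, F
                 = (3 \cdot 3 \cdot 3 + 1) \cdot 32 = 896
```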
The network has five convolutional layers. Each has the same structure as the first layer, except that the number of filters increases from 32 to 64, 128, and 256, and then decreases to 128. After the last convolutional layer, the output is flattened by a Flatten layer, which converts the 2D feature maps into a 1D vector. A Dropout layer then regularizes the network by randomly dropping 50% of the activations during training, which helps prevent overfitting. Finally, a dense layer with 4 units and a softmax activation function produces the final classification output, where each unit represents a different class. Early stopping was also implemented to prevent overfitting, a condition in which the model performs well on the training set but poorly on unseen data; training is halted once the validation performance stops improving.
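The early stopping rule can be sketched in a few lines of Python. This is a simplified illustration in the spirit of Keras's `EarlyStopping` callback monitoring `val_loss`; the `patience` and `min_delta` values are assumptions, since the report does not state the settings used, and `early_stopping_epoch` is a hypothetical helper that replays a recorded loss history rather than controlling real training.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Replay a validation-loss history and report (stop_epoch, best_epoch).

    Training stops after `patience` consecutive epochs in which the loss does
    not improve on the best value so far by more than `min_delta`.
    """
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0  # improvement: reset
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch                   # stop here
    return len(val_losses) - 1, best_epoch                 # ran to completion


# Hypothetical history: validation loss bottoms out at epoch 2, then rises.
history = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.50]
stop, best = early_stopping_epoch(history, patience=3)
```

With patience 3, training stops at epoch 5 (counting from 0) and the best weights are those from epoch 2. The later improvement at epoch 6 is never reached, which is the trade-off a larger patience value would relax at the cost of longer training.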
Figure 11: Model summary
Figure 11 shows the model summary of the CNN used in our project. The model has 814,532 parameters in total; all 814,532 are trainable, and there are no non-trainable parameters.
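The reported total of 814,532 parameters can be reproduced by hand, which also pins down the size of the flattened feature vector. The sketch below assumes 3×3 kernels with biases, as described above, and four 2×2 pooling steps before the Flatten layer. The pooling layers are not mentioned in the text; they are an inference, because only a 16×16×128 feature map (32,768 values) feeding the 4-unit dense layer yields the reported total. Placing them after the first four convolutions is a guess; their position does not change the count.

```python
def conv_params(k, c_in, c_out):
    """k x k Conv2D weights plus one bias per filter."""
    return (k * k * c_in + 1) * c_out


def dense_params(n_in, n_out):
    return (n_in + 1) * n_out


filters = [32, 64, 128, 256, 128]             # filter counts from the text
pool_after = [True, True, True, True, False]  # ASSUMED 2x2 pooling positions

size, channels, total = 256, 3, 0
for f, pool in zip(filters, pool_after):
    total += conv_params(3, channels, f)      # 'same' padding keeps H and W
    channels = f
    if pool:
        size //= 2                            # 2x2 max pooling halves H and W
flat = size * size * channels                 # Flatten: 16 * 16 * 128 = 32,768
total += dense_params(flat, 4)                # Dense(4); Dropout/Flatten add none
```

The five convolutions contribute 683,456 parameters and the dense layer 131,076; Flatten, Dropout, and pooling layers have no trainable parameters.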
6.2.1 Plots
Figure 12: Accuracy Plot between Train and validation set
6.2.2 Accuracy
6.2.3 Other Metrics
6.3 Streamlit
Figure 15: Sign up page
6.3.4 Pre detection page
The detection page is a critical part of the apple leaf disease detection web app that we built using Streamlit and Python. It allows users to upload an image of an apple leaf and get a prediction of the disease that may be affecting it. Users can only upload files with the extensions .jpg, .jpeg, and .png.
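In Streamlit, an extension restriction like this is typically enforced by passing a `type` list to the upload widget, e.g. `st.file_uploader("Upload an apple leaf image", type=["jpg", "jpeg", "png"])`; the report does not show its code, so this is an assumption. The sketch below shows an equivalent check plus a helper that turns the model's 4-way softmax output into a class name and confidence. The class order is also an assumption (the alphabetical order that Keras directory loaders infer from folder names), and `is_allowed` and `decode_prediction` are hypothetical names.

```python
import os

import numpy as np

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
# ASSUMED label order: alphabetical, as Keras infers it from class folder names.
CLASS_NAMES = ["apple_scab", "black_rot", "cedar_apple_rust", "healthy"]


def is_allowed(filename):
    """True if the file has one of the accepted image extensions."""
    return os.path.splitext(filename)[1].lower() in ALLOWED_EXTENSIONS


def decode_prediction(probs):
    """Map the model's 4-way softmax output to (class name, confidence)."""
    probs = np.asarray(probs, dtype=float).ravel()  # accepts shape (1, 4) too
    idx = int(np.argmax(probs))
    return CLASS_NAMES[idx], float(probs[idx])


label, confidence = decode_prediction([[0.05, 0.85, 0.05, 0.05]])
```

If the class order used during training differs, `CLASS_NAMES` must be changed to match it; otherwise every prediction would be reported under the wrong disease name.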
Figure: Landing page when user selects detection before uploading image
Figure: Landing page when user selects detection after uploading image
6.3.6 Treatment
We have incorporated a treatment button that appears once the disease is diagnosed.
The button provides users with a list of recommended treatments that can help to
manage the disease.
7. CONCLUSION
The dataset was collected from Kaggle. It was already split into two sections, train and test, in an 80:20 ratio (7771 training and 1943 test images). The data was uploaded to Google Colab, where the model was trained and then tested. The model built using CNN has five convolutional layers, and the calculated accuracy was 91.38%.
The database was set up using Firebase, which provides the Realtime Database feature and Authentication for sign-up and login.
The website was built using Streamlit and has login and sign-up options. After registration, the user is directed to the home page, which gives a brief overview of the project.
7.1 Challenges
Some of the challenges faced during the process are listed below:
● Obtaining a good dataset, the most crucial step in creating a machine learning model, took considerable time.
● Some packages required downgrading Python during the virtual environment setup, which took time.
● The lack of a GPU on our systems increased the ML model's runtime; this was resolved by using a Google Colab notebook, although Colab occasionally stopped offering a GPU when the maximum number of users was reached.
● Finding the correct algorithm, understanding how it operates, and adapting it to the system's needs were difficult aspects of the development process.
● Integrating the ML model introduced additional errors into the web application.
● Class imbalance was another major issue during the project, which led to overfitting.
7.2 Advantages of the system
● It has the potential to detect signs of disease on apple leaves at an early stage, which could help prevent the spread of disease.
● The system performs its functions quickly and simply.
● It uses CNNs for image recognition, which can achieve a high degree of accuracy in detecting different types of disease on apple leaves.
● The system produces reliable results, with a measured accuracy of 91.38%.
● The project could help farmers identify patterns and trends in disease, which could inform future farming practices and disease prevention strategies.
● The automated detection process saves time compared to manual detection methods, freeing farmers to focus on other aspects of their operations.
● The project can be used to develop a recommendation system for farmers based on the type of crop, soil, and weather conditions.
● The application will be able to detect diseases in real time.
● The system can be deployed so that anyone can access it.
8. REFERENCES
[9] S. B. R. K. Sahana Uday Naik, "Plant Disease Detection Using Leaf Images," Department of Computer Science, Srinivas Institute of Technology, Mangalore, Karnataka, 2021-07-07.
APPENDIX