Disease Detection in Plants - Report

This document is a project report on leaf disease detection using Python (Open CV). It was submitted by Simran M Mohan and Harshinee S in partial fulfillment of the Bachelor of Engineering degree in Computer Science and Engineering. The project aims to identify plant diseases and provide remedies by developing a database of plant species and diseases obtained from the internet. A CNN classifier is trained on the data to predict diseases with 78% accuracy. A drone prototype is also designed to capture plant images for input to the software to detect healthy and diseased plants in agricultural fields.


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

PROJECT REPORT

ON

“Leaf-Disease-Detection using Python (Open CV)”

Submitted in partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE AND ENGINEERING

BY

Simran M Mohan (1NH14CS727)


Harshinee S (1NH17CS754)

Under the guidance of

Dr. Pamela Vinita Eric


Assistant Professor,
Dept. of CSE, NHCE
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
It is hereby certified that the project work entitled "Leaf-Disease-Detection using Python
(Open CV)" is a bona fide work carried out by Simran M Mohan (1NH14CS727) and Harshinee
S (1NH14CS754) in partial fulfilment for the award of Bachelor of Engineering in COMPUTER
SCIENCE AND ENGINEERING of the New Horizon College of Engineering during the year
2019-2020. It is certified that all corrections/suggestions indicated for Internal Assessment
have been incorporated in the report deposited in the departmental library. The project
report has been approved as it satisfies the academic requirements in respect of project
work prescribed for the said degree.

………………………… ……………………….. ………………………………


Signature of Guide Signature of HOD Signature of Principal
(Dr. Pamela Vinita Eric) (Dr. B. Rajalakshmi) (Dr. Manjunatha)

External Viva

Name of Examiner Signature with date

1. ………………………………………….. ………………………………….

2. …………………………………………… …………………………………..
ABSTRACT
The proposed system helps identify plant diseases and provides remedies that can be used
as a defence mechanism against the disease. A database obtained from the Internet is properly
segregated, and the different plant species are identified and renamed to form a proper
database. A test database consisting of various plant diseases is then obtained and used for
checking the accuracy and confidence level of the project. Using the training data we train
our classifier, and the output is then predicted with optimum accuracy. We use a Convolutional
Neural Network (CNN), which comprises different layers used for prediction. A prototype drone
model is also designed for live coverage of large agricultural fields: a high-resolution
camera attached to it captures images of the plants, which act as input for the software,
based on which the software tells us whether the plant is healthy or not. With our code and
training model we have achieved an accuracy level of 78%. Our software gives the name of the
plant species with its confidence level, along with the remedy that can be taken as a cure.
ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of any task would be
impossible without the mention of the people who made it possible, whose constant guidance
and encouragement crowned our efforts with success. I have great pleasure in expressing my
deep sense of gratitude to Dr. Mohan Manghnani, Chairman of New Horizon Educational
Institutions for providing necessary infrastructure and creating good environment. I take this
opportunity to express my profound gratitude to Dr. Manjunatha, Principal NHCE, for his
constant support and encouragement. I am grateful to Dr. Prashanth C.S.R, Dean Academics,
for his unfailing encouragement and suggestions, given to me in the course of my project
work. I would also like to thank Dr. B. Rajalakshmi, Professor and Head, Department of
Computer Science and Engineering, for her constant support. I express my gratitude to Dr.
Pamela Vinita Eric, Senior Assistant Professor, my project guide, for constantly monitoring
the development of the project and setting up precise deadlines. Her valuable suggestions
were the motivating factors in completing the work. Finally, a note of thanks to the teaching
and non-teaching staff of the Dept. of Computer Science and Engineering for their cooperation
extended to me, and to my friends, who helped me directly or indirectly in the course of the
project work.

Simran M Mohan (1NH14CS727)

Harshinee S (1NH14CS754)
CONTENTS

ABSTRACT I

ACKNOWLEDGEMENT II

LIST OF FIGURES V

1. INTRODUCTION
DOMAIN INTRODUCTION 1
PROBLEM DEFINITION 2
OBJECTIVES 2
SCOPE OF THE PROJECT 2

2. LITERATURE SURVEY
TECHNOLOGY 4
EXISTING SYSTEM 7
PROPOSED SYSTEM 8
METHODOLOGY 9
MODULES 9
PRODUCT PERSPECTIVE 10
DESIGN DESCRIPTION 10
DESIGN APPROACH 10

3. REQUIREMENT ANALYSIS
FUNCTIONAL REQUIREMENTS 12
NON FUNCTIONAL REQUIREMENTS 12
DOMAIN AND UI REQUIREMENTS 13
HARDWARE REQUIREMENTS 14
SOFTWARE REQUIREMENTS 14
DATA REQUIREMENTS 15

4. DESIGN
DESIGN GOALS 16
OVERALL SYSTEM ARCHITECTURE 17
DATA FLOW DIAGRAM 17
STATE MACHINE UML DIAGRAM 20
SEQUENCE DIAGRAM 21
INTERACTION OVERVIEW DIAGRAM 22
USE CASE DIAGRAM 23
ALGORITHM/PSEUDOCODE 24

5. IMPLEMENTATION 34

6. TESTING
TEST STRATEGY 35
PERFORMANCE CRITERIA 36
RISK IDENTIFICATION AND CONTINGENCY PLANNING 37
TEST SCHEDULE 38
ACCEPTANCE CRITERIA 40

7. EXPECTED OUTPUT 42
8. CONCLUSION 49
9. FUTURE ENHANCEMENTS 50
REFERENCES 51
LIST OF FIGURES

Fig. No Figure Description Page No

2.1 Technology Diagram 4
4.1 Design Goals 16

4.2 Overall System Architecture 17

4.3 Data Flow/Activity Diagram 17

4.4 State UML Diagram 20

4.5 Sequence Diagram 21

4.6 Interaction Overview Diagram 22

4.7 Use Case Diagram 23

7.1 Search for Particular Places 42

7.2 Options Used Leaf-Disease-Detection 43

7.3 Sharing Leaf Details 43

7.4 Leaf-Disease-Detection 44
CHAPTER 1

INTRODUCTION

Agriculture is the primary occupation in India, and India ranks second in agricultural output
worldwide. Farmers in India cultivate a great diversity of crops. Various factors such as
climatic conditions, soil conditions and various diseases affect the production of the crops.
The existing method for plant disease detection is simple naked-eye observation, which
requires more manual labour, properly equipped laboratories, expensive devices, etc. Improper
disease detection may lead to inexperienced pesticide usage that can cause the pathogens to
develop long-term resistance, reducing the ability of the crop to fight back. Plant disease
detection can be done by observing the spots on the leaves of the affected plant. The method
we are adopting to detect plant diseases is image processing using a Convolutional Neural
Network (CNN). The first implementation of plant disease detection using image processing was
done by Shen Weizheg, Wu Yachun, Chen Zhanliang and Wi Hangda in their paper [1].
1.1 INTRODUCTION

The human visual system has no problem interpreting the subtle variations in translucency
and shading in the photograph in Figure 1.1 and correctly segmenting the object from its
background.

Figure 1.1. Lotus flower as seen by the naked eye.

Let's imagine a person taking a field trip and seeing a bush or a plant on the ground. He or
she would like to know whether it is a weed or some other plant, but has no idea what kind of
plant it could be. With a good digital camera and a recognition program, one could get some
useful information. Plants play an important role in our environment: without plants, the
earth's ecology could not exist. But in recent times, many types of plants are at risk of
extinction. To protect plants and to catalogue the various types of floral diversity, a plant
database is an important step towards conservation of the earth's biosphere. There are a huge
number of plant species worldwide, and to handle such volumes of information, the development
of a quick and efficient classification method has become an area of active research. In
addition to the conservation aspect, recognition of plants is also necessary in order to
utilize their medicinal properties and to use them as sources of alternative energy such as
bio-fuel. There are several ways to recognize a plant: by its flower, root, leaf, fruit, etc.

1.2 Background

In recent decades, digital image processing, image analysis and machine vision
have developed sharply, and they have become a very important part of artificial
intelligence and of the interface between human and machine, in both grounded theory and
applied technology. These technologies have been applied widely in industry and medicine, but
rarely in realms related to agriculture or natural habitats.

Despite the importance of identifying plant diseases using digital image processing,
and although the subject has been studied for at least 30 years, the advances achieved
seem somewhat timid. Some facts lead to this conclusion:

Methods are too specific. The ideal method would be able to identify any kind of
plant. Evidently, this is unfeasible at the current technological level. However, many of
the methods being proposed are not only limited to a single species of plant, but also
require those plants to be at a certain growth stage for the algorithm to be effective.
That is acceptable if the plant is in that specific stage, but it is very limiting otherwise.
Many researchers do not state this kind of information explicitly, but if their training and
test sets include only images of a certain growth stage, which is often the case, the
validity of the results cannot be extended to other stages.

Operation conditions are too strict. Many images used to develop new methods are
collected under very strict conditions of lighting, angle of capture, distance between object
and capture device, among others. This is a common practice and is perfectly acceptable in
the early stages of research. However, in most real-world applications those conditions are
almost impossible to enforce, especially if the analysis is expected to be carried out in a
non-destructive way. Thus, it is a problem that many studies never get to the point of testing
and upgrading the method to deal with more realistic conditions, because this limits their
scope greatly.

Lack of technical knowledge about more sophisticated tools. The simplest solution for a
problem is usually the preferable one. In the case of image processing, some problems can be
solved by using only morphological mathematical operations, which are easy to implement and
understand. However, more complex problems often demand more sophisticated approaches.
Techniques like neural networks, genetic algorithms and support vector machines can be very
powerful if properly applied. Unfortunately, that is often not the case: in many studies,
those techniques seem to be chosen more for their popularity in the scientific community than
for their technical appropriateness to the problem at hand. As a result, problems like
overfitting, overtraining, undersized sample sets, sample sets with low representativeness
and bias seem to be a widespread plague. Those problems, although easily identifiable by an
individual knowledgeable on the topic, seem to go widely overlooked by the authors, probably
due to a lack of knowledge about the tools they are employing. The result is a whole group of
technically flawed solutions.

In recent times, computer vision methodologies and pattern recognition techniques
have been applied towards automated procedures of plant recognition. Digital image
processing is the use of algorithms and procedures for operations such as image
enhancement, image compression, image analysis, mapping, geo-referencing, etc. The
influence and impact of digital images on modern society is tremendous, and they are
considered a critical component in a variety of application areas including pattern
recognition, computer vision, industrial automation and healthcare.

One of the most common methods of leaf feature extraction is based on the morphological
features of the leaf. Some simple geometrical features are aspect ratio, rectangularity,
convexity, sphericity, form factor, etc.

One can easily transfer a leaf image to a computer, and the computer can extract
features automatically using image processing techniques. Some systems employ descriptions
used by botanists, but it is not easy to extract and transfer those features to a computer
automatically.

The aim of the project is to develop a leaf recognition program based on specific
characteristics extracted from photographs. This report therefore presents an approach in
which the plant is identified based on leaf features such as area, histogram equalization,
edge detection and classification. The main purpose of this program is to use Open-CV
resources.

Indeed, there are several advantages to combining Open-CV with the leaf recognition
program, and the results prove this method to be a simple and efficient attempt. Later
sections discuss image acquisition and preprocessing, which include image enhancement,
histogram equalization and edge detection. Further sections introduce texture analysis and
high-frequency feature extraction of leaf images in order to classify them, i.e. parametric
calculations, followed by the results.
1.3 Motivation

Here is a brief review of the papers we referred to for this project. Since digital
image processing is used in this project to detect diseases in plants, it eliminates the
traditional methods used in earlier days and also removes human error. The method needs a
digital computer, MATLAB software and a digital camera to detect diseases in plants, so it is
a suitable method to adapt for this project. The paper by Pallavi S. Marathe describes steps
such as image acquisition and pre-processing, which includes clipping, smoothing and contrast
enhancement. She has also used segmentation techniques to partition the different parts of an
image. Disease detection is done by extracting features and classifying them using the SVM
algorithm.

1.4 Objectives

• To detect unhealthy regions of plant leaves, particularly of the tomato plant.

• To classify plant leaf diseases using texture features.

• To analyse the leaf infection through code.

1.5 Future Scope

Using different new technologies and methods, we can make a faster and more efficient
application for the user. The system presented in this project was able to perform
accurately; however, there are still a number of issues which need to be addressed. First of
all, we consider only four diseases in this project, so the scope of disease detection is
limited. In order to increase the scope of disease detection, large datasets of different
diseases should be used.
CHAPTER 2

LITERATURE SURVEY

2.1 EXISTING METHODS

Earlier papers mainly describe detecting pests like aphids, whiteflies, thrips, etc.
using various approaches, suggesting the various implementation ways illustrated and
discussed below.

One work proposed a cognitive vision system that combines image processing, learning and
knowledge-based techniques. It detects only the mature stage of the whitefly and counts the
number of flies on a single leaflet. The authors used 180 images as the test dataset; among
these they tested 162 images, each having 0 to 5 whitefly pests. They calculated the false
negative rate (FNR) and false positive rate (FPR) for test images with no whiteflies
(class 1), with at least one whitefly (class 2), and for the whole test set.

Another work extended the implementation of image processing algorithms and techniques to
detect pests in a controlled environment such as a greenhouse. Three kinds of typical
features, including size, morphological features (shape of the boundary) and colour
components, were investigated to identify three kinds of adult insects: whiteflies, aphids
and thrips.

A further study promoted early pest detection in greenhouses based on video analysis. The
goal was to define a decision support system which handles video camera data. Algorithms were
implemented for the detection of only two bio-aggressors, whiteflies and aphids. The system
was able to detect low infestation stages by detecting the eggs of whiteflies, thus analysing
whitefly behaviour.

Another proposed pest detection system includes four steps: colour conversion, segmentation,
noise reduction and counting whiteflies. A distinct algorithm, relative difference in pixel
intensities (RDI), was proposed for detecting the whitefly pest affecting various leaves. The
algorithm works not only for greenhouse crops but also for agricultural crops, and was tested
over 100 images of whitefly pests with an accuracy of 96%.

Other works proposed a method of pest detection and positioning based on binocular stereo
vision to obtain the location of the pest, used to guide a robot to spray pesticides
automatically; introduced contextual parameter tuning for adaptive image segmentation, which
allows algorithm parameters to be tuned efficiently with respect to variations in leaf colour
and contrast; and presented an automatic method, using an SVM classifier, for classifying the
main agents that damage soybean leaflets, i.e. beetles and caterpillars.
2.2 Early Detection of Pests on Leaves Using Support Vector Machine:

This project deals with a new type of early pest detection system. Images of leaves
affected by pests are acquired using a digital camera. The leaf images are processed to
obtain a grey-scale image, and image segmentation and classification techniques are then used
to detect pests on the leaves. The image is passed to the analysis algorithm to report the
quality. The system combines both image processing and soft computing: the image processing
technique is used to detect the pests, and the soft computing technique is used to perform
this detection over a wide population. The images are acquired using a digital camera of
approximately 12-megapixel resolution with 24-bit colour. They are then transferred to a PC
and processed in Open-CV software. The RGB image is segmented using a blob-like algorithm to
segment the pests on the leaves, and the segmented leaf part is then analysed to estimate the
pest density in the field. A Support Vector Machine classifier is used to classify the pest
types. The system is also implemented on an FPGA kit by converting the Open-CV code with an
HDL coder: in the FPGA, the input image is downloaded to memory; the system reads the image
from memory, processes it and displays the output image on a monitor.

A software routine was written in Open-CV, in which training and testing were performed
via several neural network classifiers. The texture feature classification methods are as
follows.

2.2.1. K-nearest neighbor:

The K-nearest neighbour classifier calculates the minimum distance between a given
point and other points to determine which class the given point belongs to. The goal is to
compute the distance from the query sample to every training sample and select the
neighbours with the minimum distance.
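To make the idea concrete, here is a minimal pure-Python sketch of K-nearest-neighbour classification (an illustration, not the report's submitted code; the 2-D texture feature values and class labels are hypothetical):

```python
import math

def knn_predict(train, query, k=1):
    """Classify `query` by majority vote among the k nearest
    training samples, using Euclidean distance."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical 2-D texture features: (contrast, energy) -> class label
train = [((0.1, 0.9), "healthy"), ((0.2, 0.8), "healthy"),
         ((0.8, 0.2), "diseased"), ((0.9, 0.1), "diseased")]

print(knn_predict(train, (0.85, 0.15), k=3))  # -> diseased
```

With k=3, the two "diseased" samples outvote the nearest "healthy" one, which is exactly the majority-vote behaviour described above.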

2.2.2. Radial basis function:

A radial basis function (RBF) is a real-valued function whose value depends only on the
distance from the origin. The normally used norm is the Euclidean distance. RBF networks
are networks in which the activation of the hidden units is based on the distance between the
input vector and a prototype vector.
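The activation of a hidden RBF unit can be sketched as follows (an illustrative example, not the report's code; the Gaussian form and width parameter are one common choice):

```python
import math

def gaussian_rbf(x, center, eps=1.0):
    """Gaussian radial basis function: the activation depends only on
    the Euclidean distance between input x and the prototype center."""
    r = math.dist(x, center)
    return math.exp(-(eps * r) ** 2)

# Activation is maximal at the prototype and decays with distance
print(gaussian_rbf((0.0, 0.0), (0.0, 0.0)))  # -> 1.0
print(gaussian_rbf((1.0, 0.0), (0.0, 0.0)))  # ≈ 0.3679
```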
2.2.3. Artificial neural networks:

ANNs are popular machine learning algorithms that have been in wide use in recent years.
The Multilayer Perceptron (MLP) is the basic form of ANN; it updates its weights through
backpropagation during training [16]. There are other variations of neural networks which
have recently become popular in texture classification. The Probabilistic Neural Network
(PNN) is derived from the Radial Basis Function (RBF) network and is a parallel distributed
processor that has a natural tendency for storing experiential knowledge. PNN is an
implementation of a statistical algorithm called kernel discriminant analysis, in which the
operations are organized into a multilayered feed-forward network having four layers:
input layer, pattern layer, summation layer, and output layer.

2.3 Back propagation network:

A typical BP network consists of three parts, an input layer, a hidden layer and an
output layer, connected in turn through connection weight values between the nodes. The
main characteristic of a BP network is that the network weight values are driven towards
their expected values through the sum of squared errors between the network output and the
sample output, continuously adjusting the network's weight values. It is popular and
extensively used for training feed-forward networks. However, it has no inherent novelty
detection, so it must be trained on known outcomes.
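The weight-adjustment idea above can be illustrated on the smallest possible case, a single linear unit trained by gradient descent on the squared error (a toy sketch, not the report's network; the learning rate and target function y = 2x + 1 are arbitrary choices for the demonstration):

```python
def backprop_step(w, b, x, target, lr=0.1):
    """One gradient-descent update for a single linear unit,
    minimising the squared error 0.5 * (y - target) ** 2."""
    y = w * x + b        # forward pass
    err = y - target     # output error
    w -= lr * err * x    # dE/dw = err * x
    b -= lr * err        # dE/db = err
    return w, b

w, b = 0.0, 0.0
for _ in range(200):     # fit y = 2x + 1 from two sample pairs
    w, b = backprop_step(w, b, 1.0, 3.0)
    w, b = backprop_step(w, b, 2.0, 5.0)
print(round(w, 2), round(b, 2))  # converges towards w ≈ 2, b ≈ 1
```

A real BP network repeats exactly this error-times-input update for every weight in every layer, with the error propagated backwards through the hidden layers.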

2.4 Support vector machine:

SVM is a non-linear classifier and is a newer trend in machine learning algorithms.

SVM is popularly used in many pattern recognition problems, including texture classification.
SVM is designed to work with only two classes, by maximizing the margin from the hyperplane.
The samples closest to the margin that are selected to determine the hyperplane are known as
support vectors [12]. Multiclass classification is also applicable, and is basically built up
from various two-class SVMs, either one-versus-all or one-versus-one. Another feature is the
kernel function, which projects non-linearly separable data from a low-dimensional space to a
space of higher dimension so that it may become separable in the higher-dimensional space.
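The one-versus-all decision rule can be sketched as follows (an illustration only: the linear hyperplane weights and disease class names are hypothetical, standing in for the output of SVM training):

```python
def decision(w, b, x):
    """Score of x against the hyperplane w·x + b = 0; the sign says
    which side of the margin the sample falls on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical trained one-versus-all hyperplanes for three classes
classifiers = {
    "leaf_blight": ((1.0, -1.0), 0.0),
    "leaf_spot":   ((-1.0, 1.0), 0.0),
    "healthy":     ((0.0, 0.5), -0.2),
}

def predict(x):
    """One-versus-all: pick the class whose hyperplane scores highest."""
    return max(classifiers, key=lambda c: decision(*classifiers[c], x))

print(predict((0.9, 0.1)))  # -> leaf_blight
```

Each two-class SVM votes via its decision value, and the class with the strongest positive score wins, which is the combination scheme described above.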

The first step in the proposed approach is to capture the sample from the digital
camera and extract its features, which are then stored in the database.
Preprocessing is used to remove low-frequency background noise and to normalize the
intensity of the individual particles of the images. It enhances the visual appearance of the
images and improves the manipulation of the datasets. It is the technique of enhancing data
images prior to computational processing; the caution is that enhancement techniques can
emphasize image artifacts, or even lead to a loss of information, if not correctly used. The
steps involved in preprocessing are to get an input image and then enhance it. The RGB image
is converted to a grey-scale image to get a clear identification of pests on the leaves.
Noise removal can be performed using filtering techniques. Mean filtering: a 3x3 sub-region
is scanned over the entire image, and at each position the centre pixel is replaced by the
average value. Median filtering: a 3x3 sub-region is scanned over the entire image, and at
each position the centre pixel is replaced by the median value.

The PSNR value is calculated for both the mean and median filters, and based on the
PSNR value one of the filtered images is taken for further processing. Here, the PSNR value
is 23.78 for mean filtering and 12.89 for median filtering. The higher the PSNR, the better
the quality of the compressed or reconstructed image; therefore the mean-filtered image is
taken for further processing.
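The 3x3 filtering and the PSNR comparison can be sketched in pure Python (an illustration of the definitions above, not the project's Open-CV code; the tiny 3x3 "image" with one impulse pixel is a made-up example):

```python
import math
import statistics

def filter3x3(img, reduce_fn):
    """Scan a 3x3 window over the image interior; replace each centre
    pixel with reduce_fn (mean or median) of its neighbourhood."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [img[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = reduce_fn(window)
    return out

def psnr(orig, proc, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means less distortion."""
    flat_o = [p for row in orig for p in row]
    flat_p = [p for row in proc for p in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_o, flat_p)) / len(flat_o)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

noisy = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]  # one impulse pixel
denoised = filter3x3(noisy, statistics.median)
print(denoised[1][1])  # -> 10 (the median suppresses the impulse)
```

Passing `statistics.mean` instead of `statistics.median` gives the mean filter, and comparing `psnr` values between the two outputs reproduces the filter-selection step described in the text.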

Image segmentation in general is defined as a process of partitioning an image into
homogeneous groups, such that each region is homogeneous but the union of no two adjacent
regions is homogeneous [11]. Image segmentation is performed to separate the different
regions with special significance in the image; these regions do not intersect each other.
Blob detection helps to obtain regions of interest for further processing, and is applied
when the same type of object is present in multiples. It segments the objects of interest
(whiteflies) from the complex background (leaves).
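A minimal stand-in for blob detection on an already-thresholded mask is connected-component counting (an illustrative sketch, not the project's segmentation code; the binary mask below is invented):

```python
def count_blobs(mask):
    """Count 4-connected regions of 1s in a binary mask using
    iterative flood fill (a simple stand-in for blob detection)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                blobs += 1
                stack = [(i, j)]          # flood-fill this region
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y][x] and not seen[y][x]:
                        seen[y][x] = True
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return blobs

# Thresholded image: two separate bright regions ("flies") on a leaf
mask = [[1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 1]]
print(count_blobs(mask))  # -> 2
```

Each connected region becomes one region of interest, which is what the blob-detection step feeds to the later feature-extraction and counting stages.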

Image features usually include colour, shape and texture features. Feature extraction is
performed using a majority-based voting method with three steps: 1) Histogram of Oriented
Gradients (HOG), 2) Gaussian Mixture Model (GMM), 3) Gabor features. HOG is a feature
descriptor used for object detection; the Gaussian mixture model is used for texture
analysis; and the Gabor feature calculates the relationship between groups of two pixels in
the original image. In this proposed work, the image is subdivided into small blocks, and in
each block the three steps are applied: HOG is used for detecting the distribution of the
colour ratio in the image, GMM for detecting the shape of the pests present in the image, and
the Gabor feature for finding the orientation of the pests. Finally, the feature values are
fed as input to the classifiers.
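The core of a HOG-style descriptor is a magnitude-weighted histogram of gradient orientations; a coarse pure-Python sketch (illustrative only, omitting HOG's cell/block normalisation; the 4x4 test image is invented):

```python
import math

def orientation_histogram(img, bins=8):
    """Coarse HOG-style descriptor: a magnitude-weighted histogram of
    gradient orientations over interior pixels (central differences)."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gy = img[i + 1][j] - img[i - 1][j]
            gx = img[i][j + 1] - img[i][j - 1]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(ang / (2 * math.pi) * bins) % bins] += mag
    return hist

# A vertical edge: all gradients point horizontally, so bin 0
# collects all the mass (36.0) and the other bins stay empty
img = [[0, 0, 9, 9]] * 4
print(orientation_histogram(img))
```

Concatenating such histograms over the small blocks mentioned above yields the feature vector that is handed to the classifiers.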

Three types of classifiers are used in order to determine which gives the better result.
The back-propagation and feed-forward classifiers fail to detect some pests in an image, but
SVM gives a better result. As described in Section 2.4, SVM maximizes the margin from the
hyperplane, handles multiclass problems by combining two-class machines, and uses kernel
functions to separate non-linearly separable data in a higher-dimensional space. It is used
to detect the pests on the leaves, gives information about the type of pest, reports the
number of pests present, and then gives a remedy to apply for controlling the pest. Finally,
the feature values are fed as input to the Support Vector Machine classifier, allowing us to
accurately distinguish the pests and leaves. This is an important step towards the
identification of pests and taking the corresponding remedies.

2.5 Classification of Fungal Disease Symptoms on Cereals using Color Texture Features

This paper describes Support Vector Machine (SVM) and Artificial Neural Network (ANN)
based recognition and classification of visual symptoms caused by fungal disease. Colour
images of fungal disease symptoms on cereals like wheat, maize and jowar are used in this
work. Different types of symptoms caused by fungal disease, namely leaf blight, leaf spot,
powdery mildew, leaf rust and smut, are considered for the study. The developed algorithms
are used to preprocess, segment and extract features from disease-affected regions. The
affected regions are segmented using the k-means segmentation technique. Colour texture
features are extracted from the affected regions and then used as inputs to the SVM and ANN
classifiers; the texture analysis is done using the Colour Co-occurrence Matrix. Tests are
performed to classify image samples. Classification accuracies between 68.5% and 87% are
obtained using the ANN classifier, and the average classification accuracies increase to
77.5% and 91.16% using the SVM classifier.
This work implements a machine vision system for the classification of the visual
symptoms of fungal disease. In the present work, tasks like image acquisition, segmentation,
feature extraction and classification are carried out. The classification tree is shown in
Figure 2.

Figure 2. Classification tree.
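The k-means segmentation used in the paper above clusters pixel values around moving centres; in one dimension the algorithm reduces to a few lines (a toy sketch, not the paper's implementation; the hue values for "healthy" and "lesion" tissue are hypothetical):

```python
def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means: alternately assign each value to the nearest
    centre, then move each centre to the mean of its cluster."""
    centres = sorted(values)[::max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for v in values:
            idx = min(range(len(centres)), key=lambda c: abs(v - centres[c]))
            clusters[idx].append(v)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

# Hypothetical hue values: lesions near 0.08, healthy tissue near 0.25
hues = [0.07, 0.08, 0.09, 0.24, 0.25, 0.26]
print(kmeans_1d(hues, k=2))  # centres settle near 0.08 and 0.25
```

In the real segmentation step, the same assign-and-update loop runs on full colour vectors, and the cluster containing the lesion colours yields the disease-affected region.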


CHAPTER 3

System Requirement Specification

System Configuration:
HARDWARE:

 System : Pentium IV 2.4 GHz


 Hard Disk : 40 GB
 Monitor : 15" VGA Colour
 Mouse : Logitech
 RAM : 512 MB

SOFTWARE:

 Operating system : Windows XP/ Windows 7 or More


 Software Tool : Open CV.
 Coding Language : Python.
 Toolbox : Image processing toolbox.
Functional Requirements:

 The software must be able to detect diseases in a leaf.


 It should be able to extract texture features of the leaves.
 It should display the disease name.
 It should display the remedy name.

Non Functional Requirements

 Detection of disease must be accurate.


 The detection process should be done effectively and efficiently.
 The software should never fail in the middle of an operation.
CHAPTER 4
METHODOLOGY

4.1 Design of Machine Learning Model

We can reduce the attack of pests by using proper pesticides and remedies, and we can reduce
the size of the images with proper size-reduction techniques while ensuring that the quality
is not compromised to a great extent. We can expand the projects of the earlier-mentioned
authors so that the remedy for the disease is also shown by the system. The main objective is
to identify plant diseases using image processing; after identification of the disease, the
system also suggests the name of the pesticide to be used, and it identifies the insects and
pests responsible for the epidemic. Apart from these parallel objectives, the drone is very
time-saving. The budget of the model is quite high for small-scale farming purposes but will
be value for money in large-scale farming. It completes each of the processes sequentially,
hence achieving each of the outputs.
Thus the main objectives are:
1) To design a system that can detect crop diseases and pests accurately.
2) To create a database of insecticides for the respective pests and diseases.
3) To provide a remedy for the disease that is detected.

Leaf miners are an insect family at the larval stage. They feed between the upper and lower
parts of the leaf.

Leaf miner disease

When insects are present in very large numbers on a plant, it is severely damaged. On a
single leaf the number of maggots can be six, so they can severely damage the leaf of a
plant, restrict plant growth and lead to reduced yields.
Hence we can develop a robot that uses image processing to detect the disease and to
classify it. This will avoid human interference and hence lead to precise, unprejudiced
decisions.

Generally, our observation of the disease is what is used for the decision about the disease.
A symptom of plant disease is a visible effect of the disease on the plant. Symptoms can be a
change in colour, a change in shape or functional changes of the plant in response to
pathogens, insects, etc. Leaf wilting is a characteristic symptom of verticillium wilt,
caused by the fungal plant pathogens Verticillium dahliae and Verticillium albo-atrum. Common
bacterial disease symptoms are brown, necrotic lesions surrounded by a bright yellow halo at
the edge of the leaf or at the inner part of the leaf on bean plants. One is not actually
seeing the disease pathogen, but rather a symptom caused by the pathogen.

Design of Machine Learning Model


FIG: ML Model with Two phases

A machine learning model is built in two phases, training and testing: the model is first
trained, and then an input called the test data is given to evaluate it. The model consists of
several image processing steps, namely image acquisition, image pre-processing,
segmentation, feature extraction, and an SVM classifier to classify the diseases.
Image acquisition: The diseased leaf image is acquired using a camera from a fixed, uniform
distance with sufficient lighting for learning and classification. Sample images of the
diseased leaves are collected and used to train the system. To train and test the system,
diseased leaf images and a smaller number of healthy images are taken. The images are
stored in a standard format. The image background should provide proper contrast to the leaf
color. The leaf disease dataset is prepared with both black and white backgrounds; based on a
comparative study, the black background gives better results and hence is used for disease
identification.
Image pre-processing: The image acquired using the digital camera is pre-processed by noise
removal with an averaging filter, color transformation and histogram equalization. The color
transformation step converts the RGB image to the HSI (Hue, Saturation and Intensity)
representation, as this color space is based on human perception. Hue refers to the dominant
color attribute as perceived by a human observer. Saturation refers to the amount of
brightness or white light added to the hue. Intensity refers to the amplitude of light. After the
RGB to HSI conversion, the hue part of the image is considered for the analysis, as it alone
provides the required information; the S and I components are ignored as they do not carry
any significant information.

FIG: RGB to HSI
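As a minimal sketch of this conversion, the HSI hue channel can be computed with NumPy using the standard arccos formula (illustrative only; in practice OpenCV's `cv2.cvtColor` with `cv2.COLOR_BGR2HSV` provides an optimized conversion, though HSV differs slightly from HSI):

```python
import numpy as np

def rgb_to_hue(img):
    """Compute the HSI hue channel (in degrees) of an RGB image.

    img: float array of shape (H, W, 3) with values in [0, 1].
    Returns hue in degrees, shape (H, W).
    """
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    # If B > G, the angle is measured the other way around the color circle.
    return np.where(b > g, 360.0 - theta, theta)

# Pure red and pure green pixels as a 1x2 "image".
demo = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
print(rgb_to_hue(demo))  # red -> ~0 degrees, green -> ~120 degrees
```

Only the hue channel is carried forward to the later analysis steps, matching the pre-processing description above.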

Masking green pixels: Since most of the green colored pixels belong to the healthy leaf and
do not add any value to the disease identification technique, the green pixels of the leaf are
removed by a masking technique; this significantly reduces processing time. The masking of
green pixels is achieved by computing the intensity value of the green pixels: if the intensity
is less than a predefined threshold value, the RGB components of that particular pixel are
assigned a value of zero. Green pixel masking is an optional step in our disease identification
technique, as the diseased part of the leaf can be completely isolated in the segmentation
process.
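A minimal NumPy sketch of the masking rule exactly as stated above (the threshold of 100 is a hypothetical choice for illustration; in practice it would be pre-computed from the training images):

```python
import numpy as np

def mask_green_pixels(img, threshold=100):
    """Zero out pixels whose green intensity is below the threshold.

    img: uint8 array of shape (H, W, 3) in RGB order.
    Returns a copy with the RGB components of masked pixels set to zero.
    """
    out = img.copy()
    mask = img[..., 1] < threshold   # green channel below threshold
    out[mask] = 0                    # zero R, G and B for those pixels
    return out

# Two pixels: one with a strong green component, one with a weak one.
demo = np.array([[[40, 200, 30], [90, 60, 80]]], dtype=np.uint8)
print(mask_green_pixels(demo))  # second pixel becomes [0, 0, 0]
```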
Segmentation: There are different image segmentation techniques: threshold based, edge
based, cluster based and neural network based. One of the most efficient is the clustering
method, which again has multiple subtypes: k-means clustering, fuzzy C-means clustering,
the subtractive clustering method, etc. One of the most used clustering algorithms is k-means.
K-means clustering is simple, computationally faster than other clustering techniques, and
works for a large number of variables. But it produces different cluster results for different
numbers of clusters and different initial centroid values, so the number of clusters k and the
initial centroids must be initialized properly. K-means is a general-purpose method used in
many domains for different problems.
In this project, k-means clustering is used to obtain k clusters matching the specified
characteristics, in order to segment the leaf.
Fig. K-means algorithm
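The clustering step can be sketched with a small NumPy implementation of Lloyd's k-means on the pixel colors (illustrative only; in practice `cv2.kmeans` or scikit-learn's `KMeans` would be used):

```python
import numpy as np

def kmeans(pixels, k, iters=20, seed=0):
    """Cluster pixel color vectors into k groups with Lloyd's algorithm.

    pixels: array of shape (N, 3); returns (labels, centroids).
    """
    rng = np.random.default_rng(seed)
    centroids = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each pixel to its nearest centroid.
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = pixels[labels == j].mean(axis=0)
    return labels, centroids

# Healthy (greenish) vs. diseased (brownish) pixel colors.
pix = np.array([[30, 180, 40], [35, 170, 45], [120, 80, 20], [125, 85, 25]], float)
labels, _ = kmeans(pix, k=2)
print(labels)  # greenish and brownish pixels fall into separate clusters
```

Running the same procedure on all pixels of a leaf image separates the diseased regions from the healthy ones by color.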

Flow Diagram
ALGORITHM

1. Capture the image in RGB format.

2. Generate color transformation structure.

3. Convert color values from RGB to the space specified in that structure.

4. Apply K means clustering for image segmentation.

5. Masking of green pixels (masking green channel).

6. Eliminate the masked cells present inside the edges of the infected cluster.

7. Convert the infected cluster from RGB to HSI.

8. Generation of SGDM matrix for H and S.

9. Call the GLCM function to calculate the texture features.

10. Computation of texture statistics

11. Configure the KNN classifier for recognition.

Disease detection uses the k-means clustering method [2]. The algorithm lists the necessary
steps for detecting disease on the plant leaf. In the first step, RGB images of the leaves are
captured using a camera. In step 2 a color transformation structure is formed, and then the
color space transformation is applied in step 3. These two steps are required in order to
perform step 4, in which the captured images are segmented using the K-means clustering
technique [2]. These four steps make up phase one, in which the infected objects are detected
and determined.

In step 5, the green pixels are detected. Masking of the green pixels is then done as follows:
if the green color value of a pixel is less than the threshold value already calculated, then the
red, green and blue component values of that pixel are made zero. This is done because these
are the unaffected parts; zeroing their values also reduces the amount of calculation.
Additionally, the time taken by the Raspberry Pi 3 to produce the final output is greatly
reduced.

In step 6, the pixels with zero values for red, green and blue and the pixels on the edges of
the infected clusters are removed completely. Phase 2 consists of steps five and six, and
gives added clarity in classifying the disease. This results in good detection performance
while keeping the required computing time to a minimum.

In step seven, the infected cluster is converted from RGB to HSI format. After that, the
SGDM matrices are created for the pixels of the image, but only for the H and S images, not
for the I image. The SGDM [1] measures the probability that a pixel at one particular gray
level will occur at a given distance and angle of orientation from another pixel having a
second particular gray level. From the SGDM matrices, texture statistics are generated for
each image.

Concisely, the features are calculated only for the pixels inside the edge of the infected part
of the leaf; the unaffected region inside the boundary of the infected part is excluded. Steps
seven to ten make up phase three, in which the texture features of the segmented objects are
computed.

Finally, the recognition process in the fourth phase is performed. The steps of the algorithm
are repeated for each captured image. The results are then transferred to the GSM module;
using the Raspberry Pi, the result is sent as e-mail and also displayed on the monitor.
Feature Extraction: From the input images, the features are to be extracted. To do so, instead
of choosing the whole set of pixels, we choose only those that are necessary and sufficient to
describe the segment. The segmented image is first selected by manual intervention. The
affected area of the image can be found by calculating the area of the connected components.
First, the connected components with 6-neighborhood pixels are found. Then the basic region
properties of the input binary image are found; the interest here is only in the area. The
affected area is found, and the percentage of area covered in this segment indicates the
quality of the result. The histogram of an entity or image provides information about the
frequency of occurrence of each value in the whole of the data/image; it is an important tool
for frequency analysis. The co-occurrence matrix takes this analysis to the next level: the
joint intensity occurrences of pixel pairs are recorded in a matrix, making co-occurrence a
tremendous tool for analysis. From the gray-level co-occurrence matrix, features such as
Contrast, Correlation, Energy and Homogeneity are extracted. The following table lists the
formulas of the features.
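A small NumPy sketch of a gray-level co-occurrence matrix and the texture features derived from it (a minimal illustration; scikit-image's `graycomatrix`/`graycoprops` provide a production-grade version with multiple offsets and angles):

```python
import numpy as np

def glcm(img, levels):
    """Normalized co-occurrence matrix for the 'pixel to the right' offset."""
    P = np.zeros((levels, levels))
    for i, j in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        P[i, j] += 1
    return P / P.sum()

def texture_features(P):
    """Contrast, energy and homogeneity from a normalized GLCM P."""
    i, j = np.indices(P.shape)
    return {
        "contrast": float((P * (i - j) ** 2).sum()),
        "energy": float((P ** 2).sum()),
        "homogeneity": float((P / (1.0 + np.abs(i - j))).sum()),
    }

img = np.array([[0, 0], [1, 1]])       # tiny 2-level image
feats = texture_features(glcm(img, 2))
print(feats)  # contrast 0.0, energy 0.5, homogeneity 1.0
```

For this tiny image every horizontal neighbor pair has equal gray levels, so the contrast is zero and the homogeneity is maximal.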

Classification using SVM: A support vector machine is a supervised learning model in
machine learning. SVMs are mainly used for classification and regression analysis. An SVM
has to be paired with a learning algorithm to produce an output, and it has given better
performance for classification and regression compared to other methods.

Given a set of training examples belonging to two different categories, the SVM training
algorithm creates a model that allots new examples to one category or the other, which
makes it a non-probabilistic binary linear classifier. An SVM represents the examples as
points in space, mapped so that the examples of the two categories are divided by a gap
which is as wide as possible.
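The classification idea can be illustrated with a tiny linear SVM trained by stochastic sub-gradient descent on the hinge loss (a from-scratch sketch with arbitrary sample data; in practice one would use a library implementation such as scikit-learn's `sklearn.svm.SVC`):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=300, lr=0.05):
    """Fit w, b minimizing hinge loss + lam*||w||^2 (labels y in {-1, +1})."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) < 1:      # margin violated: push
                w += lr * (yi * xi - 2 * lam * w)
                b += lr * yi
            else:                                  # margin satisfied: only shrink w
                w -= lr * 2 * lam * w
    return w, b

def predict(w, b, X):
    return np.sign(X @ w + b)

# Two well-separated classes (e.g. "healthy" = -1, "diseased" = +1 feature vectors).
X = np.array([[0.0, 0.1], [0.2, 0.0], [9.9, 10.0], [10.0, 9.8]])
y = np.array([-1, -1, 1, 1])
w, b = train_linear_svm(X, y)
print(predict(w, b, X))
```

On this separable toy set the learned hyperplane classifies all four training points correctly; in the project, the feature vectors would be the GLCM texture statistics extracted above.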
Detailed Explanation:

5.1 Python IDE

Python is an easy to learn, powerful programming language. It has efficient high-level data
structures and a simple but effective approach to object-oriented programming. Python’s
elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal
language for scripting and rapid application development in many areas on most platforms.
The Python interpreter is easily extended with new functions and data types implemented in
C or C++ (or other languages callable from C). Python is also suitable as an extension
language for customizable applications.

5.2 OpenCV

OpenCV is a library of programming functions mainly aimed at real-time computer vision. It
has a modular structure, which means that the package includes several shared or static
libraries. We are using the image processing module, which includes linear and non-linear
image filtering, geometrical image transformations (resize, affine and perspective warping,
and generic table-based remapping), color space conversion, histograms, and so on. Our
project uses methods such as the Viola-Jones (Haar cascade) classifier, the LBPH (Local
Binary Patterns Histograms) face recognizer, and the Histogram of Oriented Gradients
(HOG).

5.3 OpenCV-Python
Python is a general-purpose programming language created by Guido van Rossum, which
became very popular in a short time mainly because of its simplicity and code readability. It
enables the programmer to express ideas in fewer lines of code without losing any
readability.

Compared to other languages like C/C++, Python is slower. But another important feature of
Python is that it can be easily extended with C/C++. This feature helps us to write
computationally intensive codes in C/C++ and create a Python wrapper for it so that we can
use these wrappers as Python modules. This gives us two advantages: first, our code is as fast
as original C/C++ code (since it is the actual C++ code working in background) and second,
it is very easy to code in Python. This is how OpenCV-Python works: it is a Python wrapper
around the original C++ implementation. And the support of Numpy makes the task even
easier. Numpy is a highly optimized library for numerical operations with a MATLAB-style
syntax. All the OpenCV array structures are converted to and from Numpy arrays. So
whatever operations you can do in Numpy, you can combine it with OpenCV, which
increases number of weapons in your arsenal. Besides that, several other libraries like SciPy,
Matplotlib which supports Numpy can be used with this. So OpenCV-Python is an
appropriate tool for fast prototyping of computer vision problems.
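Because OpenCV images are plain NumPy arrays, ordinary NumPy operations apply directly to them. A small sketch (a synthetic array stands in for an image that `cv2.imread()` would return):

```python
import numpy as np

# A synthetic 4x4 3-channel "image"; cv2.imread() returns the same kind of array.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:, :, 1] = 200                     # fill the green channel

# Typical array-level operations used in OpenCV pipelines:
gray_mean = img.mean(axis=2)           # naive per-pixel channel average
flipped = img[:, ::-1, :]              # horizontal flip, like cv2.flip(img, 1)
roi = img[1:3, 1:3]                    # a region of interest is just a slice

print(img.shape, gray_mean[0, 0], flipped.shape, roi.shape)
```

This interchangeability is what lets SciPy and Matplotlib operate on OpenCV images without any conversion step.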
OpenCV-Python working
OpenCV introduces a new set of tutorials which will guide you through various functions
available in OpenCV-Python. This guide is mainly focused on OpenCV 3.x
version (although most of the tutorials will work with OpenCV 2.x also).

Prior knowledge of Python and Numpy is required before starting because they won't be
covered in this guide. In particular, a good knowledge of Numpy is a must to write optimized
code in OpenCV-Python.

This tutorial has been started by Abid Rahman K. as part of Google Summer of Code 2013
program, under the guidance of Alexander Mordvintsev.

OpenCV Needs us..


Since OpenCV is an open source initiative, all are welcome to make contributions to
this library. And it is same for this tutorial also. So, if you find any mistake in this tutorial
(whether it be a small spelling mistake or a big error in code or concepts, whatever), feel free
to correct it. And that will be a good task for freshers who begin to contribute to open source
projects. Just fork the OpenCV in github, make necessary corrections and send a pull request
to OpenCV. OpenCV developers will check your pull request, give you important feedback
and once it passes the approval of the reviewer, it will be merged into OpenCV. Then you
become an open-source contributor. Similar is the case with other tutorials, documentation, etc.

As new modules are added to OpenCV-Python, this tutorial will have to be expanded. Those
who know about a particular algorithm can write a tutorial covering its basic theory and a
code sample showing its basic usage, and submit it to OpenCV.

Getting Started with Images


Goals

 Here, you will learn how to read an image, how to display it and how to save it back
 You will learn these functions : cv2.imread(), cv2.imshow() , cv2.imwrite()
 Optionally, you will learn how to display images with Matplotlib

Using OpenCV
Read an image
Use the function cv2.imread() to read an image. The image should be in the working directory, or a full path to the image should be given.
The second argument is a flag which specifies the way the image should be read.

 cv2.IMREAD_COLOR : Loads a color image. Any transparency of image will be


neglected. It is the default flag.
 cv2.IMREAD_GRAYSCALE : Loads image in grayscale mode
 cv2.IMREAD_UNCHANGED : Loads image as such including alpha channel

Display an image
Use the function cv2.imshow() to display an image in a window. The window automatically
fits to the image size.

The first argument is a window name, which is a string; the second argument is our image.
You can create as many windows as you wish, but with different window names.

cv2.waitKey() is a keyboard binding function. Its argument is the time in milliseconds. The
function waits for the specified milliseconds for any keyboard event. If you press any key in
that time, the program continues. If 0 is passed, it waits indefinitely for a key stroke. It can
also be set to detect specific key strokes, e.g., if key 'a' is pressed, which we will discuss below.

cv2.destroyAllWindows() simply destroys all the windows we created. If you want to
destroy any specific window, use the function cv2.destroyWindow() where you pass the
exact window name as the argument.

5.4 Image processing module

Purpose of Image processing

The purpose of image processing is divided into 5 groups. They are:


1. Visualization- Observe the objects that are not visible.
2. Image sharpening and restoration- To create a better image.
3. Image retrieval- Seek for the image of interest.
4. Measurement of pattern– Measures various objects in an image.
5. Image Recognition– Distinguish the objects in an image.
Modules Description:
The image acquisition, image pre-processing and green-pixel masking modules follow the
steps already described in the Design of Machine Learning Model section above.
Software Requirement Specification:

Open CV

OpenCV (Open Source Computer Vision) is a library of programming functions mainly
aimed at real-time computer vision. Originally developed by Intel, it was later supported by
Willow Garage and then Itseez (which was later acquired by Intel). The library is
cross-platform and free for use under the open-source BSD license. OpenCV supports the
deep learning frameworks TensorFlow, Torch/PyTorch and Caffe.
It has C++, Python, Java and MATLAB interfaces and supports Windows, Linux, Android
and Mac OS. OpenCV leans mostly towards real-time vision applications and takes
advantage of MMX and SSE instructions when available. Full-featured CUDA and
OpenCL interfaces are being actively developed right now. There are over 500 algorithms
and about 10 times as many functions that compose or support those algorithms. OpenCV is
written natively in C++ and has a templated interface that works seamlessly with STL
containers.
In 1999, the OpenCV project was initially an Intel Research initiative to advance
CPU-intensive applications, part of a series of projects including real-time ray tracing and 3D
display walls. The main contributors to the project included a number of optimization experts
at Intel Russia, as well as Intel's Performance Library Team. In the early days of OpenCV,
the goals of the project were described as:

 Advance vision research by providing not only open but also optimized code for basic
vision infrastructure. No more reinventing the wheel.
 Disseminate vision knowledge by providing a common infrastructure that developers
could build on, so that code would be more readily readable and transferable.
 Advance vision-based commercial applications by making portable,
performance-optimized code available for free – with a license that did not require
code to be open or free itself.
Figure 5.3 Qt editor with Open CV

5.4.1 Structure of Open CV

Figure 5.4 Structure of Open CV

Once OpenCV is installed, the OPENCV_BUILD\install directory will be populated with
three types of files:

 Header files: These are located in the OPENCV_BUILD\install\include subdirectory
and are used to develop new projects with OpenCV.
 Library binaries: These are static or dynamic libraries (depending on the option
selected with CMake) with the functionality of each of the OpenCV modules. They
are located in the bin subdirectory (for example, x64\mingw\bin when the GNU
compiler is used).
 Sample binaries: These are executables with examples that use the libraries. The
sources for these samples can be found in the source package.

5.4.2 General description

 Open source computer vision library in C/C++.
 Optimized and intended for real-time applications.
 OS/hardware/window-manager independent.
 Generic image/video loading, saving, and acquisition.
 Both low and high level API.
Provides interface to Intel's Integrated Performance Primitives (IPP) with processor specific
optimization (Intel processors).

5.4.3 Features

 Image data manipulation (allocation, release, copying, setting, conversion).
 Image and video I/O (file and camera based input, image/video file output).
 Matrix and vector manipulation and linear algebra routines (products, solvers, SVD).
 Various dynamic data structures (lists, queues, sets, trees, graphs).
 Basic image processing (filtering, edge detection, corner detection, sampling and interpolation,
color conversion, morphological operations, histograms, image pyramids).
 Structural analysis (connected components, contour processing, distance transform, various
moments, template matching, Hough transform, polygonal approximation, line fitting, ellipse
fitting, Delaunay triangulation).
 Camera calibration (finding and tracking calibration patterns, calibration, fundamental matrix
estimation, homography estimation, stereo correspondence).
 Motion analysis (optical flow, motion segmentation, tracking).
 Object recognition (eigen-methods, HMM).
 Basic GUI (display image/video, keyboard and mouse handling, scroll-bars).
 Image labeling (line, conic, polygon, text drawing)

5.4.4 OpenCV modules

 cv - Main OpenCV functions.
 cvaux - Auxiliary (experimental) OpenCV functions.
 cxcore - Data structures and linear algebra support.
 highgui - GUI functions.

5.4.5 OpenCV working with video capturing

OpenCV supports capturing images from a camera or a video file (AVI).

 Initializing capture from a camera:

CvCapture* capture = cvCaptureFromCAM(0); // capture from video device #0

 Initializing capture from a file:

CvCapture* capture = cvCaptureFromAVI("infile.avi");

 Capturing a frame:

IplImage* img = 0;
if(!cvGrabFrame(capture)){ // capture a frame
    printf("Could not grab a frame\n\7");
    exit(0);
}
img = cvRetrieveFrame(capture); // retrieve the captured frame

To obtain images from several cameras simultaneously, first grab an image from each camera.
Retrieve the captured images after the grabbing is complete.
 Releasing the capture source: cvReleaseCapture(&capture);

Open CV
It is a library of programming functions mainly aimed at real-time computer vision. It was
developed by Intel's research center, subsequently supported by Willow Garage, and is now
maintained by Itseez. It is written in C++ and its primary interface is also in C++, with
bindings in Python, Java, and MATLAB. OpenCV runs on a variety of platforms: Windows,
Linux, macOS and OpenBSD on the desktop, and Android, iOS and BlackBerry on mobile.
It is used for diverse purposes such as facial recognition, gesture recognition, object
identification, mobile robotics and segmentation. OpenCV-Python is a combination of the
OpenCV C++ API and the Python language. In our project we use OpenCV version 2.
OpenCV is used with gesture control to open a camera and capture the image. It is also used
in the image-to-text and voice conversion technique.

Figure 5.8: Open CV


Putty
PuTTY is a free and open-source terminal emulator, serial console and network file
transfer application. PuTTY was developed for Microsoft Windows, but it has
been ported to various other operating systems. It can connect to a serial port. It
supports a variety of network protocols, including SCP, SSH, Telnet, and raw
socket connections.

Figure 4.11: Putty
SVMs: A New Generation of Learning Algorithms

Pre-1980:
 Almost all learning methods learned linear decision surfaces.
 Linear learning methods have nice theoretical properties.
1980s:
 Decision trees and NNs allowed efficient learning of non-linear decision surfaces.
 Little theoretical basis, and all suffer from local minima.
1990s:
 Efficient learning algorithms for non-linear functions based on computational
learning theory were developed.
 Nice theoretical properties.

Support Vectors

• Support vectors are the data points that lie closest to the decision surface (or
hyperplane)
• They are the data points most difficult to classify
• They have direct bearing on the optimum location of the decision surface
• We can show that the optimal hyperplane stems from the function class with the
lowest "capacity" = number of independent features/parameters we can twiddle [note:
this is 'extra' material not covered in the lectures... you don't have to know this]
Support Vector Machine (SVM)
• SVMs maximize the margin (in Winston's terminology: the 'street') around the
separating hyperplane.
• The decision function is fully specified by a (usually very small) subset of training
samples, the support vectors.
• Maximizing the margin becomes a quadratic programming problem.
General input/output for SVMs is just like for neural nets, but with one important addition...
Input: a set of (input, output) training pair samples; call the input sample features x1, x2 ...
xn, and the output result y. Typically, there can be lots of input features xi.
Output: a set of weights w (or wi), one for each feature, whose linear combination predicts
the value of y. (So far, just like neural nets...)
Important difference: we use the optimization of maximizing the margin ('street width') to
reduce the number of weights that are nonzero to just a few that correspond to the
important features that 'matter' in deciding the separating line (hyperplane). These nonzero
weights correspond to the support vectors (because they 'support' the separating
hyperplane).

Which Hyperplane to pick?

• Lots of possible solutions for a, b, c.
• Some methods find a separating hyperplane, but not the optimal one (e.g., neural
nets)
• But: which points should influence optimality?
– All points?
• Linear regression
• Neural nets
– Or only "difficult points" close to the decision boundary?
• Support vector machines

Support Vectors again for linearly separable case

• Support vectors are the elements of the training set that would change the position
of the dividing hyperplane if removed.
• Support vectors are the critical elements of the training set.
• The problem of finding the optimal hyperplane is an optimization problem and can
be solved by optimization techniques (we use Lagrange multipliers to get this problem into
a form that can be solved analytically).

Support Vectors: input vectors that just touch the boundary of the margin (street), circled
in the figure, there are 3 of them, lying on the planes

w Tx + b = 1 or w Tx + b = –1

The figure shows the actual support vectors v1, v2, v3, instead of just the 3 circled points at
the tail ends of the support vectors; d denotes 1/2 of the street 'width'.

FIG: Support vectors v1, v2, v3 touching the margin boundaries; d is half the street width
Defining the separating Hyperplane

• The decision surface separating the classes is a hyperplane of the form:
    wTx + b = 0
– w is a weight vector
– x is an input vector
– b is a bias
• This allows us to write:
    wTx + b ≥ 0 for di = +1
    wTx + b < 0 for di = –1

Some final definitions

• Margin of Separation (d): the separation between the hyperplane and the closest
data point for a given weight vector w and bias b.
• Optimal Hyperplane (maximal margin): the particular hyperplane for which the
margin of separation d is maximized.
Maximizing the margin (aka street width)
We want a classifier (linear separator) with as big a margin as possible. Let H0 be the
separating hyperplane, and H1, H2 the planes through the closest points on either side, at
distances d+ and d- from H0.
Recall that the distance from a point (x0, y0) to a line Ax + By + c = 0 is
|Ax0 + By0 + c| / sqrt(A² + B²). The distance between H0 and H1 is then
|w·x + b| / ||w|| = 1 / ||w||.
The total distance between H1 and H2 is thus 2 / ||w||.
In order to maximize the margin, we thus need to minimize ||w||, with the
condition that there are no data points between H1 and H2:
    w·xi + b ≥ +1 for yi = +1
    w·xi + b ≤ –1 for yi = –1
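These distances are easy to check numerically. A small sketch computing the point-to-hyperplane distance and the street width 2/||w|| for an assumed w and b (the values are arbitrary, for illustration):

```python
import numpy as np

w = np.array([1.0, 1.0])   # assumed weight vector
b = -3.0                   # assumed bias

def distance(x0):
    """Distance of point x0 from the hyperplane w.x + b = 0."""
    return abs(np.dot(w, x0) + b) / np.linalg.norm(w)

margin = 2.0 / np.linalg.norm(w)   # total street width between H1 and H2

x = np.array([1.0, 1.0])           # a sample point
print(distance(x), margin)         # ~0.7071 and ~1.4142
```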

We now must solve a quadratic programming problem

• The problem is: minimize ||w||, s.t. the discrimination boundary is obeyed, i.e.,
min f(x) s.t. g(x) = 0, which we can rewrite as:
    min f: ½||w||²  (note this is a quadratic function)
    s.t. g: yi(w·xi + b) = 1, or [yi(w·xi + b)] – 1 = 0

This is a constrained optimization problem. It can be solved by the Lagrange multiplier
method. Because f is quadratic, its surface is a paraboloid, with just a flattened region along
the constraint.

Example: minimize the paraboloid f: 2 + x² + 2y² s.t. g: x + y = 1

FIG: Flattened paraboloid f = 2 + x² + 2y² with superimposed constraint g: x + y = 1; the
solution p lies at the point of tangency.
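Working the stated example by hand with the Lagrange multiplier method:

```latex
\text{Minimize } f(x,y) = 2 + x^2 + 2y^2 \quad \text{s.t.}\quad g(x,y) = x + y - 1 = 0.
\\
L(x, y, a) = f(x,y) - a\,g(x,y) = 2 + x^2 + 2y^2 - a(x + y - 1)
\\
\frac{\partial L}{\partial x} = 2x - a = 0 \;\Rightarrow\; x = \frac{a}{2}, \qquad
\frac{\partial L}{\partial y} = 4y - a = 0 \;\Rightarrow\; y = \frac{a}{4}
\\
\frac{\partial L}{\partial a} = -(x + y - 1) = 0 \;\Rightarrow\;
\frac{a}{2} + \frac{a}{4} = 1 \;\Rightarrow\; a = \frac{4}{3},\;
x = \frac{2}{3},\; y = \frac{1}{3},\; f = \frac{8}{3}.
```

The solution (2/3, 1/3) is exactly the tangent point p shown in the figure.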

Two constraints

1. Parallel normal constraint (a gradient constraint on f and g, s.t. the solution is a
max or a min)
2. g(x) = 0 (the solution lies on the constraint line as well)

We now recast these by combining f and g into a new Lagrangian function, introducing
new 'slack variables' denoted a (or, more usually, denoted α in the literature).
Redescribing these conditions

• We look for a solution point p where
    ∇f(p) = a∇g(p),  g(x) = 0

• Or, combining these two into the Lagrangian L and requiring the derivative of L to
be zero:
    L(x, a) = f(x) – a·g(x)
    ∂L/∂x (x, a) = 0

At a solution p

• The constraint line g and the contour lines of f must be tangent.
• If they are tangent, their gradient vectors (perpendiculars) are parallel.
• The gradient of g is perpendicular to the constraint line (it points in the direction
of steepest ascent).
• The gradient of f must be in the same direction as the gradient of g.
How the Lagrangian solves constrained optimization
    L(x, a) = f(x) – a·g(x), where ∂L/∂x (x, a) = 0
Partial derivatives w.r.t. x recover the parallel normal constraint; partial derivatives
w.r.t. a recover the constraint condition g(x) = 0.

In general,
    L(x, a) = f(x) – Σi ai·gi(x),  a function of n + m variables:

n for the x's and m for the a's. Differentiating gives n + m equations, each set to 0. The n
equations differentiated w.r.t. each xi give the gradient conditions; the m equations
differentiated w.r.t. each ai recover the constraints gi.

In our case, f(x) = ½||w||² and g(x): yi(w·xi + b) – 1 = 0, so the Lagrangian is:

    min L = ½||w||² – Σi ai[yi(w·xi + b) – 1]  w.r.t. w, b


Lagrangian Formulation
So in the SVM problem the Lagrangian is:

    min LP = ½||w||² – Σi ai yi (xi·w + b) + Σi ai

    s.t. ∀i: ai ≥ 0, where l is the number of training points and the sums run over i = 1..l

From the property that the derivatives at the minimum are zero:

    ∂LP/∂w = w – Σi ai yi xi = 0
    ∂LP/∂b = – Σi ai yi = 0

so

    w = Σi ai yi xi  and  Σi ai yi = 0

What's with this LP business?

• The subscript P indicates that this is the primal form of the optimization problem.
• We will actually solve the optimization problem by solving the dual of this original
problem.
• What is this dual formulation? By substituting for w and b back into the original
equation (subject to constraints involving the ai), we can get rid of the dependence
on w and b.
• Note first that we already have our answer for what the weights w must be: they
are a linear combination of the training inputs and outputs, weighted by the ai.

Primal problem:

    min LP = ½||w||² – Σi ai yi (xi·w + b) + Σi ai
    s.t. ∀i: ai ≥ 0
    with w = Σi ai yi xi and Σi ai yi = 0

Dual problem:

    max LD(ai) = Σi ai – ½ Σi Σj ai aj yi yj (xi·xj)
    s.t. Σi ai yi = 0 and ai ≥ 0

(note that we have removed the dependence on w and b)

The Dual problem
• Kuhn-Tucker theorem: the solution we find here will be the same as the solution to
the original problem.
• Q: But why are we doing this? (Why not just solve the original problem?)
• Ans: Because this will let us solve the problem by computing just the inner
products of xi, xj (which will be very important later on, when we want to solve
non-linearly separable classification problems).

The Dual Problem

Dual problem:

	max L_D(a) = Σ_{i=1..l} a_i − ½ Σ_{i=1..l} Σ_{j=1..l} a_i a_j y_i y_j (x_i·x_j)

	s.t. Σ_{i=1..l} a_i y_i = 0  &  a_i ≥ 0

Notice that all we have are the dot products of x_i, x_j.

Maximizing L_D subject to the constraints Σ_{i=1..l} a_i y_i = 0 and 0 ≤ a_i ≤ C is a quadratic programming problem whose solution gives us the a_i. Now, knowing the a_i, we can find the weights w for the maximal margin separating hyperplane:

	w = Σ_{i=1..l} a_i y_i x_i

And now, after training and finding w by this method, given an unknown point u we classify it with:

	f(u) = w·u + b = Σ_{i=1..l} a_i y_i (x_i·u) + b

Remember: most of the weights, i.e., the a_i, will be zero. Only the support vectors (on the gutters, i.e., the margin) have non-zero a_i.
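As an illustration, the decision function above can be evaluated directly once the multipliers a_i are known. This is a minimal NumPy sketch; the multipliers, points, and labels below are made up for the example, not the output of a real training run:

```python
import numpy as np

# Illustrative "trained" quantities: training points x_i, labels y_i,
# and Lagrange multipliers a_i (zero for non-support vectors).
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0])
a = np.array([0.5, 0.0, 0.5])

# Weight vector: w = sum_i a_i y_i x_i
w = (a * y) @ X

# Bias from any support vector x_s on the margin: b = y_s - w . x_s
s = int(np.argmax(a > 0))
b = y[s] - w @ X[s]

def f(u):
    """Decision function f(u) = sum_i a_i y_i (x_i . u) + b."""
    return (a * y) @ (X @ u) + b

print(np.sign(f(np.array([3.0, 3.0]))))    # 1.0: positive side
print(np.sign(f(np.array([-3.0, -3.0]))))  # -1.0: negative side
```

Note that only the inner products (x_i · u) are needed at classification time, which is what makes the kernel substitution below possible.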

Inner products, similarity, and SVMs

Why should inner product kernels be involved in pattern recognition using SVMs, or at all?
– Intuition is that inner products provide some measure of 'similarity'
– The inner product between two vectors of unit length returns the cosine of the angle between them, i.e., how 'far apart' they are:
if they are parallel, their inner product is 1 (completely similar), e.g. x = [1, 0]^T, y = [1, 0]^T gives x^T y = x·y = 1;
if they are perpendicular (completely unlike), their inner product is 0, e.g. x = [1, 0]^T, y = [0, 1]^T gives x^T y = x·y = 0 (so one should not contribute to the classification of the other).
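This similarity intuition is easy to verify numerically:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

print(x @ x)   # 1.0: parallel unit vectors, completely similar
print(x @ y)   # 0.0: perpendicular unit vectors, completely unlike

# More generally, for unit vectors the inner product is the cosine
# of the angle between them:
theta = np.pi / 3
z = np.array([np.cos(theta), np.sin(theta)])
print(round(float(x @ z), 4))   # 0.5, the cosine of 60 degrees
```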
Non-Linear SVMs
The idea is to gain linear separation by mapping the data to a higher dimensional space
– The following set can't be separated by a linear function, but can be separated by a quadratic one:

	(x − a)(x − b) = x² − (a + b)x + ab

– So if we map x → (x, x²) we gain linear separation.
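A quick numerical check of this idea; the interval endpoints (a = −1, b = 1) and the sample points below are arbitrary choices for the sketch:

```python
import numpy as np

# 1-D points labeled by the sign of (x - a)(x - b) with a = -1, b = 1:
# the +1 class lies outside the interval [a, b], the -1 class inside,
# so no single threshold on x separates the classes.
a_, b_ = -1.0, 1.0
x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
labels = np.sign((x - a_) * (x - b_))

# Map each point to (x, x^2). In this 2-D space the classes are
# separated by the horizontal line x2 = 1.
phi = np.stack([x, x ** 2], axis=1)
pred = np.sign(phi[:, 1] - 1.0)

print(np.all(pred == labels))   # True: the mapped data is linearly separable
```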
What about data separated by a radial pattern (class = +1 inside a circle, class = −1 outside)? Ans: polar coordinates!

[Figure: non-linear SVM example with radially separated classes, = +1 inside, = −1 outside]
Recall the function we want to optimize: L_D = Σ a_i − ½ Σ a_i a_j y_i y_j (x_i·x_j), where (x_i·x_j) is the dot product of the two feature vectors. If we now transform the data with a mapping Φ, instead of computing the dot product (x_i·x_j) we will have to compute (Φ(x_i)·Φ(x_j)). But how can we do this? This is expensive and time consuming (suppose Φ is a quartic polynomial... or worse, we don't know Φ explicitly). The answer is to choose a kernel function K with K(x_i, x_j) = Φ(x_i)·Φ(x_j), so that Φ never has to be computed.
Non-linear SVMs
So, the function we end up optimizing is:

	L_D = Σ a_i − ½ Σ a_i a_j y_i y_j K(x_i, x_j)

Kernel example: The polynomial kernel

K(x_i, x_j) = (x_i·x_j + 1)^p, where p is a tunable parameter.
Note: evaluating K only requires one addition and one exponentiation more than the original dot product.
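This is the kernel trick in miniature: for p = 2 and 2-D inputs, the cheap evaluation of K agrees with the expensive route through an explicit feature map. The feature map phi below is the standard degree-2 expansion, written out here only for illustration:

```python
import numpy as np

def poly_kernel(x, y, p=2):
    # K(x, y) = (x . y + 1)^p: one dot product, one addition, one power
    return (x @ y + 1.0) ** p

def phi(x):
    # Explicit degree-2 feature map for 2-D input (6 dimensions)
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     1.0])

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])
print(poly_kernel(x, y))       # 36.0: (3 + 2 + 1)^2
print(float(phi(x) @ phi(y)))  # same value, computed the expensive way
```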

Examples for Non-Linear SVMs

	K(x, y) = (x·y + 1)^p

	K(x, y) = exp(−||x − y||² / 2σ²)

	K(x, y) = tanh(β₀ x·y + β₁)

1st is polynomial (includes x·y as a special case)
2nd is radial basis function (Gaussians)
3rd is a nonlinear transform we've already seen: tanh(β₀ xᵀx_i + β₁) is the sigmoid transform (used in neural networks)
Inner Product Kernels

Type of Support Vector Machine | Inner Product Kernel K(x, x_i), i = 1, 2, ..., N | Comments
Polynomial learning machine | (xᵀx_i + 1)^p | Power p is specified a priori by the user
Radial-basis function (RBF) | exp(−(1/2σ²)||x − x_i||²) | The width σ² is specified a priori
Two-layer neural net | tanh(β₀xᵀx_i + β₁) | Actually works only for some values of β₀ and β₁
Kernels generalize the notion of
‘inner product similarity’

Note that one can define kernels over more than just
vectors: strings, trees, structures, … in fact, just about
anything

A very powerful idea: used in comparing DNA, protein structure, sentence structures, etc.
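The three kernels from the table above are one-liners in NumPy; the parameter values below are arbitrary illustrations:

```python
import numpy as np

def k_poly(x, y, p=3):
    # Polynomial kernel: (x . y + 1)^p
    return (x @ y + 1.0) ** p

def k_rbf(x, y, sigma=1.0):
    # Radial-basis-function kernel: exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def k_sigmoid(x, y, beta0=0.5, beta1=-1.0):
    # Two-layer-net kernel; a valid kernel only for some beta0, beta1
    return np.tanh(beta0 * (x @ y) + beta1)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

print(k_poly(x, y))                   # (0 + 1)^3 = 1.0
print(k_rbf(x, x))                    # 1.0: a point is maximally similar to itself
print(round(float(k_rbf(x, y)), 4))   # exp(-1), about 0.3679
```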
CHAPTER 6

System Design
During the detailed design phase, the view of the application developed during the high-level design is broken down into modules and programs. Logic design is done for every program and then documented as program specifications. For every program, a unit test plan is created.

Data Flow Diagram:

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components: the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system.
3. A DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of abstraction and may be partitioned into levels that represent increasing information flow and functional detail.
DFD DIAGRAM:

Sequence Diagram:

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and
timing diagrams.
[Figure: Sequence diagram]
Use case Diagram:

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented as
use cases), and any dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the actors in the system
can be depicted.
Activity Diagram:

Activity diagrams are graphical representations of workflows of stepwise activities and actions with
support for choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams
can be used to describe the business and operational step-by-step workflows of components in a
system. An activity diagram shows the overall flow of control.
Chapter 7

Advantages and Applications

Advantages:

1. High detection accuracy
2. Classifies the type of disease
3. Remedy selection becomes easy
4. Makes it easy for farmers to buy the right pesticide
5. Easy to implement

Applications:

1. Agricultural research organizations
2. Gardens
3. Greenhouses
Chapter 8

RESULTS AND DISCUSSION


Chapter 9

Testing

Testing is the process of evaluating a system or its component(s) with the intent of finding whether it satisfies the specified requirements or not. Testing means executing a system in order to identify any gaps, errors, or missing requirements relative to the actual requirements.

Testing Principle

Before applying methods to design effective test cases, a software engineer must understand the basic principle
that guides software testing. All the tests should be traceable to customer requirements.

Testing Methods

There are different methods that can be used for software testing. They are,

1. Black-Box Testing
The technique of testing without having any knowledge of the interior workings of the application is
called black-box testing. The tester is oblivious to the system architecture and does not have access to
the source code. Typically, while performing a black-box test, a tester will interact with the system's user
interface by providing inputs and examining outputs without knowing how and where the inputs are
worked upon.

2. White-Box Testing
White-box testing is the detailed investigation of internal logic and structure of the code. White-box
testing is also called glass testing or open-box testing. In order to perform white-box testing on an
application, a tester needs to know the internal workings of the code. The tester needs to have a look
inside the source code and find out which unit/chunk of the code is behaving inappropriately.
Levels of Testing

There are different levels during the process of testing. Levels of testing include different methodologies that
can be used while conducting software testing. The main levels of software testing are:

 Functional Testing:

This is a type of black-box testing that is based on the specifications of the software that is to be tested.
The application is tested by providing input and then the results are examined that need to conform to
the functionality it was intended for. Functional testing of software is conducted on a complete,
integrated system to evaluate the system's compliance with its specified requirements. There are five
steps that are involved while testing an application for functionality.

 The determination of the functionality that the intended application is meant to perform.

 The creation of test data based on the specifications of the application.

 The determination of the output based on the test data and the specifications of the application.

 The writing of test scenarios and the execution of test cases.

 The comparison of actual and expected results based on the executed test cases.
 Non-functional Testing

This level is based upon testing an application for its non-functional attributes. Non-functional testing involves testing software against requirements that are non-functional in nature but important, such as performance, security, and user interface. Testing can be done at different levels of the SDLC. A few of them are:

Unit Testing

Unit testing is a software development process in which the smallest testable parts of an application, called
units, are individually and independently scrutinized for proper operation. Unit testing is often automated but it
can also be done manually. The goal of unit testing is to isolate each part of the program and show that
individual parts are correct in terms of requirements and functionality. Test cases and results are shown in the
Tables.
Leaf-Disease-Detection-using Python (Open CV)

Unit Testing Benefits

 Unit testing increases confidence in changing/maintaining code.

 Code is more reusable.
 Development is faster.
 The cost of fixing a defect detected during unit testing is lower than that of defects detected at higher levels.
 Debugging is easy.
 Code is more reliable.

Unit testing:

Sl # Test Case: UTC-1
Name of Test: Uploading image
Items being tested: Tested for uploading different images
Sample Input: Upload sample image
Expected output: Image should upload properly
Actual output: Upload successful
Remarks: Pass

Sl # Test Case: UTC-2
Name of Test: Detecting disease
Items being tested: Tested for different diseased images
Sample Input: Different images of paddy plant leaves and diseases
Expected output: Disease name should be displayed
Actual output: Disease name displayed
Remarks: Pass
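Test cases UTC-1 and UTC-2 could be automated along these lines with Python's unittest module. upload_image and detect_disease are hypothetical stand-ins sketched here for illustration, not the project's actual functions:

```python
import unittest

def upload_image(path):
    # Hypothetical stand-in: accept files with a supported image extension.
    return path.lower().endswith((".jpg", ".jpeg", ".png"))

def detect_disease(image_id):
    # Hypothetical stand-in: look up a predicted label for a known sample.
    predictions = {"leaf_blast.jpg": "Leaf Blast", "healthy.jpg": "Healthy"}
    return predictions.get(image_id, "Unknown")

class LeafDiseaseTests(unittest.TestCase):
    def test_utc1_upload(self):
        # UTC-1: image should upload properly
        self.assertTrue(upload_image("sample_leaf.png"))

    def test_utc2_detection(self):
        # UTC-2: disease name should be displayed
        self.assertEqual(detect_disease("leaf_blast.jpg"), "Leaf Blast")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(LeafDiseaseTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())   # True
```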


Integration Testing:
Integration testing is a level of software testing where individual units are combined and
tested as a group. The purpose of this level of testing is to expose faults in the interaction
between integrated units. Test drivers and test stubs are used to assist in Integration
Testing. Integration testing is defined as the testing of combined parts of an application to
determine if they function correctly. It occurs after unit testing and before validation
testing. Integration testing can be done in two ways: Bottom-up integration testing and
Top-down integration testing.

1. Bottom-up Integration

This testing begins with unit testing, followed by tests of progressively higher-
level combinations of units called modules or builds.

2. Top-down Integration
In this testing, the highest-level modules are tested first and progressively, lower-
level modules are tested thereafter.

In a comprehensive software development environment, bottom-up testing is


usually done first, followed by top-down testing. The process concludes with multiple
tests of the complete application, preferably in scenarios designed to mimic actual
situations. Table 8.3.2 shows the test cases for integration testing and their results.


Sl # Test Case: ITC-1
Name of Test: Working of Choose File option
Item being tested: User convenience in accessing stored images
Sample Input: Click and select image
Expected output: Should open the selected image
Actual output: Selected image loaded
Remarks: Pass

Sl # Test Case: ITC-2
Name of Test: Working of disease detection and displaying the disease
Item being tested: Selecting different images and verifying names of diseases
Sample Input: Click and select image
Expected output: Should show the exact disease name
Actual output: Disease name displayed
Remarks: Pass


System testing:
System testing of software or hardware is testing conducted on a complete, integrated
system to evaluate the system's compliance with its specified requirements. System
testing falls within the scope of black-box testing, and as such, should require no
knowledge of the inner design of the code or logic. System testing is important because of
the following reasons:

 System testing is the first level of testing in the Software Development Life Cycle where the application is tested as a whole.

 The application is tested thoroughly to verify that it meets the functional and
technical specifications.

 The application is tested in an environment that is very close to the production


environment where the application will be deployed.

 System testing enables us to test, verify, and validate both the business
requirements as well as the application architecture.
System testing is shown in the table below.

Sl # Test Case: STC-1
Name of Test: System testing on various versions of the OS
Item being tested: OS compatibility
Sample Input: Execute the program on Windows XP / Windows 7 / Windows 8
Expected output: Performance is better on Windows 7
Actual output: Same as expected output; performance is better on Windows 7
Remarks: Pass


CHAPTER 10

CONCLUSION AND FUTURE SCOPE


This project proposed a leaf image pattern classification to identify disease in a leaf using a combination of texture and color feature extraction. Initially the farmer sends a digital image of the diseased leaf of a plant; these images are read in Python and processed automatically based on SVM, and the results are shown. The aim of this project is to find appropriate features that can identify the leaf diseases most commonly caused in plants. Firstly, normal and diseased images are collected and pre-processed. Then, features of shape, color and texture are extracted from these images. After that, these images are classified by a support vector machine classifier. A combination of several features is used to evaluate which features are distinctive for identification of leaf disease. When a single feature is used, the shape feature has the lowest accuracy and the texture feature the highest. A combination of texture and color feature extraction gives the highest classification accuracy, and with a polynomial kernel it results in good classification accuracy. Based on the classified type of disease, a text message is sent to the user.
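A minimal sketch of the feature-extraction step described above, using NumPy only; the statistics chosen here are illustrative simplifications, and the project's actual OpenCV-based color and texture extraction is richer:

```python
import numpy as np

def color_features(img):
    # Per-channel mean and standard deviation (img: H x W x 3 array)
    return np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1))])

def texture_features(img):
    # Crude texture measure: mean absolute gradient of the grayscale image
    gray = img.mean(axis=2)
    gx = np.abs(np.diff(gray, axis=1)).mean()
    gy = np.abs(np.diff(gray, axis=0)).mean()
    return np.array([gx, gy])

# Random stand-in for a leaf image loaded from disk
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)

features = np.concatenate([color_features(img), texture_features(img)])
print(features.shape)   # (8,): the feature vector fed to the SVM classifier
```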

Future Scope:

In this project, we demonstrated only a few commonly caused types of disease; the system can be extended to more diseases in future. Here, diseases are only detected, but in future a robot could be sent to spray pesticides on the plants automatically, without human interaction.


REFERENCES:

1. Mrunalini R. et al., "An application of K-means clustering and artificial intelligence in pattern recognition for crop diseases", 2011.

2. S. Raj Kumar and S. Sowrirajan, "Automatic Leaf Disease Detection and Classification using Hybrid Features and Supervised Classifier", International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, vol. 5, issue 6, 2016.

3. A. J. Tatem, D. J. Rogers, and S. I. Hay, "Global transport networks and infectious disease spread", Advances in Parasitology, vol. 62, pp. 293–343, 2006.

4. J. R. Rohr, T. R. Raffel, J. M. Romansic, H. McCallum, and P. J. Hudson, "Evaluating the links between climate, disease spread, and amphibian declines", Proceedings of the National Academy of Sciences of the United States of America, vol. 105, no. 45, pp. 17436–17441, 2008.

5. T. van der Zwet, "Present worldwide distribution of fire blight", in Proceedings of the 9th International Workshop on Fire Blight, vol. 590, Napier, New Zealand, October 2001.

6. H. Cartwright, Ed., Artificial Neural Networks, Humana Press, 2015.

7. I. Steinwart and A. Christmann, Support Vector Machines, Springer Science & Business Media, New York, NY, USA, 2008.

8. S. Sankaran, A. Mishra, R. Ehsani, and C. Davis, "A review of advanced techniques for detecting plant diseases", Computers and Electronics in Agriculture, vol. 72, no. 1, pp. 1–13, 2010.

9. P. R. Reddy, S. N. Divya, and R. Vijayalakshmi, "Plant disease detection technique tool - a theoretical approach", International Journal of Innovative Technology and Research, pp. 91–93, 2015.

10. A.-K. Mahlein, T. Rumpf, P. Welke et al., "Development of spectral indices for detecting and identifying plant diseases", Remote Sensing of Environment, vol. 128, pp. 21–30, 2013.
