Disease Detection in Plants - Report
PROJECT REPORT
ON
BACHELOR OF ENGINEERING
IN
BY
CERTIFICATE
It is hereby certified that the project work entitled "Leaf-Disease-Detection using Python
(OpenCV)" is a bonafide work carried out by Simran M Mohan (1NH14CS727) and Harshinee
S (1NH14CS754) in partial fulfilment of the requirements for the award of Bachelor of
Engineering in COMPUTER SCIENCE AND ENGINEERING of the New Horizon College of
Engineering during the year 2019-2020. It is certified that all corrections/suggestions
indicated for Internal Assessment have been incorporated in the report deposited in the
departmental library. The project report has been approved as it satisfies the academic
requirements in respect of project work prescribed for the said degree.
External Viva
1. ………………………………………….. ………………………………….
2. …………………………………………… …………………………………..
ABSTRACT
The proposed system helps in the identification of plant diseases and provides remedies
that can be used as a defence mechanism against the disease. The database obtained from the
Internet is properly segregated, and the different plant species are identified and renamed
to form a proper database. A test database consisting of various plant diseases is then
obtained and used for checking the accuracy and confidence level of the project. Using the
training data we train our classifier, and the output is then predicted with optimum
accuracy. We use a Convolutional Neural Network (CNN), which comprises different layers
used for prediction. A prototype drone model is also designed for live coverage of large
agricultural fields: a high-resolution camera attached to the drone captures images of the
plants, which act as input for the software, based on which the software tells us whether
the plant is healthy or not. With our code and training model we have achieved an accuracy
level of 78%. Our software gives the name of the plant species with its confidence level,
and also the remedy that can be taken as a cure.
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would be
incomplete without mention of the people who made it possible, whose constant guidance
and encouragement crowned our efforts with success. I have great pleasure in expressing my
deep sense of gratitude to Dr. Mohan Manghnani, Chairman of New Horizon Educational
Institutions, for providing the necessary infrastructure and creating a good environment. I
take this opportunity to express my profound gratitude to Dr. Manjunatha, Principal, NHCE,
for his constant support and encouragement. I am grateful to Dr. Prashanth C.S.R., Dean
Academics, for his unfailing encouragement and suggestions given to me in the course of my
project work. I would also like to thank Dr. B. Rajalakshmi, Professor and Head, Department
of Computer Science and Engineering, for her constant support. I express my gratitude to Dr.
Pamela Vinita Eric, Senior Assistant Professor, my project guide, for constantly monitoring
the development of the project and setting up precise deadlines. Her valuable suggestions
were the motivating factors in completing the work. Finally, a note of thanks to the teaching
and non-teaching staff of the Department of Computer Science and Engineering for their
cooperation extended to me, and to my friends, who helped me directly or indirectly in the
course of the project work.
Harshinee S (1NH14CS754)
CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
LIST OF FIGURES
1. INTRODUCTION
DOMAIN INTRODUCTION
PROBLEM DEFINITION
OBJECTIVES
SCOPE OF THE PROJECT
2. LITERATURE SURVEY
TECHNOLOGY
EXISTING SYSTEM
PROPOSED SYSTEM
METHODOLOGY
MODULES
PRODUCT PERSPECTIVE
DESIGN DESCRIPTION
DESIGN APPROACH
3. REQUIREMENT ANALYSIS
FUNCTIONAL REQUIREMENTS
NON-FUNCTIONAL REQUIREMENTS
DOMAIN AND UI REQUIREMENTS
HARDWARE REQUIREMENTS
SOFTWARE REQUIREMENTS
DATA REQUIREMENTS
4. DESIGN
DESIGN GOALS
OVERALL SYSTEM ARCHITECTURE
DATA FLOW DIAGRAM
STATE MACHINE UML DIAGRAM
SEQUENCE DIAGRAM
INTERACTION OVERVIEW DIAGRAM
USE CASE DIAGRAM
ALGORITHM/PSEUDOCODE
5. IMPLEMENTATION
6. TESTING
TEST STRATEGY
PERFORMANCE CRITERIA
RISK IDENTIFICATION AND CONTINGENCY PLANNING
TEST SCHEDULE
ACCEPTANCE CRITERIA
7. EXPECTED OUTPUT
8. CONCLUSION
9. FUTURE ENHANCEMENTS
REFERENCES
LIST OF FIGURES
7.4 Leaf-Disease-Detection
CHAPTER 1
INTRODUCTION
Agriculture is the primary occupation in India, which ranks second worldwide in agricultural
output. Farmers in India cultivate a great diversity of crops. Various factors such as
climatic conditions, soil conditions and various diseases affect crop production. The
existing method for plant disease detection is simply naked-eye observation, which requires
more manual labour, properly equipped laboratories, expensive devices, etc. Improper disease
detection may lead to inexperienced pesticide usage, which can cause the development of
long-term resistance in the pathogens and reduce the crop's ability to fight back. Plant
disease detection can be done by observing the spots on the leaves of the affected plant.
The method we are adopting to detect plant diseases is image processing using a
Convolutional Neural Network (CNN). The first implementation of plant disease detection
using image processing was done by Shen Weizheg, Wu Yachun, Chen Zhanliang and Wi
Hangda in their paper [1].
1.1 INTRODUCTION
The human visual system has no problem interpreting the subtle variations in translucency
and shading in a photograph such as Figure 1.1, and correctly segmenting the object from its
background.
1.2 Background
In recent decades, digital image processing, image analysis and machine vision have
developed sharply, and they have become a very important part of artificial intelligence
and of the interface between human and machine, in both grounded theory and applied
technology. These technologies have been applied widely in industry and medicine, but
rarely in realms related to agriculture or natural habitats.
Despite the importance of identifying plant diseases using digital image processing, and
although this has been studied for at least 30 years, the advances achieved seem somewhat
timid. Some facts lead to this conclusion:
Methods are too specific. The ideal method would be able to identify any kind of plant.
Evidently, this is unfeasible at the current technological level. However, many of the
methods being proposed are not only limited to a single species of plant, but also require
those plants to be at a certain growth stage for the algorithm to be effective. That is
acceptable if the plant is in that specific stage, but it is very limiting otherwise. Many
researchers do not state this kind of information explicitly, but if their training and test
sets include only images of a certain growth stage, which is often the case, the validity of
the results cannot be extended to other stages.
Operation conditions are too strict. Many images used to develop new methods are collected
under very strict conditions of lighting, angle of capture, distance between object and
capture device, among others. This is common practice and perfectly acceptable in the early
stages of research. However, in most real-world applications those conditions are almost
impossible to enforce, especially if the analysis is expected to be carried out in a
non-destructive way. Thus, it is a problem that many studies never get to the point of
testing and upgrading the method to deal with more realistic conditions, because this limits
their scope greatly.
Lack of technical knowledge about more sophisticated tools. The simplest solution to a
problem is usually the preferable one. In the case of image processing, some problems can be
solved using only morphological mathematical operations, which are easy to implement and
understand. However, more complex problems often demand more sophisticated approaches.
Techniques like neural networks, genetic algorithms and support vector machines can be very
powerful if properly applied. Unfortunately, that is often not the case. In many cases, the
use of those techniques seems driven more by their demand in the scientific community than
by their technical appropriateness to the problem at hand. As a result, problems like
overfitting, overtraining, undersized sample sets, sample sets with low representativeness,
and bias seem to be a widespread plague. These problems, although easily identifiable by an
individual knowledgeable on the topic, seem to go widely overlooked by the authors, probably
due to a lack of knowledge about the tools they are employing. The result is a whole group
of technically flawed solutions.
One of the most common methods of leaf feature extraction is based on the morphological
features of the leaf. Some simple geometrical features are aspect ratio, rectangularity,
convexity, sphericity, form factor, etc.
One can easily transfer the leaf image to a computer, and the computer can extract features
automatically using image processing techniques. Some systems employ descriptions used by
botanists, but it is not easy to extract and transfer those features to a computer
automatically.
The aim of the project is to develop a leaf recognition program based on specific
characteristics extracted from photographs. Hence this presents an approach where the plant
is identified based on its leaf features such as area, histogram equalization, edge
detection and classification. The main purpose of this program is to use OpenCV resources.
Indeed, there are several advantages in combining OpenCV with the leaf recognition program,
and the result proves this method to be a simple and efficient attempt. Future sections will
discuss image preprocessing and acquisition, which includes image enhancement, histogram
equalization and edge detection. Later sections introduce texture analysis and
high-frequency feature extraction from leaf images in order to classify them, i.e.
parametric calculations, followed by results.
1.3 Motivation
Here is a brief review of the papers we have referred to for this project. Since digital
image processing is used in this project to detect diseases in plants, it eliminates the
traditional methods used in earlier days and also removes human error. This method needs a
digital computer, MATLAB software and a digital camera to detect diseases in plants, so it
is a suitable method to adopt for this project. The paper by Pallavi S. Marathe describes
different steps, such as image acquisition and pre-processing, which includes clipping,
smoothing and contrast enhancement. She has also used segmentation techniques to partition
different parts of an image. Disease detection is done by extracting features and
classifying them using the SVM algorithm.
1.4 Objectives
Using new technologies and methods we can make a faster and more efficient application for
the user. The system presented in this project was able to perform accurately; however,
there are still a number of issues which need to be addressed. First of all, we consider
only four diseases in this project, so the scope of disease detection is limited. In order
to increase the scope of disease detection, large datasets of different diseases should be
used.
CHAPTER 2
LITERATURE SURVEY
Earlier papers mainly describe the detection of pests like aphids, whiteflies, thrips, etc.
using various approaches, suggesting the various implementation ways illustrated and
discussed below. One work proposed a cognitive vision system that combines image processing,
learning and knowledge-based techniques. It detects only the mature stage of the whitefly
and counts the number of flies on a single leaflet. The authors used 180 images as the test
dataset; among these they tested 162 images, each image having 0 to 5 whitefly pests. They
calculated the false negative rate (FNR) and false positive rate (FPR) for test images with
no whiteflies (class 1), with at least one whitefly (class 2), and for the whole test set.
Another work extends the implementation of image processing algorithms and techniques to
detect pests in a controlled environment such as a greenhouse. Three kinds of typical
features, including size, morphological features (shape of boundary) and color components,
were considered and investigated to identify three kinds of adult insects: whiteflies,
aphids and thrips. A further work promotes early pest detection in greenhouses based on
video analysis. Its goal was to define a decision support system which handles video camera
data; the authors implemented algorithms for the detection of only two bio-aggressors,
whiteflies and aphids. The system was able to detect low infestation stages by detecting the
eggs of whiteflies, thus analyzing whitefly behavior. Another proposed pest detection system
includes four steps: color conversion, segmentation, noise reduction and counting
whiteflies. A distinct algorithm, named relative difference in pixel intensities (RDI), was
proposed for detecting the whitefly pest affecting various leaves. The algorithm works not
only for greenhouse-based crops but for agricultural crops as well, and was tested over 100
images of whitefly pests with an accuracy of 96%. A further work proposed a new method of
pest detection and positioning based on binocular stereo to get the location of the pest,
which was used to guide a robot to spray pesticides automatically. Another introduced
contextual parameter tuning for adaptive image segmentation, which allows algorithm
parameters to be tuned efficiently with respect to variations in leaf color and contrast.
Finally, one work presents an automatic method for classifying the main agents that cause
damage to soybean leaflets, i.e. beetles and caterpillars, using an SVM classifier.
2.2 Early Detection of Pests on Leaves Using Support Vector Machine:
This project deals with a new type of early pest detection system. Images of the leaves
affected by pests are acquired using a digital camera. The leaf images with pests are
processed to obtain a gray-scale image, and then image segmentation and image classification
techniques are used to detect pests on the leaves. The image is passed to the analysis
algorithm to report the quality. The techniques involved in this system are both image
processing and soft computing: the image processing technique is used to detect the pests,
and the soft computing technique is used to perform this detection over a wide population.
The images are acquired using a digital camera of approximately 12-megapixel resolution in
24-bit color. The images are then transferred to a PC and represented in OpenCV software.
The RGB image is then segmented using a blob-like algorithm for segmentation of pests on
leaves. The segmented leaf part is then analyzed to estimate the pest density in the field.
A Support Vector Machine classifier is used to classify the pest types. It is also
implemented on an FPGA kit by converting the OpenCV code with an HDL coder. On the FPGA, the
input image is downloaded to memory; the system reads the image from memory, processes it
and displays the output image on a monitor.
A software routine was written in OpenCV, in which training and testing were performed via
several neural network classifiers. The texture feature classification methods are as
follows.
The k-nearest neighbor classifier calculates the minimum distance between the given point
and other points to determine which class the given point belongs to. The goal is to compute
the distance from the query sample to every training sample and select the neighbor having
the minimum distance.
A radial basis function (RBF) is a real-valued function whose value depends only on the
distance from the origin; the measuring norm normally used is the Euclidean distance. RBF
networks are networks where the activation of the hidden units is based on the distance
between the input vector and a prototype vector.
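The minimum-distance rule described above can be sketched as a tiny k-nearest-neighbor classifier. The feature vectors and class labels below are made-up illustrative values, not data from the project.

```python
import numpy as np

# Tiny illustrative feature set: two texture classes in 2-D feature space.
train = np.array([[0.1, 0.2], [0.2, 0.1],   # class 0 (e.g. healthy texture)
                  [0.9, 0.8], [0.8, 0.9]])  # class 1 (e.g. pest texture)
labels = np.array([0, 0, 1, 1])

def knn_predict(query, train, labels, k=1):
    # Euclidean distance from the query sample to every training sample.
    dists = np.linalg.norm(train - query, axis=1)
    # Pick the k nearest neighbors and take a majority vote.
    nearest = labels[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()

print(knn_predict(np.array([0.85, 0.85]), train, labels))  # -> 1
```

With k = 1 this is exactly the minimum-distance classifier the text describes; larger k trades noise robustness against locality.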
2.2.3. Artificial neural networks:
ANNs are popular machine learning algorithms that have been in wide use in recent years. The
Multilayer Perceptron (MLP) is the basic form of ANN, which updates its weights through
backpropagation during training [16]. There are other variations of neural networks which
have recently become popular in texture classification. Probabilistic Neural Network (PNN):
derived from the Radial Basis Function (RBF) network, it is a parallel distributed processor
with a natural tendency for storing experiential knowledge. PNN is an implementation of a
statistical algorithm called kernel discriminant analysis, in which the operations are
organized into a multilayered feed-forward network having four layers: input layer, pattern
layer, summation layer and output layer.
A typical BP (backpropagation) network consists of three parts: input layer, hidden layer
and output layer, connected in turn through the weight values between nodes. The main
characteristic of a BP network is that the network weights reach their expected values by
minimizing the sum of squared errors between the network output and the sample output, with
the network's weight values continuously adjusted. It is popular and extensively used for
training feed-forward networks. However, it has no inherent novelty detection, so it must be
trained on known outcomes.
The first step in the proposed approach is to capture the sample with the digital camera and
extract its features, which are then stored in the database.
Preprocessing of images is used to remove low-frequency background noise and normalize the
intensity of the individual particles of the images. It enhances the visual appearance of
the images and improves the manipulation of the datasets. It is the technique of enhancing
image data prior to computational processing. The caution is that enhancement techniques can
emphasize image artifacts, or even lead to a loss of information if not used correctly. The
steps involved in preprocessing are to take an input image and enhance it. The RGB image is
then converted to a grayscale image to get a clear identification of pests on leaves. Noise
removal can be performed using filtering techniques. Mean filtering: a 3x3 sub-region is
scanned over the entire image; at each position the center pixel is replaced by the average
value. Median filtering: a 3x3 sub-region is scanned over the entire image; at each position
the center pixel is replaced by the median value.
The PSNR value is calculated for both the mean and the median filter, and based on the PSNR
value one of the filtered images is taken for further processing. For mean filtering the
PSNR value is 23.78, and for median filtering it is 12.89. The higher the PSNR, the better
the quality of the compressed or reconstructed image. Therefore the mean-filtered image is
taken for further processing.
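PSNR as used above can be computed as follows; this is the generic definition for 8-bit images, not the exact code of the cited work, and the test images are made up.

```python
import numpy as np

def psnr(original, processed, peak=255.0):
    """Peak signal-to-noise ratio in decibels between two images."""
    mse = np.mean((original.astype(np.float64) - processed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * np.log10(peak ** 2 / mse)

# Identical images give infinite PSNR; a small perturbation lowers it.
a = np.full((4, 4), 100, dtype=np.uint8)
b = a.copy()
b[0, 0] = 110  # one pixel off by 10

print(psnr(a, a))            # inf
print(round(psnr(a, b), 2))  # ~40.17 dB
```

Comparing the PSNR of each filtered image against the clean reference then picks the filter to keep, as the paragraph above describes.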
Image features usually include color, shape and texture features. Feature extraction is
performed following the majority-based voting method, in which 3 steps are involved: 1)
Histogram of Oriented Gradients (HOG), 2) Gaussian Mixture Model (GMM), 3) Gabor features.
HOG is a feature descriptor used for object detection. The Gaussian mixture model is used
for texture analysis. The Gabor feature calculates the relationship between groups of two
pixels in the original image. In this proposed work, the image is subdivided into small
blocks, and in each block the three steps are applied: HOG is used to capture the
distribution of gradient orientations in an image, GMM is used to detect the shape of pests
present in an image, and Gabor features can be used to find the orientation of pests.
Finally, the feature values are fed as input to the classifiers.
Three types of classifiers are used in order to determine which classifier gives the better
result. The backpropagation and feed-forward classifiers fail to detect some pests in an
image, but SVM gives a better result. SVM is a non-linear classifier and a newer trend in
machine learning, popularly used in many pattern recognition problems including texture
classification. SVM is designed to work with only two classes, by maximizing the margin from
the hyperplane. The samples closest to the margin that are selected to determine the
hyperplane are known as support vectors [12]. Multiclass classification is also applicable,
basically built up from several two-class SVMs, using either the one-versus-all or the
one-versus-one scheme. Another feature is the kernel function, which projects non-linearly
separable data from a low-dimensional space to a space of higher dimension, so that it may
become separable in the higher-dimensional space. The classifier is used to detect pests on
leaves, gives information about the type of pest, reports the number of pests present, and
then gives a remedy for controlling the pest. Finally, the feature values are fed as input
to the Support Vector Machine classifier, allowing us to accurately distinguish pests from
leaves. This is an important step towards the identification of pests and taking the
corresponding remedies.
This paper describes Support Vector Machine (SVM) and Artificial Neural Network (ANN) based
recognition and classification of visual symptoms caused by fungal disease. Color images of
fungal disease symptoms on cereals such as wheat, maize and jowar are used in this work.
Different types of symptoms caused by fungal diseases, namely leaf blight, leaf spot,
powdery mildew, leaf rust and smut, are considered for the study. The developed algorithms
are used to preprocess, segment and extract features from disease-affected regions. The
affected regions are segmented using the k-means segmentation technique. Color texture
features are extracted from the affected regions and then used as inputs to the SVM and ANN
classifiers. The texture analysis is done using the Color Co-occurrence Matrix. Tests are
performed to classify image samples. Classification accuracies between 68.5% and 87% are
obtained using the ANN classifier; the average classification accuracies increase to 77.5%
and 91.16% using the SVM classifier.
This work implements a machine vision system for the classification of the visual symptoms
of fungal disease. In the present work, tasks like image acquisition, segmentation, feature
extraction and classification are carried out. The classification tree is shown in Figure 2.
System Configuration:
HARDWARE:
SOFTWARE:
We can reduce the attack of pests by using proper pesticides and remedies. We can reduce the
size of the images with proper size-reduction techniques, while seeing to it that the
quality is not compromised to a great extent. We can expand the projects of the earlier
mentioned authors so that the remedy for the disease is also shown by the system. The main
objective is to identify plant diseases using image processing. After identification of the
disease, the system also suggests the name of the pesticide to be used, and identifies the
insects and pests responsible for the epidemic. Apart from these parallel objectives, the
drone is very time-saving. The budget of the model is quite high for small-scale farming
purposes, but it will be value for money in large-scale farming. It completes each of the
processes sequentially, hence achieving each of the outputs.
Thus the main objectives are:
1) To design a system that can detect crop diseases and pests accurately.
2) To create a database of insecticides for the respective pests and diseases.
3) To provide a remedy for the disease that is detected.
Leaf miners are an insect family at the larval stage. They feed between the upper and lower
parts of the leaf.
When insects are present on a plant in large numbers, it is severely damaged; on a single
leaf the number of maggots can be six. Such infestation can severely damage the leaf,
restrict plant growth and lead to reduced yields.
Hence we can develop a robot that uses image processing to detect the disease and classify
it. This will avoid human interference and hence lead to precise, unprejudiced decisions.
Generally, our observation of the disease is simply used for the decision about the disease.
A symptom of plant disease is a visible effect of the disease on the plant. Symptoms can be
a change in color, a change in shape, or functional changes of the plant in response to
pathogens, insects, etc. Leaf wilting is a characteristic symptom of verticillium wilt,
caused by the fungal plant pathogens Verticillium dahliae and Verticillium albo-atrum.
Common bacterial disease symptoms are brown, necrotic lesions surrounded by a bright yellow
halo at the edge of the leaf or at the inner part of the leaf on bean plants. You are not
actually seeing the disease pathogen, but rather a symptom being caused by the pathogen.
Building a machine learning model consists of two phases, namely training and testing, where
the model is first trained and then an input called the test data is given to test the
model. The model consists of several image processing steps such as image acquisition, image
pre-processing, segmentation, feature extraction and an SVM classifier to classify the
diseases.
Image acquisition: the diseased leaf image is acquired using the camera; the image is
acquired from a certain uniform distance with sufficient lighting for learning and
classification. The sample images of the diseased leaves are collected and used in training
the system. To train and test the system, diseased leaf images and fewer healthy images are
taken. The images are stored in some standard format. The image background should provide
proper contrast to the leaf color. The leaf disease dataset is prepared with both black and
white backgrounds; based on a comparative study, the black-background images provide better
results and hence are used for leaf disease identification.
Image pre-processing: the image acquired using the digital camera is pre-processed using
noise removal with an averaging filter, color transformation and histogram equalization. The
color transformation step converts the RGB image to the HSI (Hue, Saturation, Intensity)
representation, as this color space is based on human perception. Hue refers to the dominant
color attribute, in the same way as perceived by a human observer. Saturation refers to the
amount of brightness or white light added to the hue. Intensity refers to the amplitude of
the light. After the RGB-to-HSI conversion, the hue part of the image is considered for the
analysis, as this provides the required information; the S and I components are ignored as
they do not give any significant information.
Masking green pixels: since most of the green pixels represent the healthy leaf and do not
add any value to the disease identification technique, the green pixels of the leaf are
removed by a masking technique; this method significantly reduces processing time. The
masking of green pixels is achieved by computing the intensity value of the green pixels: if
the intensity is less than a predefined threshold value, the RGB components of that
particular pixel are set to zero. Green pixel masking is an optional step in our disease
identification technique, as the diseased part of the leaf can be completely isolated in the
segmentation process.
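A minimal sketch of the masking idea: here pixels whose green channel dominates and exceeds an assumed threshold are zeroed out as "healthy", standing in for the report's intensity-threshold rule. Both the threshold value and the exact comparison are assumptions that would be tuned per dataset.

```python
import numpy as np

# Toy RGB "leaf": one healthy green pixel and one brown (diseased) pixel.
rgb = np.array([[[40, 180, 40], [150, 75, 20]]], dtype=np.uint8)

# Hypothetical threshold: zero out pixels whose green channel both
# exceeds it and dominates the other channels (healthy tissue),
# so later stages only see the diseased regions.
THRESHOLD = 100
green_dominant = ((rgb[:, :, 1] > THRESHOLD)
                  & (rgb[:, :, 1] > rgb[:, :, 0])
                  & (rgb[:, :, 1] > rgb[:, :, 2]))
masked = rgb.copy()
masked[green_dominant] = 0

print(masked[0, 0].tolist(), masked[0, 1].tolist())  # [0, 0, 0] [150, 75, 20]
```

Zeroing healthy pixels shrinks the data the later clustering and texture steps must touch, which is the processing-time saving the text mentions.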
Segmentation: there are different image segmentation techniques, such as threshold-based,
edge-based, cluster-based and neural-network-based methods. One of the most efficient
approaches is clustering, which again has multiple subtypes: k-means clustering, fuzzy
C-means clustering, the subtractive clustering method, etc. One of the most used clustering
algorithms is k-means. K-means clustering is simple and computationally faster than other
clustering techniques, and it also works for a large number of variables. But it produces
different cluster results for different numbers of clusters and different initial centroid
values, so it is necessary to initialize the proper number of clusters k and proper initial
centroids. K-means is a general-purpose method used in many domains for different problems.
In this project, k-means clustering is used to obtain k clusters matching specified
characteristics, in order to segment the leaf.
Fig. K-means algorithm
Flow Diagram
ALGORITHM
1. Capture the RGB images of all the leaves using a camera.
2. Create a color transformation structure.
3. Convert color values from RGB to the space specified in that structure.
4. Segment the images using the K-means clustering technique.
5. Mask the green pixels whose green value is below the pre-computed threshold.
6. Eliminate the masked cells present inside the edges of the infected cluster.
7. Convert the infected cluster from RGB to HSI format.
8. Generate the SGDM matrices and texture statistics for the H and S images.
Disease detection using the k-means clustering method [2]: the algorithm provides the
necessary steps required for disease detection on the plant leaf. In the first step, the RGB
images of all the leaves are captured using a camera. In step 2 a color transformation
structure is formed, and then the color space transformation is applied in step 3. These two
steps are required in order to perform step 4, in which the captured images are segmented
using the K-means clustering technique [2]. These four steps come under phase one, in which
the infected objects are detected and determined.
In step 5 the green pixels are detected. Masking of the green pixels is then done as
follows: if the green value of a pixel is less than the threshold value which we have
already calculated, then the red, green and blue component values of that pixel are set to
zero. This is done because these are the unaffected parts; setting their values to zero also
reduces the amount of calculation. Additionally, the time consumed by the Raspberry Pi 3 for
showing the final output is greatly reduced.
In step 6, the pixels having zero values for red, green and blue, and the pixels on the edge
of the infected clusters, are removed completely. Phase two contains steps five and six, and
this phase gives added clarity in classifying the disease. This results in good detection
and performance, and the required computing time is generally decreased to its minimum
value.
In step 7 the infected cluster is converted from RGB form to HSI format. After that, the
SGDM matrices are created for every pixel of the image, but only for the H and S images and
not for the I image. The SGDM [1] measures the probability that a given pixel at one
particular gray level will occur at a given distance and angle of orientation from another
pixel having a second particular gray level. From the SGDM matrices, texture statistics are
generated for each image.
Concisely, the features are calculated for the pixels present inside the edge of the
infected part of the leaf; the part inside the boundary of the infected region which is not
affected is left out. Steps seven to ten come under phase three, in which the texture
features of the segmented objects are computed.
Finally, the recognition process is performed in the fourth phase. The steps of the
algorithm are repeated for each captured image. After this, the results are transferred to
the GSM module; using the Raspberry Pi, the result is sent as an e-mail and is also
displayed on the monitor.
Feature Extraction: Features are to be extracted from the input images. Instead of
choosing the whole set of pixels, we choose only those that are necessary and sufficient to
describe the whole segment. The segmented image is first selected by manual
interference. The affected area of the image is found by calculating the area of the
connected components. First, the connected components with 6-neighborhood pixels are
found. Then the basic region properties of the input binary image are computed; the interest here
is only in the area. Once the affected area is found, the percentage of area covered in this segment
indicates the quality of the result. The histogram of an entity or image provides information
about the frequency of occurrence of each value in the whole of the data/image. It is an
important tool for frequency analysis. Co-occurrence takes this analysis to the next level:
the joint intensity occurrences of two pixels are recorded in a matrix, making the
co-occurrence matrix a powerful tool for texture analysis. From the gray-level co-occurrence matrix, features such as
Contrast, Correlation, Energy, and Homogeneity are extracted. The following table lists the
formulas of the features.
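The co-occurrence analysis described above can be sketched in NumPy. This is a minimal illustration, not the project's actual code; the gray-level count and the (dx, dy) offset are assumptions made for the example.

```python
import numpy as np

def cooccurrence_matrix(img, dx=1, dy=0, levels=8):
    """Normalized gray-level co-occurrence matrix for offset (dx, dy).
    `img` holds integer gray levels in [0, levels)."""
    glcm = np.zeros((levels, levels), dtype=np.float64)
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[img[y, x], img[y + dy, x + dx]] += 1
    return glcm / glcm.sum()

def texture_features(glcm):
    """Contrast, energy, and homogeneity computed from a normalized GLCM."""
    i, j = np.indices(glcm.shape)
    return {
        "contrast": float(np.sum((i - j) ** 2 * glcm)),
        "energy": float(np.sum(glcm ** 2)),
        "homogeneity": float(np.sum(glcm / (1.0 + np.abs(i - j)))),
    }
```

For a perfectly uniform patch every co-occurring pair has the same gray level, so contrast is 0 while energy and homogeneity are 1; textured patches move away from those extremes.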
Classification using SVM: A support vector machine (SVM) is a supervised learning
model in machine learning. SVMs are mainly used for classification and regression
analysis. An SVM has to be paired with a learning algorithm to produce an output, and has
given better performance for classification and regression compared to other methods.
Given a set of training examples belonging to two different categories, the SVM training
algorithm builds a model that assigns new examples to one category or the other,
which makes it a non-probabilistic binary linear classifier. An SVM represents the
examples as points in space, mapped so that the examples of the two categories are
divided by a gap that is as wide as possible.
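The idea can be sketched with a generic linear SVM trained by subgradient descent on the hinge loss. This is an illustrative sketch only, not the classifier actually used in the project; the regularization strength, learning rate, and epoch count are assumed values.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Train a linear SVM (weights w, bias b) by subgradient descent on the
    regularized hinge loss. Labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) < 1:       # point inside the margin: hinge active
                w -= lr * (lam * w - y[i] * X[i])
                b += lr * y[i]
            else:                                # only the regularizer contributes
                w -= lr * lam * w
    return w, b

def predict(X, w, b):
    """Assign each row of X to the +1 or -1 side of the hyperplane."""
    return np.sign(X @ w + b)
```

Only the points nearest the separating hyperplane keep influencing w once training settles, which mirrors the support-vector intuition described above.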
Detailed Explanation:
Python is an easy to learn, powerful programming language. It has efficient high-level data
structures and a simple but effective approach to object-oriented programming. Python’s
elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal
language for scripting and rapid application development in many areas on most platforms.
The Python interpreter is easily extended with new functions and data types implemented in
C or C++ (or other languages callable from C). Python is also suitable as an extension
language for customizable applications.
5.2 OpenCV
5.3 OpenCV-Python
Python is a general-purpose programming language started by Guido van Rossum, which
became very popular in a short time, mainly because of its simplicity and code readability. It
enables the programmer to express ideas in fewer lines of code without reducing
readability.
Compared to other languages like C/C++, Python is slower. But another important feature of
Python is that it can be easily extended with C/C++. This feature helps us to write
computationally intensive codes in C/C++ and create a Python wrapper for it so that we can
use these wrappers as Python modules. This gives us two advantages: first, our code is as fast
as original C/C++ code (since it is the actual C++ code working in background) and second,
it is very easy to code in Python. This is how OpenCV-Python works, it is a Python wrapper
around original C++ implementation. And the support of Numpy makes the task more
easier. Numpy is a highly optimized library for numerical operations. It gives a MATLAB-
style syntax. All the OpenCV array structures are converted to-and-from Numpy arrays. So
whatever operations you can do in Numpy, you can combine it with OpenCV, which
increases number of weapons in your arsenal. Besides that, several other libraries like SciPy,
Matplotlib which supports Numpy can be used with this. So OpenCV-Python is an
appropriate tool for fast prototyping of computer vision problems.
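Because OpenCV images in Python are plain NumPy arrays, ordinary NumPy operations apply to them directly. A small sketch (the shapes and values here are illustrative only):

```python
import numpy as np

# OpenCV represents an image as an H x W x 3 uint8 array in BGR channel order,
# so creating, editing, and slicing images needs no OpenCV calls at all.
img = np.zeros((4, 6, 3), dtype=np.uint8)   # a tiny all-black "image"
img[:, :, 2] = 255                          # fill the red channel (index 2 in BGR)
roi = img[1:3, 2:5]                         # plain slicing yields a region of interest
```

Any function from NumPy-aware libraries such as SciPy or Matplotlib can consume `img` or `roi` unchanged, which is what makes the wrapper approach convenient.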
OpenCV-Python working
OpenCV introduces a new set of tutorials which will guide you through various functions
available in OpenCV-Python. This guide is mainly focused on OpenCV 3.x
version (although most of the tutorials will work with OpenCV 2.x also).
Prior knowledge of Python and NumPy is required before starting, because they
won't be covered in this guide. In particular, a good knowledge of NumPy is a must for writing
optimized code in OpenCV-Python.
This tutorial has been started by Abid Rahman K. as part of Google Summer of Code 2013
program, under the guidance of Alexander Mordvintsev.
As new modules are added to OpenCV-Python, this tutorial will have to be expanded. So
those who know about a particular algorithm can write up a tutorial that includes the basic
theory of the algorithm and code showing its basic usage, and submit it to
OpenCV.
Here, you will learn how to read an image, how to display it, and how to save it back.
You will learn these functions: cv2.imread(), cv2.imshow(), cv2.imwrite().
Optionally, you will learn how to display images with Matplotlib.
Using OpenCV
Read an image
Use the function cv2.imread() to read an image. The image should be in the working directory, or a full path to the image should be given.
The second argument is a flag that specifies the way the image should be read.
Display an image
Use the function cv2.imshow() to display an image in a window. The window automatically
fits to the image size.
The first argument is a window name, which is a string; the second argument is our image. You can
create as many windows as you wish, but with different window names. cv2.waitKey() is a
keyboard binding function. Its argument is the time in milliseconds: the function waits the
specified milliseconds for any keyboard event. If you press any key in that time, the program
continues. If 0 is passed, it waits indefinitely for a key stroke. It can also be set to detect
specific key strokes, for example whether the key 'a' is pressed, as discussed below.
Image pre-processing: The image acquired using the digital camera is pre-processed using
noise removal with an averaging filter, color transformation, and histogram equalization. The
color transformation step converts the RGB image to the HSI (Hue, Saturation, and Intensity)
representation, as this color space is based on human perception. Hue refers to the dominant
color attribute in the same way as perceived by a human observer. Saturation refers to the
amount of brightness or white light added to the hue. Intensity refers to the amplitude of light.
After the RGB to HSI conversion, the Hue component of the image is considered for analysis, as it alone
provides the required information; the S and I components are ignored, as they do not give any
significant information.
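The RGB-to-HSI conversion described above can be sketched with the standard HSI formulas. This is a minimal illustration, assuming RGB values normalized to [0, 1], not the project's exact code:

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an RGB image (floats in [0, 1], shape H x W x 3) to HSI.
    H is returned in radians; S and I are in [0, 1]."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    i = (r + g + b) / 3.0                                  # intensity: channel mean
    min_c = np.minimum(np.minimum(r, g), b)
    s = np.where(i > 0, 1.0 - min_c / np.maximum(i, 1e-12), 0.0)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    h = np.where(b <= g, theta, 2.0 * np.pi - theta)       # hue angle on the color circle
    return np.stack([h, s, i], axis=-1)
```

After this conversion the analysis keeps only the H channel, as the text above explains.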
Masking green pixels: Since most of the green pixels correspond to healthy leaf tissue and
add no value to the disease identification technique, the green pixels of the leaf are
removed by a masking technique; this significantly reduces processing time.
The masking of green pixels is achieved by computing the intensity value of the green pixels:
if the intensity is less than a predefined threshold value, the RGB components of that particular
pixel are assigned a value of zero. Green pixel masking is an optional step in our
disease identification technique, as the diseased part of the leaf can be completely
isolated in the segmentation process.
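Following the rule described above (green-channel intensity below a predefined threshold zeroes the pixel), a hedged sketch; the threshold value used in the test is an arbitrary example, since the real value is data-dependent:

```python
import numpy as np

def mask_green_pixels(rgb, threshold):
    """Zero every pixel whose green-channel intensity is below `threshold`,
    per the masking rule described in the text. `rgb` is an H x W x 3 array."""
    out = rgb.copy()
    out[rgb[..., 1] < threshold] = 0   # boolean mask over the green channel
    return out
```

Vectorizing the comparison over the whole array is what delivers the processing-time saving the text mentions, compared with a per-pixel loop.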
Software Requirement Specification:
Open CV
The goals of OpenCV are to:
- Advance vision research by providing not only open but also optimized code for basic
vision infrastructure, so there is no more reinventing the wheel.
- Disseminate vision knowledge by providing a common infrastructure that developers
could build on, so that code would be more readily readable and transferable.
- Advance vision-based commercial applications by making portable,
performance-optimized code available for free, with a license that did not require
the code itself to be open or free.
Figure 5.3 Qt editor with Open CV
5.4.3 Features
To obtain images from several cameras simultaneously, first grab an image from each camera,
then retrieve the captured images after the grabbing is complete.
The capture source is released with: cvReleaseCapture(&capture);
Open CV
OpenCV is a library of programming functions mainly aimed at real-time computer vision. It was
developed by Intel's research center, subsequently supported by Willow Garage, and is now
maintained by Itseez. It is written in C++ and its primary interface is also in C++, with
bindings in Python, Java, and MATLAB. OpenCV runs on a variety of platforms:
Windows, Linux, macOS, and OpenBSD on desktop, and Android, iOS, and BlackBerry on
mobile. It is used for diverse purposes such as facial recognition, gesture recognition, object
identification, mobile robotics, and segmentation. OpenCV-Python is a combination of the OpenCV C++ API
and the Python language. In our project we use OpenCV version 2. OpenCV is used for
gesture control to open a camera and capture the image. It is also used in the image-to-text
and voice conversion technique.
Figure 4.11: PuTTY socket connection
SVMs: A New Generation of Learning Algorithms
Pre-1980: Almost all learning methods learned linear decision surfaces. Linear learning
methods have nice theoretical properties.
1980s: Decision trees and neural networks allowed efficient learning of non-linear decision
surfaces, but had little theoretical basis and all suffer from local minima.
1990s: Efficient learning algorithms for non-linear functions, based on computational
learning theory, were developed, with nice theoretical properties.
Support Vectors
• Support vectors are the data points that lie closest to the decision surface (or
hyperplane).
• They are the data points most difficult to classify.
• They have direct bearing on the optimum location of the decision surface.
• It can be shown that the optimal hyperplane stems from the function class with the
lowest "capacity", i.e. the number of independent features/parameters we can twiddle.
Support Vector Machine (SVM)
SVMs maximize the margin (in Winston's terminology, the 'street') around the separating hyperplane.
The general input/output for SVMs is just like for neural nets, but with one important addition.
Input: a set of (input, output) training pair samples; call the input sample features x1, x2, ...,
xn, and the output result y. Typically, there can be lots of input features xi.
Output: a set of weights w (or wi), one for each feature, whose linear combination predicts
the value of y. (So far, just like neural nets.)
Important difference: we use the optimization of maximizing the margin ('street width') to
reduce the number of nonzero weights to just a few that correspond to the
important features that 'matter' in deciding the separating line (hyperplane). These nonzero
weights correspond to the support vectors (because they 'support' the separating
hyperplane).
• Support vectors are the elements of the training set that would change the position
of the dividing hyperplane if removed.
• Support vectors are the critical elements of the training set.
• Finding the optimal hyperplane is an optimization problem that can
be solved by optimization techniques (we use Lagrange multipliers to get the problem into
a form that can be solved analytically).
Support Vectors: input vectors that just touch the boundary of the margin (street), i.e.
those satisfying w^T x + b = 1 or w^T x + b = -1.
[Figure: the three support vectors v1, v2, v3 are shown, rather than just the three circled
points at the tail ends of the support vectors; d denotes half of the street width.]
Defining the separating hyperplane
Recall that the distance from a point (x0, y0) to the line Ax + By + C = 0 is
|A x0 + B y0 + C| / sqrt(A^2 + B^2). So the distance from H0 (w^T x + b = 0) to
H1 (w^T x + b = 1) is 1 / ||w||, and the total margin width d+ + d- is 2 / ||w||.
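The distance and margin formulas can be checked numerically. A small sketch with hypothetical values:

```python
import numpy as np

def point_line_distance(a, b, c, x0, y0):
    """Distance from the point (x0, y0) to the line ax + by + c = 0."""
    return abs(a * x0 + b * y0 + c) / np.hypot(a, b)

def margin_width(w):
    """Total SVM margin width 2 / ||w|| for weight vector w."""
    return 2.0 / np.linalg.norm(w)
```

Maximizing the margin 2 / ||w|| is equivalent to minimizing ||w||^2, which is why the optimization below works with (1/2)||w||^2.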
Two constraints must hold at a solution point p: the constraint condition g(x) = 0 is
satisfied, and the gradient of f is parallel to the gradient of g (gradient minimum of f
along the constraint). In general, the Lagrangian is

L(x, a) = f(x) + sum_i a_i g_i(x)

In our case, f(x) = (1/2) ||w||^2 and g_i(x) = y_i (w . x_i + b) - 1 = 0, so the Lagrangian
(primal problem) is:

min L_P = (1/2) ||w||^2 - sum_{i=1}^{l} a_i [ y_i (w . x_i + b) - 1 ],   s.t. for all i, a_i >= 0

Setting dL_P/dw = 0 gives w = sum_{i=1}^{l} a_i y_i x_i, and setting dL_P/db = 0 gives
sum_{i=1}^{l} a_i y_i = 0.

By substituting for w and b back into the original equation we can get rid of the
dependence on w and b. Note first that we already have our answer for what the weights
w must be: they are a linear combination of the training inputs. Substituting gives the
dual problem:

max L_D(a) = sum_{i=1}^{l} a_i - (1/2) sum_{i=1}^{l} sum_{j=1}^{l} a_i a_j y_i y_j (x_i . x_j),
s.t. 0 <= a_i <= C and sum_{i=1}^{l} a_i y_i = 0

Now, knowing the a_i, we can find the weights w = sum_{i=1}^{l} a_i y_i x_i for the
maximal-margin separating hyperplane.
Non-linear SVMs: if the data are not linearly separable in the input space, we can map the
inputs into a higher-dimensional feature space where they become linearly separable. For
example, mapping x -> (x, x^2) lets a one-dimensional problem gain linear separability;
for radially separated classes (label +1 inside a radius, -1 outside), the answer is polar
coordinates.
Recall the function we want to optimize: L_D = sum_i a_i - (1/2) sum_{i,j} a_i a_j y_i y_j (x_i . x_j),
where (x_i . x_j) is the dot product of the two feature vectors. If we now transform the
inputs by a mapping phi, then instead of computing the dot product (x_i . x_j) we would
have to compute (phi(x_i) . phi(x_j)). But how can we do this? It is expensive and time
consuming (suppose phi is a quartic polynomial, or worse, we do not know the mapping
explicitly).
Non-linear SVMs: the kernel trick
Instead of computing phi(x_i) . phi(x_j) explicitly, we use a kernel function
K(x_i, x_j) = phi(x_i) . phi(x_j). So, the function we end up optimizing is:
L_D = sum_i a_i - (1/2) sum_{i,j} a_i a_j y_i y_j K(x_i, x_j)
What is K? One example is the sigmoid kernel tanh(beta_0 x^T x_i + beta_1), which
corresponds to a two-layer neural net; it actually works as a valid kernel only for some
values of beta_0 and beta_1. Kernels generalize the notion of
'inner product similarity'. Note that one can define kernels over more than just
vectors: strings, trees, structures, in fact just about anything.
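The kernel trick can be made concrete: for the degree-2 polynomial kernel with c = 1 in one dimension, the explicit feature map phi(x) = (x^2, sqrt(2)x, 1) gives exactly the same inner product, so the kernel computes the high-dimensional dot product without ever forming phi. A sketch; the kernels and parameters are generic examples, not the project's settings:

```python
import numpy as np

def poly_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel K(x, z) = (x . z + c)^degree."""
    return (np.dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (radial basis function) kernel K(x, z) = exp(-gamma ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

def phi(x):
    """Explicit feature map matching poly_kernel(degree=2, c=1) for scalar x:
    phi(x) . phi(z) = x^2 z^2 + 2xz + 1 = (xz + 1)^2."""
    return np.array([x * x, np.sqrt(2.0) * x, 1.0])
```

Evaluating the kernel costs one dot product regardless of the feature space's dimension, which is the whole point: the RBF kernel even corresponds to an infinite-dimensional feature space.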
System Design
During the detailed design phase, the view of the application developed during high-level design is broken down
into modules and programs. Logic design is done for every program and then documented as program
specifications. For every program, a unit test plan is created.
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to
represent a system in terms of the input data to the system, the various processing carried out on this
data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to
model the system components. These components are the system process, the data used by
the process, an external entity that interacts with the system and the information flows in the
system.
3. DFD shows how the information moves through the system and how it is modified by a
series of transformations. It is a graphical technique that depicts information flow and the
transformations that are applied as data moves from input to output.
4. A DFD may be used to represent a system at any level of
abstraction, and may be partitioned into levels that represent increasing information flow
and functional detail.
DFD DIAGRAM:
Sequence Diagram:
A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order. It is a construct of a Message
Sequence Chart. Sequence diagrams are sometimes called event diagrams, event scenarios, and
timing diagrams.
Figure: Sequence diagram
Use case Diagram:
A use case diagram in the Unified Modeling Language (UML) is a type of behavioral
diagram defined by and created from a Use-case analysis. Its purpose is to present a graphical
overview of the functionality provided by a system in terms of actors, their goals (represented as
use cases), and any dependencies between those use cases. The main purpose of a use case diagram
is to show what system functions are performed for which actor. Roles of the actors in the system
can be depicted.
Activity Diagram:
Activity diagrams are graphical representations of workflows of stepwise activities and actions with
support for choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams
can be used to describe the business and operational step-by-step workflows of components in a
system. An activity diagram shows the overall flow of control.
Chapter 7
Advantages:
Applications:
Testing
Testing is the process of evaluating a system or its component(s) with the intent to find whether it satisfies the
specified requirements or not. Testing means executing a system in order to identify any gaps, errors, or missing
requirements contrary to the actual requirements.
Testing Principle
Before applying methods to design effective test cases, a software engineer must understand the basic principle
that guides software testing. All the tests should be traceable to customer requirements.
Testing Methods
There are different methods that can be used for software testing. They are,
1. Black-Box Testing
The technique of testing without having any knowledge of the interior workings of the application is
called black-box testing. The tester is oblivious to the system architecture and does not have access to
the source code. Typically, while performing a black-box test, a tester will interact with the system's user
interface by providing inputs and examining outputs without knowing how and where the inputs are
worked upon.
2. White-Box Testing
White-box testing is the detailed investigation of internal logic and structure of the code. White-box
testing is also called glass testing or open-box testing. In order to perform white-box testing on an
application, a tester needs to know the internal workings of the code. The tester needs to have a look
inside the source code and find out which unit/chunk of the code is behaving inappropriately.
Levels of Testing
There are different levels during the process of testing. Levels of testing include different methodologies that
can be used while conducting software testing. The main levels of software testing are:
Functional Testing:
This is a type of black-box testing that is based on the specifications of the software that is to be tested.
The application is tested by providing input; the results are then examined and must conform to
the functionality the application was intended for. Functional testing of software is conducted on a complete,
integrated system to evaluate the system's compliance with its specified requirements. The
steps involved in testing an application for functionality include:
The determination of the functionality that the intended application is meant to perform.
The output based on the test data and the specifications of the application.
The comparison of actual and expected results based on the executed test cases.
Non-functional Testing
This section is based upon testing an application from its non-functional attributes. Non-functional
testing involves testing software from the requirements which are non-functional in nature but important
such as performance, security, user interface, etc. Testing can be done at different levels of the SDLC. A few
of them are:
Unit Testing
Unit testing is a software development process in which the smallest testable parts of an application, called
units, are individually and independently scrutinized for proper operation. Unit testing is often automated but it
can also be done manually. The goal of unit testing is to isolate each part of the program and show that
individual parts are correct in terms of requirements and functionality. Test cases and results are shown in the
Tables.
Leaf-Disease-Detection-using Python (Open CV)
Unit testing:
Item being tested: Image upload
Sample input: Different images of paddy plant leaves and diseases
Actual output: Upload successful
Remarks: Pass
Integration Testing:
Integration testing is a level of software testing where individual units are combined and
tested as a group. The purpose of this level of testing is to expose faults in the interaction
between integrated units. Test drivers and test stubs are used to assist in Integration
Testing. Integration testing is defined as the testing of combined parts of an application to
determine if they function correctly. It occurs after unit testing and before validation
testing. Integration testing can be done in two ways: Bottom-up integration testing and
Top-down integration testing.
1. Bottom-up Integration
This testing begins with unit testing, followed by tests of progressively higher-
level combinations of units called modules or builds.
2. Top-down Integration
In this testing, the highest-level modules are tested first and progressively, lower-
level modules are tested thereafter.
Item being tested: Selecting different images and verifying names of diseases
Remarks: Pass
System testing:
System testing of software or hardware is testing conducted on a complete, integrated
system to evaluate the system's compliance with its specified requirements. System
testing falls within the scope of black-box testing, and as such, should require no
knowledge of the inner design of the code or logic. System testing is important because of
the following reasons:
System testing is the first level of testing in the Software Development Life Cycle at which
the application is tested as a whole.
The application is tested thoroughly to verify that it meets the functional and
technical specifications.
System testing enables us to test, verify, and validate both the business
requirements as well as the application architecture.
System Testing is shown in below tables
Remarks: - Pass
CHAPTER 10
Future Scope:
In this project, we demonstrated only a few commonly occurring types of diseases;
the system can be extended to more diseases in the future. Here, only diseases are detected, but in the future
a robot could be sent to spray pesticides on the plants automatically, without human
interaction.
REFERENCES:
2. S. Raj Kumar and S. Sowrirajan, "Automatic Leaf Disease Detection and Classification using
Hybrid Features and Supervised Classifier," International Journal of Advanced Research in
Electrical, Electronics and Instrumentation Engineering, vol. 5, issue 6, 2016.
5. T. van der Zwet, "Present worldwide distribution of fire blight," in Proceedings of the 9th
International Workshop on Fire Blight, vol. 590, Napier, New Zealand, October 2001.
7. I. Steinwart and A. Christmann, Support Vector Machines, Springer Science & Business
Media, New York, NY, USA, 2008.
11. A.-K. Mahlein, T. Rumpf, P. Welke et al., "Development of spectral indices for detecting
and identifying plant diseases," Remote Sensing of Environment, vol. 128, pp. 21–30, 2013.