Research paper_f
Abstract-- Plant diseases significantly threaten global crop yields, impacting both food security and farmers' incomes. Accurate and early detection of plant diseases is crucial for effective intervention and management. This paper explores a deep learning-based approach for plant disease classification using a Convolutional Neural Network (CNN) model. The model was trained on a dataset of 61,486 plant leaf images covering 38 distinct classes of plant diseases, including healthy leaves. We tested the model on a separate dataset of 54,306 images, achieving a classification accuracy of 88.62% and a macro-averaged F1-score of 0.84. Performance evaluation through a confusion matrix and classification report shows that the model performs strongly across most disease categories. This model aims to assist farmers and agricultural experts in early disease detection, thereby improving crop health, boosting yield, and minimizing agricultural losses.

Keywords-- Machine Learning (ML), Plant Disease Detection, Plant Leaf Analysis, Image Classification, Convolutional Neural Network (CNN).

[...] comprehensive dataset covering 38 distinct plant disease classes, including healthy leaves. To extend its practical value, the model is deployed through a Flask-powered web application that enables users to upload plant leaf images for real-time diagnosis. The system supplements its predictions with symptom descriptions, recommended treatments, and direct marketplace links by integrating structured data from curated Excel files. This end-to-end framework offers an efficient, accessible, and scalable solution for early plant disease detection and management.

II. LITERATURE SURVEY

The early methods of plant disease detection relied heavily on manual inspection and standardized physical assessments. Stubbs et al. laid a foundation through the Cereal Disease Methodology Manual, detailing visual and field inspection methods to identify cereal crop diseases. However, such traditional techniques were time-consuming, highly subjective, and limited in scope, particularly for large-scale applications. [1]
Machine Learning (ML) introduced a paradigm shift by enabling the automation of plant disease detection through data-driven models.

Nturambirwe and Opara (2020) showcased ML applications for non-destructive defect detection in horticultural products, utilizing sensor data combined with algorithms like Support Vector Machines (SVM) and Neural Networks. Their review highlighted the potential of ML to achieve real-time, scalable quality assessment, reducing dependency on human expertise. [5]

Similarly, Waldamichael et al. (2022) focused specifically on cereal crops, reviewing how ML techniques like Random Forests, SVMs, and Decision Trees have been employed for early disease detection. However, they also identified significant gaps, such as the lack of cereal-specific datasets and generalization issues across different environmental conditions. [6]

Deep Learning (DL), particularly Convolutional Neural Networks (CNNs), further advanced plant disease recognition by eliminating the need for manual feature extraction. Saleem et al. (2019) reviewed the application of CNNs in detecting various plant diseases, noting superior performance over traditional ML methods due to CNNs' ability to automatically learn complex patterns directly from raw images. [7]

Expanding on this, Abade et al. (2020) systematically reviewed the use of CNNs for plant disease recognition. They highlighted the dominance of architectures like AlexNet, VGG, Inception, and ResNet across research studies. However, they pointed out common challenges, such as model overfitting, dependency on synthetic datasets like PlantVillage, and lack of robustness under real-world field conditions. [8]

Nagaraju and Chawla (2020) provided additional insights into DL applications, emphasizing the importance of high-quality, annotated image datasets and proposing future integration of hyperspectral data analysis for improved performance. [9]

Recognizing the critical role of data availability, Hughes and Salathé (2015) created the PlantVillage dataset, an open-access repository of over 50,000 images of healthy and diseased plant leaves. This dataset democratized access to training data for researchers worldwide, facilitating rapid advancements in mobile-based disease diagnostic tools. [10]

Hasan et al. (2024) further emphasized dataset importance in their work on weed detection benchmarks. They curated detailed, annotated datasets at the object level for precision agriculture, noting that without diverse and realistic datasets, even the best models fail under variable field conditions. [12]

Future research should focus on developing large-scale, diverse, field-captured datasets, exploring transfer learning, improving model interpretability, and designing hybrid multi-modal diagnostic systems.

III. METHODOLOGY

Dataset: For this research, a comprehensive image dataset of plant leaves was used for training and testing the model. The dataset comprises 61,486 images spanning 38 distinct plant disease classes, including healthy leaf categories. Images in the dataset represent various plants such as apple, grape, tomato, corn, and potato, and capture a diverse range of diseases including fungal, bacterial, and viral infections. Additionally, a separate independent test set of 54,306 images was used for final model evaluation to ensure unbiased performance assessment.

To extend the functionality of the trained model, a web application was developed using Flask to provide an accessible user interface. Users can upload images of infected plant leaves directly to the application, where they are processed in real time by the trained CNN model. After predicting the disease, the application reads structured data from integrated Excel files to enhance decision support. Specifically, it retrieves the list of associated symptoms from disease_info.xlsx, while supplements_info.xlsx provides recommended treatments along with marketplace links or product names for acquiring the suggested items. This approach allows the system not only to diagnose the disease but also to inform the user about symptoms, suggest appropriate supplements, and direct them to verified sources, thereby delivering a complete and practical plant health support tool.

Data Preprocessing: The PlantVillage dataset, comprising 61,486 images of healthy and diseased leaves, was standardized before training. Images were resized to 255×255 pixels and center-cropped to 224×224 to ensure uniform input dimensions. Each image was converted into a tensor and normalized to the [0,1] range using PyTorch's transformation tools. The dataset was randomly shuffled and split into training (59.5%), validation (25.5%), and testing (15%) subsets to support balanced training
and robust evaluation.

Model Architecture: The proposed convolutional neural network (CNN) was designed to balance depth, complexity, and computational efficiency, with the goal of achieving high classification accuracy across a diverse set of plant disease categories. The architecture consists of four convolutional blocks, each comprising two convolutional layers followed by rectified linear unit (ReLU) activations and batch normalization layers. Batch normalization helps stabilize and accelerate training by normalizing activations at each layer, while ReLU introduces the non-linearity that enables the network to learn complex features.

After each pair of convolutional layers within a block, a max pooling operation with a 2×2 kernel is applied to reduce the spatial dimensions by a factor of two. This progressive reduction in size not only lowers computational overhead but also facilitates hierarchical feature learning, from low-level features such as edges and textures in the earlier layers to more abstract representations in the deeper layers.

The convolutional backbone transforms the 224×224 input image into a dense, high-level feature map of shape 256×14×14 (four halvings of 224 give 14). This is then flattened into a one-dimensional vector of size 50,176 and passed through two fully connected (dense) layers. The first dense layer consists of 1,024 neurons with ReLU activation, and dropout regularization is applied with a dropout rate of 0.4 to prevent overfitting. The final layer maps the output to K neurons, corresponding to the number of plant disease classes (K = 38), producing raw class scores (logits) for classification.

[...] insights beyond overall accuracy (Table 4.1).

Macro-Averaged F1-Score: Used to evaluate balanced performance across all classes regardless of class frequency.

Additionally, a progress bar was implemented during evaluation to monitor the batch-wise testing process efficiently.

We exported the confusion matrix results to an Excel file (confusion_matrix.xlsx), and classification details were stored in a text file (classification_report.txt) for detailed analysis.

Tools and Libraries: The model was developed and evaluated using the following software environment:

Programming Language: Python 3.8
Web Framework: Flask
Frontend: HTML5, CSS3
Excel Export: Pandas (for saving the confusion matrix as an Excel sheet)
Deep Learning Framework: PyTorch
Image Processing: TorchVision
Evaluation Metrics: scikit-learn (classification report, confusion matrix, F1-score calculations)
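The layer dimensions quoted in the architecture description can be checked with a few lines of arithmetic. The sketch below traces the spatial size of a 224×224 input through four 2×2 max-pooling stages and computes the flattened feature length. It assumes the convolutions inside each block preserve spatial size (for example, 3×3 kernels with padding 1), which the text does not state; the function name and its parameters are illustrative only.

```python
def trace_feature_shape(input_size=224, num_blocks=4, pool=2, out_channels=256):
    """Return (channels, height, width, flattened_length) after the conv backbone.

    Assumption (not stated in the paper): convolutions keep the spatial size
    unchanged, so only the 2x2 max pooling shrinks the feature map.
    """
    size = input_size
    for _ in range(num_blocks):
        size //= pool  # each 2x2 max pool halves height and width
    flattened = out_channels * size * size
    return out_channels, size, size, flattened


channels, height, width, flattened = trace_feature_shape()
print(f"feature map: {channels}x{height}x{width}, flattened length: {flattened}")
# feature map: 256x14x14, flattened length: 50176
```

Under this assumption the result agrees with the 256×14×14 feature map and the 50,176-element vector given in the text, which is the input size the first dense layer would need.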
The proposed CNN-based model was evaluated on a large independent test set comprising 54,306 images across 38 plant disease classes. The model achieved an overall classification accuracy of 88.62%, demonstrating strong generalization capabilities over diverse crop and disease types.

Table 4.1 presents a detailed classification report that summarizes the model's performance across all 38 plant disease classes. Each row corresponds to a specific disease class, identified by its index, along with the related crop and status/disease name. The table includes four key evaluation metrics for each class:

Precision: The proportion of correct positive predictions out of all predicted positives.
Recall: The proportion of actual positives correctly identified by the model.
F1-Score: The harmonic mean of precision and recall, providing a balanced view.
Support: The number of actual test samples present in each class.

This tabular format enables a clear comparison of model performance across different disease categories. High-scoring classes reflect strong learning and clear feature distinction, while classes with lower precision or recall suggest areas where misclassifications occurred, likely due to visual similarity or under-representation in the training data.

Table 4.2 provides the numeric confusion matrix generated after evaluating the model on the test set. In this matrix, each row represents the actual disease class (by index), and each column represents the class predicted by the model. Values along the diagonal indicate correctly classified instances, while values off the diagonal highlight instances where the model confused one disease class for another. This matrix is instrumental in diagnosing systematic prediction errors. For example, it helps identify specific classes that are frequently misclassified into one another, such as the confusion observed between index 21 and indices 31 and 33, which was supported both numerically and visually. Such insights are essential for future improvements in model accuracy, including refining class boundaries or enhancing feature extraction methods.

High-Performing Classes: Several classes show excellent classification performance with precision, recall, and F1-scores above 0.95.

Example 1: Corn - Healthy (Index 11, Support: 1162):
Precision: 1.00
Recall: 0.97
F1-Score: 0.98

Example 2: Soybean - Healthy (Index 25, Support: 5090):
Precision: 0.97
Recall: 0.97
F1-Score: 0.97

Example 3: Cherry - Powdery mildew (Index 6, Support: 1052):
Precision: 0.88
Recall: 0.97
F1-Score: 0.92

[...]
Support: 1000
Correct Predictions: 577
Misclassified as:
o Tomato - Late blight (Index 31): 138 times
o Tomato - Leaf Mold (Index 32): 89 times
o Tomato - Septoria leaf spot (Index 33): 76 times
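The per-class metrics defined for Table 4.1 follow directly from a confusion matrix laid out as in Table 4.2 (rows are actual classes, columns are predicted classes). The sketch below is a minimal pure-Python illustration on a small made-up 3-class matrix, not the paper's data, and the function names are illustrative. The last line applies the same recall formula to the one row for which counts are quoted in the text, assuming "Correct Predictions" refers to the diagonal entry of that row.

```python
def per_class_metrics(cm):
    """Compute (precision, recall, f1, support) for each class.

    cm[i][j] counts samples whose actual class is i and predicted class is j.
    """
    n = len(cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]                                 # diagonal: correct predictions
        support = sum(cm[k])                          # row sum: actual samples of class k
        predicted = sum(cm[i][k] for i in range(n))   # column sum: predictions of class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1, support))
    return metrics


def macro_f1(cm):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = [f1 for _, _, f1, _ in per_class_metrics(cm)]
    return sum(scores) / len(scores)


# Toy 3-class confusion matrix (hypothetical numbers for illustration only).
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
for index, (p, r, f1, support) in enumerate(per_class_metrics(toy_cm)):
    print(f"class {index}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
print(f"macro F1 = {macro_f1(toy_cm):.2f}")

# Recall for the row quoted in the text: 577 correct predictions, support 1000.
print(f"recall = {577 / 1000:.3f}")
```

Because the macro average weights every class equally, a single weak class such as the one with 577 of 1000 correct lowers it more than it lowers overall accuracy, which is dominated by the larger classes.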
REFERENCES