Crop yield analysis using machine learning

A PROJECT REPORT
Submitted By -
Rahul Narendra Sharma (221B291)
Sajal Korde (221B319)
Suryansh Pratap Singh (221B403)
Under Guidance Of : Dr. Amit Kumar Srivastava

November - 2024

Submitted in partial fulfillment for the award of the degree of

Bachelor Of Technology
IN

Computer Science Engineering

Department of Computer Science & Engineering

JAYPEE UNIVERSITY OF ENGINEERING & TECHNOLOGY, AB ROAD,
RAGHOGARH, DT. GUNA-473226 MP, INDIA
Declaration by the Student

I hereby declare that, to the best of my knowledge and belief, the work reported in the B.Tech.
project entitled “Crop yield analysis using machine learning”, submitted at Jaypee University of
Engineering and Technology, Guna, in partial fulfillment for the award of the degree of Bachelor
of Technology, involves no infringement of intellectual property rights or copyright. In
case of any violation, I will be solely responsible.

Rahul Narendra Sharma (221B377)

Sajal Korde (221B408)

Suryansh Pratap Singh (221B410)

Department of Computer Science and Engineering

Jaypee University of Engineering and Technology

Guna, M.P., India

Date: 20/11/2024
JAYPEE UNIVERSITY OF ENGINEERING & TECHNOLOGY
Accredited with Grade-A+ by NAAC & Approved U/S 2(f) of the UGC Act, 1956
A.B. Road, Raghogarh, District Guna (MP), India, Pin-473226
Phone: 07544 267051, 267310-14, Fax: 07544 267011
Website: www.juet.ac.in

CERTIFICATE

This is to certify that the work titled “Crop yield analysis using machine learning”,
submitted by Rahul Narendra Sharma, Sajal Korde, and Suryansh Pratap Singh in partial
fulfillment for the award of the degree of B.Tech (CSE) of Jaypee University of Engineering &
Technology, Guna, has been carried out under my supervision. To the best of my knowledge and
belief, there is no infringement of intellectual property rights or copyright. Also, this work has not
been submitted partially or wholly to any other University or Institute for the award of this or any
other degree or diploma. In case of any violation, the concerned students will be solely responsible.

Signature of Supervisor
Dr. Amit Kumar Srivastava
Assistant Professor
Date:20/11/2024
ACKNOWLEDGEMENT

We thank the Almighty for giving us the courage and perseverance to complete this project. The project itself is
an acknowledgement of all those who gave us their heartfelt cooperation in making it a success.

We are also thankful to the project coordinator, Dr. Amit Kumar Srivastava, for his sincere and heartfelt
guidance throughout this project work. Without his supervision, guidance, and stimulating, constructive
criticism, this project would never have come out in this form. It is a pleasure to express our deep and sincere
gratitude to the Project Guide, Dr. Amit Srivastava, and we are profoundly grateful for the unmatched help rendered by him.

Last but not least, we would like to express our deep and earnest gratitude to our dear parents for
their moral support and heartfelt cooperation during the project. We would also like to thank our friends, whose
direct or indirect help has enabled us to complete this work successfully.

Rahul Narendra Sharma (221B291)

Sajal Korde (221B319)

Suryansh Pratap Singh (221B403)

Date:20/11/2024
SUMMARY

This project utilizes machine learning to analyze NDVI (Normalized Difference
Vegetation Index) images derived from Sentinel-2 satellite data obtained via
Google Earth Engine, with the goal of clustering agricultural land into distinct
zones. By applying techniques such as autoencoders, PCA (Principal Component
Analysis), and K-means clustering, the project extracts and analyzes key features
from NDVI images to group regions with similar vegetation characteristics.

The workflow begins with dimensionality reduction using PCA and feature extraction
through autoencoders to enhance clustering efficiency. K-means clustering is then
employed to segment the image into distinct clusters representing varying vegetation health
or crop types. This methodology provides actionable insights for precision agriculture,
enabling targeted interventions and better crop management strategies.

The project demonstrates the potential of integrating satellite imagery with machine
learning to support sustainable agricultural practices, offering farmers and researchers a
powerful tool to monitor and optimize land use effectively.
CONTENTS

1. Title Page
2. Declaration by the Student
3. Certificate
4. Acknowledgement
5. Summary
6. Chapter 1: Introduction
o 6.1 Introduction to Agriculture
o 6.2 Types of Crops
o 6.3 AI and ML in Agriculture
o 6.5 Benefits to Farmer
7. Chapter 2: Literature Review
o 7.1 Problem Definition
o 7.2 Existing Work
o 7.3 Research Gap
o 7.4 Proposed System
8. Chapter 3: Requirement Analysis
o 8.1 Project Objectives
o 8.2 System Overview
o 8.3 Functional Requirements
o 8.4 Hardware and Framework Requirements
o 8.5 Feasibility Analysis
9. Chapter 4: System Design and Implementation
o 9.1 Introduction to System Design
o 9.2 Crop Clustering Model
o 9.3 Data Preprocessing
o 9.4 Feature Extraction Method and Feature Engineering
o 9.5 Model Training and Evaluation
o 9.6 ARI Score

10. Chapter 5: Results and Conclusion

o 10.1 Model Performance Evaluation
o 10.2 Output Results
o 10.3 Key Points and Achievements
o 10.4 Scope for Future Enhancements

Chapter 7: Results and Conclusion

10. Appendix
11. References
Chapter 1
Introduction

1.1 Introduction to Agriculture

Agriculture, the practice of cultivating plants, raising animals, and harnessing
natural resources for food, fiber, and other essential needs, is a cornerstone of
human civilization. As one of the most fundamental human activities, it plays a
pivotal role in sustaining global economies, ensuring food security, creating
employment, and driving socio-economic development. Over centuries,
agriculture has transitioned from traditional subsistence methods to sophisticated,
technology-driven systems capable of addressing large-scale demands.

Despite these advancements, modern agriculture faces persistent challenges such as
climate variability, soil degradation, crop diseases, and inefficient resource utilization.
These issues threaten productivity, increase the likelihood of crop failures, and exacerbate
resource wastage, underscoring the need for innovative approaches.

To overcome these hurdles, the integration of data-driven technologies like Artificial
Intelligence (AI) and Machine Learning (ML) has become increasingly crucial. These
technologies empower modern agriculture with tools for precision farming, real-time
decision-making, and efficient resource management. By leveraging satellite imagery,
environmental data, and advanced algorithms, AI and ML pave the way for sustainable
agricultural practices and improved resilience against uncertainties.

This project focuses on utilizing ML-based clustering techniques to analyze NDVI images
from Sentinel-2 satellites, providing actionable insights to enhance agricultural productivity
and sustainability.

1.2 Types of Crops and Growing Seasons

Agricultural crops are broadly classified based on their uses and the seasons in
which they are grown.
Crops Based on Usage:
• Cereals: Staple foods like wheat, rice, maize, and barley, rich in
carbohydrates, forming the foundation of diets worldwide.
• Pulses: Protein-rich crops like lentils, chickpeas, and black gram, crucial
for nutrition and soil fertility (through nitrogen fixation).
• Oilseeds: Mustard, soybean, and sunflower, used for edible oils and industrial
purposes.
• Fruits and Vegetables: Nutrient-rich crops like apples, tomatoes, and
potatoes, essential for a balanced diet.
• Fiber Crops: Cotton and jute, used in textile industries.
• Cash Crops: Sugarcane, coffee, and tea, grown for trade and commercial profit.

1.3 Crops Based on Growing Seasons

1. Kharif Crops (Monsoon Crops): Sown during June-July and harvested in
October-November, requiring high rainfall (e.g., rice, maize, cotton).

2. Rabi Crops (Winter Crops): Sown during October-November and harvested in
March-April, thriving in cool weather (e.g., wheat, barley, mustard).

3. Zaid Crops (Summer Crops): Grown in March-June between rabi and kharif
seasons, suited for hot and dry conditions (e.g., watermelon, cucumber).

1.4 AI and ML in Agriculture
AI and Machine Learning (ML) are fundamentally transforming the agriculture sector, ushering
in an era of greater efficiency, sustainability, and productivity. These technologies are
empowering farmers by providing them with tools to make data-driven decisions that enhance
crop management, improve yields, optimize resource use, and mitigate risks associated with
unpredictable weather patterns, pests, diseases, and market fluctuations. Through precision
agriculture, AI and ML algorithms can analyze vast amounts of data from various sources—such
as satellite imagery, sensors, weather forecasts, and historical crop data—enabling farmers to
respond proactively and adapt to changing conditions.

In the context of crop classification, these technologies can be particularly beneficial. By using
machine learning models, farmers can accurately identify and classify crops based on visual or
environmental data, such as images or soil conditions. This project specifically focuses on
applying AI and ML techniques to classify wheat and gram crops during the Rabi season, which
typically runs from October to March in India. The goal is to leverage these advancements to assist
farmers in identifying crop types more efficiently, thereby improving crop management practices,
optimizing input use (such as fertilizers and water), and minimizing crop losses.

The initial implementation of this project will be carried out in the Guna district of Madhya
Pradesh, a region known for its agricultural activities, particularly wheat and gram cultivation. By
testing and refining the AI and ML models in this specific geographic area, the project aims to
develop a scalable solution that can later be expanded to other regions. The success of this
initiative will not only benefit farmers by increasing their productivity and sustainability but also
contribute to the larger goal of food security by ensuring more efficient use of resources and
reducing waste.

Through this innovative application of technology, the project hopes to demonstrate the potential
of AI and ML to revolutionize agriculture, improving the livelihoods of farmers and enhancing
the overall agricultural output of the region. By automating tasks that would typically require
manual labor, this approach has the potential to drive greater accuracy in crop management, reduce
human error, and enable faster decision-making based on real-time data. As a result, this project
stands to significantly contribute to the future of agriculture, making it smarter, more efficient,
and better equipped to meet the demands of a growing global population.

Key Characteristics of ML
• Learning from Data:
Machine learning involves training algorithms on data, where the system "learns" from
patterns or structures in the data.

• Models and Algorithms:

A model is the representation of what the machine has learned from the data.
An algorithm is a set of rules or steps followed to build the model.

• Training and Testing:

Training is the process of feeding data into an algorithm so that it learns the relationships
between the input features (data) and the desired output (prediction or classification).

Once the model has been trained, it is evaluated on new, unseen data (called the test set)
to check its performance and how well its predictions generalize.
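
The split between training and test data described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the 80/20 ratio, and the sample data are assumptions made for the example and are not taken from the project.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)      # reproducible shuffle
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (feature, label) pairs: a mean NDVI value and a crop label.
data = [(0.2 + 0.05 * i, "wheat" if i % 2 else "gram") for i in range(10)]
train, test = train_test_split(data, test_fraction=0.2)
print(len(train), len(test))
```

The model would be fitted on `train` only, and `test` would be held back until evaluation so that the reported performance reflects unseen data.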

1.5 How ML is Useful in Agriculture

1. Crop Monitoring and Health Assessment

• NDVI & ML Integration: NDVI is a measure of plant health based on how vegetation reflects
light: it is the normalized difference between near-infrared and red reflectance,
(NIR - Red) / (NIR + Red). ML can be used to analyze NDVI data over time to detect patterns
and trends in crop health.

• Crop Stress Detection: By feeding NDVI data into machine learning algorithms, it's
possible to detect early signs of stress in crops, such as drought, disease, or pest infestation,
allowing for timely intervention.

• Predictive Models: ML algorithms can use historical NDVI data to predict future crop
conditions and yields, which helps farmers optimize inputs like water, fertilizers, and
pesticides.
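
As a small worked example of the index used throughout this report, the sketch below computes NDVI for a single pixel as (NIR - Red) / (NIR + Red). For Sentinel-2, the near-infrared and red bands are commonly taken as B8 and B4. The reflectance values and the helper name `ndvi` are illustrative assumptions, not data from the project.

```python
def ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red) for one pixel.

    Returns 0.0 when both reflectances are zero to avoid dividing by zero.
    """
    total = nir + red
    if total == 0:
        return 0.0
    return (nir - red) / total

# Hypothetical (NIR, Red) surface reflectances, chosen only for illustration.
pixels = {
    "dense crop": (0.50, 0.05),
    "sparse crop": (0.30, 0.15),
    "bare soil": (0.25, 0.20),
}
for name, (nir, red) in pixels.items():
    print(f"{name}: NDVI = {ndvi(nir, red):.2f}")
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so denser canopies give values closer to 1, while bare soil gives values near 0.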

2. Precision Agriculture

• Automating Decision-Making: Machine learning models can analyze NDVI data
alongside other factors (such as soil health, weather conditions, and irrigation patterns) to
make automated decisions for precision agriculture, like adjusting irrigation schedules or
nutrient applications to optimize crop growth.

• Variable Rate Technology (VRT): ML-driven systems can use NDVI data to create
prescription maps, guiding machinery to apply inputs precisely where they’re needed,
reducing waste and increasing efficiency.

3. Yield Prediction

• Forecasting Crop Yields: ML algorithms can use NDVI, combined with weather data, soil
health, and historical yield data, to predict future yields. This helps farmers make informed
decisions about harvesting, storage, and sales.
• Risk Management: By analyzing trends in NDVI, ML models can identify potential yield
losses due to adverse conditions, such as extreme weather events, enabling farmers to take
preventive measures.

4. Land and Crop Classification

• Mapping Crop Types: ML can be used to classify different crop types based on NDVI
data obtained through satellite or drone imagery. This can help farmers manage different
crops more effectively.

• Land Use Optimization: Machine learning can also analyze NDVI data to help in land use
planning, identifying areas of the field that need more attention or are underperforming.

5. Disease and Pest Detection

• Anomaly Detection: ML models can be trained to identify subtle changes in NDVI that
might indicate the onset of diseases or pest infestations, often before they’re visible to the
naked eye.

• Early Warning Systems: Combining NDVI with other environmental data, ML algorithms
can create early warning systems to help prevent widespread crop damage from pests or
diseases.
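
One simple way to picture the anomaly detection described above is a z-score check of the current season's NDVI against earlier seasons. The sketch below is not a trained ML model and is not the method used in this project; the function name `flag_ndvi_anomalies`, the threshold of 2 standard deviations, and all NDVI values are assumptions for illustration.

```python
from statistics import mean, stdev

def flag_ndvi_anomalies(baseline, current, z_threshold=2.0):
    """Return time-step indices where `current` NDVI is unusually low.

    baseline: list of past seasons, each a list of NDVI values per time step.
    current:  NDVI values of the season being checked (same length).
    A step is flagged when the value lies more than `z_threshold` standard
    deviations below the baseline mean for that step.
    """
    flagged = []
    for t, value in enumerate(current):
        history = [season[t] for season in baseline]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and (mu - value) / sigma > z_threshold:
            flagged.append(t)
    return flagged

# Hypothetical NDVI series for one field over three past seasons.
baseline = [
    [0.30, 0.50, 0.70, 0.60],
    [0.32, 0.52, 0.72, 0.58],
    [0.28, 0.48, 0.68, 0.62],
]
current = [0.31, 0.49, 0.45, 0.59]   # sharp drop at the third time step
print(flag_ndvi_anomalies(baseline, current))
```

A flagged time step only signals that vegetation vigour is lower than usual; identifying the cause (disease, pests, or water stress) would still need field data or additional inputs.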

6. Climate Impact Analysis

• Weather and Climate Adaptation: ML can be used to assess how climate change is
affecting crop production by analyzing long-term NDVI trends in conjunction with climate
data. This can help farmers adapt their practices to shifting growing seasons and changing
climate conditions.

7. Yield Optimization through Data Fusion

• Multi-source Data Integration: Machine learning can integrate NDVI data with other data
sources like soil moisture, temperature, and nutrient levels to build comprehensive models
for optimizing crop growth and yield.

• Smart Irrigation Systems: ML algorithms can use NDVI to help develop intelligent
irrigation systems that optimize water usage based on real-time plant needs, soil conditions,
and weather forecasts.
1.6 Benefits for Farmers
Precise Crop Identification: Helps monitor field-level crop distributions.

Efficient Resource Allocation: Enables better planning for irrigation, fertilizers, and other
inputs.

Improved Crop Management: Offers insights into crop health trends over time.

Minimized Crop Loss: Early detection ensures timely interventions.

Reduced Costs: Targeted treatment lowers pesticide use.

Higher Yields: Enhanced crop health leads to better market returns.

This project demonstrates how AI and ML can support farmers in classifying crops, a crucial step
toward modernizing agriculture and ensuring food security. By focusing on wheat and gram crops
in Guna, it establishes a scalable framework for broader applications across other regions and crop
types.

Yield Prediction: In this project, machine learning models analyze environmental factors
such as rainfall, temperature, and satellite-derived NDVI imagery to classify wheat and gram
crop fields during the Rabi season in Guna district. By leveraging historical and real-time
data, these models can also assist in predicting potential yields.
Benefits:
• Facilitates resource allocation, including fertilizers and labor, to optimize farming
practices.
• Supports better financial planning by forecasting expected yields and market supply.
• Reduces waste by aligning harvest schedules with crop readiness and market demands.

Smart Irrigation Systems

Although not directly implemented in this project, the insights from the classified crop types and
NDVI trends can be used to optimize irrigation schedules. Future integration with AI-driven
systems could improve water use efficiency, particularly in water-stressed regions.

Pest and Weed Management

The classification of crops using ML models lays the groundwork for targeted pest and weed
monitoring. By combining this with field-level data, future extensions of the project could
include identifying pest hotspots or areas requiring weed control using AI-based tools and
sensors.

Benefits:

• Reduces reliance on broad-spectrum pesticides, focusing on specific areas in need of
treatment.
• Minimizes costs while safeguarding crop health and reducing environmental impacts.

Supply Chain Optimization

Classifying wheat and gram fields enables better planning for harvesting and storage logistics.
Combining this with predictive analytics could help farmers in the Guna district identify optimal
times for harvesting and selling based on demand trends.

Benefits:

• Reduces post-harvest losses by optimizing transport and storage.

• Supports increased profits by aligning selling strategies with market trends.
• Encourages direct farmer-to-market links, reducing dependency on intermediaries.
Chapter 2
Literature Review

2.1 Problem Definition

In the field of agricultural monitoring, significant challenges arise when applying
machine learning methods to smallholder farming regions, such as those in Guna,
India. While previous research, such as studies in Jixi, China, demonstrated
successful classification of crops using satellite imagery, these methods often
assume larger and more uniform field sizes. In contrast, the crop fields in Guna
are up to 20 times smaller, presenting a major limitation for direct application of
these techniques. The relatively low granularity of satellite images, despite similar
resolutions, exacerbates the difficulty in distinguishing between adjacent fields
and accurately classifying crops.
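
To make the field-size constraint concrete, the following sketch estimates how many pixels cover a field. It assumes a 10 m pixel, which is the resolution of the Sentinel-2 red and near-infrared bands. The field areas are hypothetical values chosen only to illustrate a 20-fold size difference; they are not measurements from Jixi or Guna.

```python
def pixels_per_field(area_hectares, pixel_size_m=10.0):
    """Approximate number of pixel footprints that fit inside a field."""
    area_m2 = area_hectares * 10_000          # 1 hectare = 10,000 square metres
    return area_m2 / (pixel_size_m ** 2)

large_field_ha = 2.0                  # hypothetical large field
small_field_ha = large_field_ha / 20  # a field 20 times smaller

print(pixels_per_field(large_field_ha))
print(pixels_per_field(small_field_ha))
```

With so few pixels per field, a large share of them straddle field boundaries and mix the signal of neighbouring crops, which is why adjacent small fields are hard to separate at the same image resolution.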

To address this issue, we implemented a customized approach combining Adaptive
Principal Component K-means (APK), autoencoders, Principal Component Analysis
(PCA), and K-means clustering. By using autoencoders to extract meaningful features
from Sentinel-2 NDVI imagery, followed by dimensionality reduction through PCA and
clustering with K-means, we adapted the methodology to better suit the complexity of
small-sized crop fields in Guna. This tailored solution ensures effective crop classification
and segmentation in a region where traditional approaches struggle due to field size
constraints, ultimately providing more reliable insights for wheat and gram crop
monitoring during the Rabi season.

2.2 Existing Work

1. Title: Crop Yield Prediction Using Machine Learning: A Systematic Literature Review
• Overview: This paper provides a comprehensive review of machine learning (ML) approaches used for
crop yield prediction. It examines major ML models, methodologies, and applications in agriculture,
highlighting the challenges, current trends, and future directions for improving prediction accuracy and
scalability.
• Key Findings: The study emphasizes the potential of ML techniques, such as regression models, neural
networks, and ensemble methods, for precise yield predictions. However, it identifies challenges like
limited availability of high-quality data, overfitting, and scalability issues in applying these models to
diverse agricultural contexts.
• Relevance: The findings underscore the importance of addressing scalability and generalization when
working with smallholder farming regions, such as Guna, India, where crop field sizes vary significantly
compared to those in larger, industrialized farming zones.

2. Title: Crop Yield Prediction Using Deep Reinforcement Learning Model for Sustainable Agrarian
Applications

• Overview: This study explores the use of deep reinforcement learning (DRL) for predicting crop yields
while optimizing farming operations and resource allocation.
• Key Findings: DRL models excel at learning optimal farming strategies by balancing short-term yield
improvements with long-term resource sustainability. The approach enables dynamic decision-making
based on changing environmental factors.

• Relevance: This paper introduces an innovative perspective for sustainable farming, aligning with the
need to optimize resources in regions like Guna, where granular, site-specific data and limited resources
pose challenges to traditional machine learning models.

3. Title: Machine Learning in Precision Agriculture: A Survey on Trends, Applications, and
Evaluations Over Two Decades

• Overview: This survey tracks the evolution of ML applications in precision agriculture, focusing on
trends, key applications, and performance evaluations.
• Key Findings: The report highlights the increasing adoption of neural networks, support vector machines
(SVMs), and decision trees for tasks like yield prediction, crop classification, and pest detection. It also
discusses advancements in integrating remote sensing data for spatial analysis.
• Relevance: By identifying gaps in applying ML to small-scale farming, this paper provides insights into
adapting techniques for smaller crop fields and integrating remote sensing data with unsupervised
clustering models, as done in this project.

4. Title: Crop Classification in High-Resolution Remote Sensing Images Based on Multi-Scale
Feature Fusion Semantic Segmentation Model

• Overview: This paper proposes a multi-scale feature fusion model for crop classification using high-
resolution remote sensing images, combining spatial and spectral data to improve accuracy.
• Key Findings: The method enhances crop discrimination by leveraging both spatial patterns and spectral
features, achieving high classification accuracy.

• Relevance: For Guna's smallholder fields, the spatial and spectral challenges addressed in this paper
resonate closely. However, unlike the multi-scale fusion model, this project used a combination of
autoencoders, PCA, and K-means clustering to adapt to smaller field sizes and lower-resolution imagery.
5. Title: Crop Yield Estimation Using Satellite Images: Comparison of Linear and Non-Linear
Models

• Overview: This study compares the effectiveness of linear and non-linear models for crop yield
prediction using satellite imagery, emphasizing spectral and spatial data.
• Key Findings: While linear models are computationally efficient, non-linear models like neural networks
capture complex relationships in satellite data, providing better predictions.
• Relevance: This work informs the choice of ML models for processing Sentinel-2 NDVI imagery in this
project. Due to the limitations of field size and resolution in Guna, a hybrid approach involving
dimensionality reduction and unsupervised clustering was used instead of purely non-linear predictive
models.

6. Title: Understanding Satellite-Imagery-Based Crop Yield Predictions

• Overview: This paper focuses on the potential of satellite imagery in agricultural forecasting, exploring
methods for integrating remote sensing data with ML for improved yield prediction.
• Key Findings: Combining spectral indices (e.g., NDVI) with advanced ML techniques significantly
enhances yield prediction accuracy. However, challenges like spatial resolution and field heterogeneity
persist.
• Relevance: This paper highlights the importance of adapting remote sensing and ML methods for
regions with small and heterogeneous crop fields, as addressed in this project through unsupervised
feature extraction and clustering.

7. Title: Corn Yield Prediction Model with Deep Neural Networks for a Smallholder Farmer Decision
Support System

• Overview: This study develops a corn yield prediction model using deep neural networks, focusing on
decision-making support for smallholder farmers.
• Key Findings: The deep learning model improves accuracy and facilitates better resource planning,
benefiting smallholder farmers with limited access to advanced tools.
• Relevance: Although this project did not employ deep neural networks, the emphasis on smallholder
farmers aligns with the goal of adapting ML techniques to small and fragmented fields in Guna using
autoencoders and clustering methods for initial classification of wheat and gram crops.

2.3 Research Gap

In our research, we identified a significant and underexplored gap in the existing literature
regarding the use of NDVI (Normalized Difference Vegetation Index) for crop clustering. Most
prior studies on crop classification and clustering using remote sensing data primarily focused on
labeled datasets and satellite or drone imagery of large, homogeneous crop fields. These studies
were typically conducted in regions with large-scale agricultural systems, where fields tend to be
vast and uniform. As a result, the models developed for crop clustering using NDVI data have
been largely tailored for expansive agricultural landscapes, such as those seen in countries with
large farm sizes or industrial agriculture sectors. In these studies, crop fields, often spanning
several hectares or more, exhibit relatively consistent crop types and growth stages, which makes
clustering and classification tasks more straightforward.

However, in countries like India, the landscape is starkly different. India, along with many other
developing nations, has a predominance of smallholder farms, where fields are often significantly
smaller—typically around 20 times smaller than those commonly studied in large-scale
agriculture research. The average size of a field in India is often just a few acres, and these fields
are frequently characterized by a high degree of fragmentation, with a mix of crop types, varied
growth stages, and fluctuating environmental conditions all within a single small plot. This
diversity within smaller fields presents a challenge for crop classification, particularly when using
NDVI data, which can be influenced by these variations. The traditional models designed for large,
uniform crop fields often fail to capture the complexities of smaller agricultural plots, where the
heterogeneity of the landscape can lead to misclassification or oversimplification of crop types.

The gap we identified in this area is crucial because the vast majority of research on crop clustering
using remote sensing does not address the specific challenges of smallholder agricultural systems.
This oversight is particularly problematic for countries like India, where the majority of farmers
cultivate small, fragmented fields and rely on mixed cropping systems. The lack of research on
how to effectively cluster crops in these contexts means that the existing crop classification models
are not directly applicable to smaller fields, leading to a disconnect between remote sensing-based
approaches and the realities of smallholder farming.

Recognizing this research gap, we proposed a novel solution that specifically addresses the
challenges associated with clustering crops in smallholder farming systems, particularly in India.
Our model adapts traditional clustering techniques to handle the complexities of smaller, more
fragmented fields, which often contain multiple crop types grown in close proximity. We proposed
an approach that accounts for the high variability found in smaller fields by using higher-
resolution NDVI data and refining the way in which crop types are clustered. This approach was
designed to be sensitive to the mixed cropping patterns prevalent in smallholder agriculture, where
different crops might be grown in the same field at the same time, and the growth stages of each
crop can vary significantly.
In developing our model, we also incorporated additional factors that are critical in the context of
Indian agriculture, such as localized weather patterns, soil variability, and irrigation practices, all
of which can influence NDVI readings and crop growth. Our model uses this integrated data to
create more accurate and reliable clustering results, allowing for a more precise classification of
crops even in small, fragmented fields. We also leveraged machine learning techniques to refine
the clustering process, enabling the model to learn from the unique patterns in smaller agricultural
fields and improve its accuracy over time.

The primary innovation of our research lies in its ability to adapt crop classification models to the
specific needs of smallholder farmers, particularly in regions like India, where farm sizes are much
smaller, and cropping systems are more diverse. This adaptation fills a significant gap in the
literature by providing a tailored solution for small-scale, mixed cropping systems that are
common in many parts of the world. Our model not only contributes to advancing remote sensing-
based crop classification techniques but also offers practical insights that can directly benefit
farmers by improving the accuracy of crop monitoring and management, leading to better resource
allocation, crop yield prediction, and overall agricultural productivity.

In summary, through our research, we have identified a crucial gap in the existing body of work
on crop clustering using NDVI data—namely, the lack of focus on smallholder farming systems
with fragmented, diverse fields. By proposing a model designed specifically for smaller
agricultural landscapes, we aim to address this gap and provide a more accurate and contextually
relevant approach to crop classification. Our work fills an important niche in the agricultural
research field, offering a solution that can be applied in regions like India, where small fields and
mixed cropping systems present unique challenges for remote sensing and crop clustering. This
model has the potential to transform how crop monitoring and classification are approached in
smaller, more diverse agricultural systems, ultimately contributing to more sustainable and
efficient farming practices.

2.4 Proposed System

We propose a system specifically designed to address the challenges faced
in the classification of wheat and gram crops in the Guna district during
the Rabi season, leveraging Sentinel-2 NDVI imagery. The proposed
solution is tailored for smallholder farming scenarios, where crop field
sizes are significantly smaller than those in regions typically studied in
existing research, such as Jixi, China.
To overcome the limitations posed by low-resolution imagery and small, fragmented field
sizes, the system integrates the following components:

1. Feature Extraction:

• We employ Autoencoders to extract meaningful features from high-dimensional
satellite images. This unsupervised method helps capture latent patterns that are
not directly visible in the raw data.
• Principal Component Analysis (PCA) is applied for dimensionality reduction,
retaining the most significant information while reducing computational
complexity.
2. Clustering Approach:

• K-means clustering is used to group fields based on their NDVI characteristics,
identifying patterns indicative of wheat and gram crop types.
• The approach is adapted to the unique spatial and spectral challenges of small
fields, so that clustering remains accurate despite the coarse spatial resolution.
3. APK Algorithm:

• To refine the classification process, the Adaptive K-means Partitioning (APK)
algorithm is used. This technique adjusts to the heterogeneous nature of field
sizes, ensuring better separation and classification accuracy in fragmented
agricultural landscapes.

Key Features:

• Scalability: The system can be extended to include other crops and integrate
additional datasets, such as weather data or farmer-reported inputs.
• Efficiency: By combining autoencoders and PCA, the system effectively handles large
datasets while minimizing computational overhead.
• Adaptability: The use of unsupervised learning methods allows the system to work
with minimal labeled data, which is particularly valuable in regions with limited
ground truth information.

Benefits:

• Enables accurate classification of wheat and gram crops despite challenges posed by
field size and resolution.
• Provides a foundation for scaling up crop classification to other regions and crop
types.
• Supports decision-making for sustainable agricultural practices in smallholder
farming regions.

This system demonstrates a forward-looking, data-driven approach to crop classification,
tailored to the specific needs and constraints of Indian agricultural contexts like the Guna
district.

STEPS:

Define a Problem:
• The project focuses on classifying wheat and gram crops during the Rabi season in
the Guna district, India. A key challenge is dealing with smallholder fields, which are
significantly smaller than fields studied in similar research, making classification
with Sentinel-2's relatively low resolution more complex.
Preparing Data:
• Data Acquisition: NDVI imagery for the region was collected using Google Earth
Engine (GEE) for specific timeframes correlating to peak crop growth stages.
• Preprocessing: The imagery was processed in QGIS to remove noise, clip the data
to the region of interest, and normalize values. The preprocessed data was then
exported for feature extraction.
Feature Extraction:

• Autoencoders: A deep learning-based autoencoder model was applied to encode
high-dimensional NDVI data into lower-dimensional features while retaining
essential patterns.
• PCA: Principal Component Analysis (PCA) was then used to further reduce the
dimensionality, ensuring compact and meaningful feature representation for
clustering.
Clustering:

• K-Means Clustering: The extracted features were clustered using the K-Means
algorithm to classify the crops.
• Model Iteration: The clustering model was iteratively refined and trained over 10
iterations to enhance its accuracy and reliability.
Testing and Validation:

• The final model was evaluated on a fresh dataset to test its generalization and
performance. Validation was conducted by comparing predictions with manually
labeled data to ensure accuracy in distinguishing wheat and gram crops.
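
The feature reduction and clustering steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the NDVI time series are synthetic, the autoencoder stage is omitted for brevity, and the choices of 12 observation dates, 3 principal components, and 2 clusters are assumptions made only for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-in for per-pixel NDVI time series (all values are made up):
# 200 "wheat-like" and 200 "gram-like" pixels observed on 12 dates.
dates = np.linspace(0.0, 1.0, 12)
wheat = 0.8 * np.exp(-((dates - 0.6) ** 2) / 0.05) + rng.normal(0, 0.03, (200, 12))
gram = 0.5 * np.exp(-((dates - 0.4) ** 2) / 0.05) + rng.normal(0, 0.03, (200, 12))
features = np.vstack([wheat, gram])

# Dimensionality reduction: keep 3 principal components (an arbitrary choice).
reduced = PCA(n_components=3, random_state=0).fit_transform(features)

# Clustering: one cluster per expected crop type.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)

print("reduced feature shape:", reduced.shape)
print("pixels per cluster:", np.bincount(labels))
```

K-Means cluster indices are arbitrary, so mapping a cluster to "wheat" or "gram" still requires a small set of labeled fields.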
Chapter 3

Requirement Analysis

3.1 Introduction to Requirement Analysis

Objectives:
• The primary objective of the Requirement Analysis for this project is to document all the
technical, functional, and non-functional specifications needed for the system to operate
effectively. This chapter outlines the core requirements for the crop field clustering
functionality, which aims to categorize crop fields based on crop types, even when the fields
are small, using low-resolution NDVI images.
• The objective also includes identifying the necessary technical infrastructure, software
dependencies, and algorithmic approaches (such as Autoencoders, PCA, and K-means) to
ensure accurate and efficient clustering of crop fields, despite challenges such as lower
resolution and small field sizes.
• The requirement analysis aims to distinguish between essential functional capabilities (like
crop field clustering and feature extraction from NDVI images) and critical non-functional
features (such as performance, scalability, and usability). These will ensure that the model
provides meaningful insights for agricultural practices in small-scale farms, particularly in
regions like Guna, India.

3.2 Scope

• The scope of this project focuses on clustering crop fields based on their types, even
with the limitations of small field sizes and low-resolution satellite imagery. The
project will involve processing NDVI images from Google Earth Engine, followed by
feature extraction using Autoencoders and PCA, with clustering performed using K-
means.
• Clustering Crop Fields: The model will be trained to cluster crop fields into different
categories (such as wheat and gram) based on their NDVI values, even when the
resolution is low and the fields are small.

3.3 System Overview

System Functionality:

• The system is designed to analyze NDVI satellite images, process the data to extract
features, and cluster crop fields by crop types using machine learning algorithms. It
will address challenges such as small field sizes, low-resolution images, and the need
for high accuracy in classification.

User Roles and Actions:

• Primary Users: The target audience for the system includes researchers,
agronomists, and potentially government agencies involved in agricultural
management, crop monitoring, and land use planning. These users will utilize the
model to gather insights about crop distribution and type identification for small-scale
fields.

User Actions:

• Crop Field Clustering: Users will upload NDVI images of agricultural areas, and the
system will process the data, apply feature extraction methods (Autoencoders, PCA),
and perform clustering using K-means to categorize fields into different crop types.
• Data Analysis: The system will provide users with insights into the spatial
distribution of crops, the effectiveness of agricultural practices in small fields, and
resource utilization.
• Performance Evaluation: Users can evaluate the clustering accuracy, especially in
small fields, through different performance metrics, including silhouette score and
clustering consistency.

3.4 Functional Requirements

• Data Input Requirements:

• Parameters:
• The model takes NDVI images of agricultural fields as input, with feature
extraction performed from these images.
• Input Ranges:
• NDVI: The expected NDVI values should range from -1 to +1, with specific
thresholds to distinguish between different crop types based on NDVI
intensity values.
• Input Validation:
• NDVI images should be in an acceptable format (e.g., GeoTIFF) and have
the necessary resolution for small field identification (10 meters per
pixel or finer).
• Machine Learning Model Specifications:

• Models:
• K-means clustering is used for categorizing crop fields based on the
extracted features from Autoencoders and PCA.
• ARI Score: Clustering performance is evaluated using the Adjusted Rand
Index (ARI) score, which measures the similarity between the predicted
clustering and the true labels, even when field sizes are small or images
are low resolution.
• Accuracy Requirements:
• The clustering model should achieve an ARI score of at least 0.7,
indicating good performance in clustering crop fields even in challenging
conditions like small field sizes and low-resolution satellite images.
• Feature Importance:
• The model will provide a visual output showing the importance of
different features (e.g., NDVI values) in clustering, which helps identify
key factors influencing the classification of crop fields.
• Clustering Process:

• Refinement Iterations:
• The model undergoes 10 iterations of clustering refinement to improve
accuracy, particularly in identifying small crop fields that might otherwise
be difficult to categorize.
• Performance Targets:

• The model will aim for an ARI score of at least 0.7, ensuring the clustering is
accurate in distinguishing different crop types even when fields are small or
resolution is low.

• Image Input Requirements:

• Accepted Formats:
• The model accepts GeoTIFF and JPEG image formats commonly used for
satellite remote sensing.
• File Size and Resolution:
• Images should not exceed 10 MB in size and should have a resolution that
allows for proper differentiation of small fields (10 meters per pixel or
finer).
• Preprocessing:
• NDVI images will be preprocessed to normalize pixel values, and relevant
spatial features will be extracted. Basic image augmentation may be
applied to improve robustness during the clustering process.
• Clustering Model Specifications:

• Architecture:
• The system uses K-means clustering for initial crop field categorization,
followed by iterative refinement to improve results. Additional clustering
algorithms like DBSCAN might also be tested for better performance on
small fields.
• Fine-Tuning Requirements:
• The model will undergo 10 iterations of clustering refinement to improve
accuracy in field detection, especially for smaller crop fields in low-
resolution data.
• Performance Targets:
• The model should target an ARI score of at least 0.7, ensuring that crop
fields are accurately identified and clustered, despite challenges like small
field sizes and poor image resolution.
• Result Processing and Display:

• Clustering Results:
• The output will consist of clustered maps showing different crop types.
These results will help users understand the distribution of specific crops,
even in regions with small or low-resolution fields.
• Actionable Insights:
The clustering results will provide actionable insights that assist users in making decisions
related to crop management and optimizing field use, such as identifying areas best suited

3.3.2 System Requirements and

Software Requirements
1. Operating System:

• Windows 7 or later, macOS, or Linux (Ubuntu 18.04+)

2. Programming Language:

• Python 3.7+ (Recommended for model development and execution)

3. Libraries and Frameworks:

• Machine Learning:
• TensorFlow, Keras (for deep learning, if used for feature extraction or pre-
trained model)
• Scikit-learn (for clustering models like K-means and evaluation metrics such
as ARI score)
• Data Analysis and Visualization:
• Pandas, NumPy (for data manipulation)
• Matplotlib, Seaborn (for visualizing the results of clustering and other
analyses)
4. Development Environment:

• IDE/Code Editor:
• Visual Studio Code or Jupyter Notebook (for developing the model and
analyzing results)
5. Data Handling Tools:
• GeoTIFF or other raster formats for handling satellite imagery
• GDAL (if necessary for processing geographic data)

Hardware Requirements

1. Processor:

• Intel Core i5 (8th gen or equivalent) or higher (for model development and
execution)
2. RAM:

• 8 GB (minimum recommended for running the model efficiently, especially with
larger datasets)
3. Storage:

• 128 GB HDD or SSD (adequate for storing datasets, model weights, and output files)
4. Graphics:

• NVIDIA GeForce RTX 4050 (or similar, which can help with faster model processing,
especially for larger datasets and GPU-accelerated deep learning models)
5. Internet Connection:

• Basic internet connection (for downloading datasets, model dependencies, and
running lightweight model evaluations)

3.4 Technical Feasibility:

The chosen technologies and tools—Google Earth Engine (GEE) for obtaining Sentinel-2 NDVI
images, QGIS for preprocessing, and an autoencoder-PCA-KMeans framework for clustering—
are technically feasible for achieving accurate classification of crop types. This approach
leverages well-established methods in remote sensing, dimensionality reduction, and clustering,
ensuring reliable results and model performance over multiple iterations.

Operational Feasibility:

The proposed system aligns with agricultural and environmental research needs by providing
detailed, data-driven insights into crop type distribution. Preprocessing in QGIS, combined with
clustering, ensures that relevant crop features are accurately captured and classified, aiding in
effective agricultural management and planning.
Economic Feasibility:

Initial costs may include data storage for NDVI imagery, computational resources for running
machine learning models, and potential cloud-based services for processing large datasets. These
costs are projected to be manageable within the typical budget of a machine learning project
focused on agricultural applications, providing a balance between cost and the value of precise
crop insights.
Chapter 4

Design and Implementation

4.1 Introduction

This chapter delves into the technical architecture and methodologies of our crop classification system
using NDVI images. The goal is to preprocess satellite imagery, apply advanced machine learning techniques
to extract meaningful patterns, and cluster different crop types for agricultural insights. Here, we outline
each component, the module design, and the implementation processes, focusing on the underlying
algorithms, data preprocessing, dimensionality reduction, and clustering. Detailed discussions cover the
machine learning workflow, data preparation, and clustering strategies used to build a robust and scalable
solution for analyzing crop distributions.

System Workflow:

1. Image Acquisition and Preprocessing:

• NDVI images are sourced from Google Earth Engine (GEE) using Sentinel-2 satellite data.
• Preprocessing is performed using QGIS to enhance and prepare the images for subsequent
machine learning stages, such as masking, normalization, and region-of-interest extraction.
2. Dimensionality Reduction using Autoencoder and PCA:

• Preprocessed images are fed into an autoencoder model for feature extraction and noise
reduction.
• The output of the autoencoder is then processed using Principal Component Analysis (PCA)
to reduce dimensionality further, retaining the most relevant features for clustering.
3. Clustering with K-Means:

• The reduced data is passed through the K-Means clustering algorithm to categorize pixel
regions based on crop types.
• This process is iterated over 10 times to ensure optimal clustering and stable results.
• The clusters are then validated using the Adjusted Rand Index (ARI) score to assess the
agreement between predicted clusters and ground truth data, providing a robust measure of
clustering performance.
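
One possible reading of the "10 iterations" step is to repeat K-Means with different random initializations and score each run against reference labels with ARI. The sketch below follows that reading; the two-dimensional features and the reference labels are synthetic placeholders, not the project's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(1)

# Synthetic reduced features for two crop classes, with hypothetical
# reference labels standing in for manually labeled fields.
class_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(150, 2))
class_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(150, 2))
features = np.vstack([class_a, class_b])
reference = np.array([0] * 150 + [1] * 150)

scores = []
for run in range(10):
    # A different seed per run exposes sensitivity to initialization.
    model = KMeans(n_clusters=2, n_init=1, random_state=run)
    predicted = model.fit_predict(features)
    scores.append(adjusted_rand_score(reference, predicted))

print(f"mean ARI = {np.mean(scores):.3f}, min ARI = {min(scores):.3f}")
```

A small spread between the minimum and mean ARI across runs is one way to check that the clustering is stable rather than an artifact of a lucky initialization.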
4.2 System Architecture

The architecture of the system is designed to handle large-scale satellite image processing and
clustering in a modular manner, with each component focused on a specific task. The system
comprises three primary modules:

1. Image Preprocessing Module:

• Utilizes QGIS for preprocessing NDVI images.

• Responsible for tasks like masking, region selection, and image enhancement to
prepare data for further processing.
2. Dimensionality Reduction Module:

• Employs an autoencoder to reduce image noise and retain important features.

• PCA is applied subsequently for dimensionality reduction, making the data suitable
for clustering while retaining key features.
3. Clustering and Validation Module:

• The K-Means algorithm is used to cluster the reduced data into distinct crop types.
• The clustering process is repeated iteratively for accuracy and stability.
• The Adjusted Rand Index (ARI) score is used to validate the clustering results,
ensuring high agreement between the predicted clusters and the reference data.

1. Crop Clustering Module

The Crop Clustering Module is a core component of our system, designed to classify and
cluster crop types based on NDVI (Normalized Difference Vegetation Index) images
obtained from satellite data. By utilizing machine learning (ML) frameworks, this module
identifies and categorizes crop types, providing valuable insights for agricultural analysis
and planning. The module leverages an autoencoder for feature extraction, Principal
Component Analysis (PCA) for dimensionality reduction, and K-Means clustering for crop
classification. This process aims to improve the understanding of crop distributions and
optimize land use planning.

The core concept of the Crop Clustering Module is to process NDVI images through a pipeline
of ML algorithms to detect and group crop areas within a specific region. This method offers
data-driven insights that aid in crop analysis and enhance agricultural productivity by
identifying crop types and their spatial distribution. By integrating advanced clustering
techniques, the system contributes to more efficient farm management and data-driven
decision-making.

This section delves into the design, objectives, data processing, dimensionality reduction,
clustering model training, validation, and integration of the Crop Clustering Module into the
overall workflow.

4.3 Objective

The objective of the Crop Clustering Module is to use machine learning techniques to
accurately group crop areas based on NDVI data from satellite imagery. The system focuses
on clustering crop types by processing image data through a sequence of autoencoder-based
feature extraction, PCA-based dimensionality reduction, and K-Means clustering. This
allows for the identification of distinct crop types within large agricultural regions, thereby
providing actionable insights for crop distribution and land management.

Traditional methods of analyzing crop distribution often require extensive manual labor and
may not comprehensively account for spatial variations and large datasets. Our approach
automates the clustering process using NDVI data, making it scalable and reliable while
reducing the need for manual intervention. By clustering crop types, the system helps
identify spatial distributions and patterns, enabling agricultural researchers and planners
to make informed decisions regarding crop yield, rotation, and resource allocation.

In addition to providing cluster-based crop insights, the module iteratively optimizes the
clustering process and validates results using the Adjusted Rand Index (ARI) score. The ARI
score offers a quantitative measure of clustering performance, ensuring that the predicted
clusters align well with reference data. This iterative approach and validation step
guarantee high-quality clustering and robust crop-type categorization.
4.4 Inputs and Data Processing

To accurately cluster crop types based on satellite imagery, the Crop Clustering Module
processes NDVI (Normalized Difference Vegetation Index) data from Sentinel-2 satellite
images. The NDVI imagery serves as a primary input, providing critical information about
vegetation health, density, and spatial distribution within the target region. The data
preprocessing and input handling steps are crucial for ensuring high-quality clustering and
analysis results.

Data Preprocessing Steps:

• NDVI Calculation: Sentinel-2 satellite images are processed to calculate NDVI values,
which indicate vegetation health by measuring the difference between near-infrared
(NIR) and red light reflectance.
• Spatial Resolution Standardization: The NDVI imagery is standardized to a
consistent spatial resolution to ensure uniformity in clustering operations.
• Noise Reduction and Filtering: The input images are preprocessed to remove noise
and irrelevant data, enhancing the accuracy of feature extraction and clustering.
• Feature Extraction with Autoencoders: Autoencoders are employed to extract
relevant features from the NDVI data, capturing complex patterns and reducing
dimensionality in a meaningful way.
• Dimensionality Reduction using PCA: Principal Component Analysis (PCA) is
applied to further reduce dimensionality while retaining the most significant features,
streamlining the clustering process.
• K-Means Clustering: The processed data is clustered using K-Means, identifying
distinct crop types and distributions. This step iterates over a defined number of
clusters to achieve optimal grouping.
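
The NDVI calculation in the first step is the normalized difference (NIR - Red) / (NIR + Red). A minimal sketch is shown below, assuming the two bands are already loaded as reflectance arrays (for Sentinel-2, band B8 is NIR and band B4 is red); the function name and the sample values are illustrative only.

```python
import numpy as np

def compute_ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red is zero."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    total = nir + red
    ndvi = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - red, total, out=ndvi, where=total != 0)
    return ndvi

# Made-up reflectance values for a 2 x 2 tile.
nir_band = np.array([[0.45, 0.30], [0.05, 0.0]])
red_band = np.array([[0.05, 0.10], [0.04, 0.0]])
print(compute_ndvi(nir_band, red_band))
```

Dense, healthy vegetation reflects strongly in NIR and absorbs red light, so its NDVI approaches +1, while bare soil and water give values near or below zero.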

By using NDVI imagery and advanced data processing techniques, the module delivers a
robust mechanism for clustering crop types. The approach leverages spatial and spectral
data to identify crop patterns, assisting in agricultural planning, resource allocation, and
crop yield analysis. This data-driven framework provides a comprehensive view of crop
distribution, optimizing the efficiency and accuracy of agricultural insights.

4.5 Data Normalization and Feature Engineering

Once the NDVI data is collected, several preprocessing and feature engineering steps are
performed to ensure it is properly structured and optimized for clustering analysis and
downstream prediction models. These steps help enhance the quality of input data and
improve model accuracy by creating standardized and relevant features for processing.

1. Data Scaling/Normalization

• Scaling NDVI Values: All NDVI values are scaled to a consistent range (typically
between 0 and 1) to ensure that clustering models, like K-Means, are not
disproportionately influenced by variations in the scale of input values. For example,
differences in NDVI values between regions can be normalized to bring uniformity to
data distribution.
• Normalization Benefits: Scaling NDVI data minimizes bias, ensuring no specific
range of values dominates during the model’s decision-making process.

2. Feature Extraction and Transformation

• Spectral Indices: In addition to NDVI, other spectral indices derived from the raw
satellite data (e.g., EVI for Enhanced Vegetation Index) may be computed to capture
more detailed vegetation characteristics.
• Dimensionality Reduction Techniques: To streamline data for clustering and
ensure computational efficiency, techniques like Principal Component Analysis (PCA)
are applied. This reduces redundant information while preserving essential patterns
relevant to the clustering task.
• Autoencoders for Feature Learning: Autoencoders can capture non-linear patterns
and compress data in a meaningful manner, extracting higher-order features from
NDVI values for improved clustering.

3. Outlier Handling

• Identification and Management: Outliers in NDVI data, which may occur due to
cloud cover, shadows, or incorrect data readings, are identified using statistical
methods such as the Interquartile Range (IQR). These outliers can significantly distort
cluster boundaries if left unhandled.
• Removal and Correction: Depending on the data distribution, outliers are either
removed or capped at threshold values to maintain a clean and accurate dataset for
clustering. In cases of missing data due to anomalies, interpolation or other statistical
techniques may be applied.
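
The outlier handling and scaling steps of this section can be sketched together. The example below caps values outside the IQR fences and then rescales to the range 0 to 1; the fence multiplier of 1.5 is the conventional default and an assumption here, as are the function names and sample values.

```python
import numpy as np

def cap_outliers_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR]; NaNs pass through unchanged."""
    values = np.asarray(values, dtype=np.float64)
    q1, q3 = np.nanpercentile(values, [25, 75])
    iqr = q3 - q1
    return np.clip(values, q1 - k * iqr, q3 + k * iqr)

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=np.float64)
    lo, hi = np.nanmin(values), np.nanmax(values)
    if hi == lo:
        return np.zeros_like(values)  # constant input: no spread to scale
    return (values - lo) / (hi - lo)

# Made-up NDVI samples; -0.95 mimics a cloud- or shadow-contaminated pixel.
ndvi = np.array([0.61, 0.58, 0.64, 0.60, 0.63, -0.95, 0.59, 0.62])
capped = cap_outliers_iqr(ndvi)
scaled = min_max_scale(capped)
print(capped)
print(scaled)
```

Capping before scaling matters: if the contaminated pixel were left in place, it would set the minimum and compress all valid NDVI values into a narrow band near 1.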

4.6 Model Training and Evaluation

The Crop Clustering Model is trained on NDVI datasets collected from historical satellite
imagery, representing various regions and crop types. This model aims to accurately
segment and classify crop clusters based on spectral data to provide insights into crop
distribution and health.

Model Selection and Justification

1. K-Means Clustering:

• Description: K-Means partitions NDVI data into distinct clusters, grouping areas
with similar vegetation health and patterns. This technique works well for
unsupervised segmentation of homogeneous regions within agricultural fields.
• Strengths: K-Means is efficient and performs well when dealing with spatial
data like NDVI, making it ideal for initial segmentation.
2. Mini-Batch K-Means:

• Description: An optimized version of K-Means that processes data in mini-batches,
making it suitable for large-scale NDVI datasets by reducing memory usage and
computational load.
• Strengths: This variant allows for scalable clustering, typically with only a small
loss in cluster quality relative to standard K-Means, which is crucial for processing
high-resolution NDVI imagery over large areas.
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise):

• Description: This model identifies clusters based on density, enabling the detection
of arbitrarily shaped clusters and differentiating between dense crop regions and
noise (e.g., barren areas).
• Strengths: DBSCAN's ability to identify noise and separate dense regions is
valuable for segmenting mixed vegetation patterns found in NDVI imagery.

4. Support Vector Machine (SVM) Classifier:

• Description: SVM is a supervised learning model for classification tasks. It does not
cluster data on its own, but it can be paired with a clustering step: cluster assignments
or available ground truth labels are used to train the SVM, which then learns
classification boundaries in the high-dimensional NDVI feature space.
• Strengths: SVM is effective when distinguishing between different crop regions with
complex or non-linear boundaries. By applying a kernel trick (such as radial basis
functions), SVM can separate complex patterns in NDVI data, enhancing the ability to
identify and distinguish crop clusters.
• Applications: SVM's ability to establish clear decision boundaries makes it a strong
candidate for classifying distinct crop types or regions based on NDVI-derived
features when ground truth data or initial labels are available. It can be combined with
clustering models to refine and validate predictions, improving the accuracy of crop
segmentation tasks.
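
The three unsupervised candidates above can be compared on the same data with scikit-learn. The sketch below uses synthetic two-dimensional features with two dense groups plus scattered points; the `eps`, `min_samples`, and `batch_size` values are illustrative assumptions that would need tuning on real NDVI features.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans, MiniBatchKMeans

rng = np.random.default_rng(2)

# Two dense groups (crop-like regions) and a few scattered points (noise).
dense_a = rng.normal([0.2, 0.2], 0.03, (200, 2))
dense_b = rng.normal([0.7, 0.7], 0.03, (200, 2))
scattered = rng.uniform(0.0, 1.0, (20, 2))
points = np.vstack([dense_a, dense_b, scattered])

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
mini_labels = MiniBatchKMeans(
    n_clusters=2, n_init=3, batch_size=64, random_state=0
).fit_predict(points)
dbscan_labels = DBSCAN(eps=0.05, min_samples=10).fit_predict(points)

# DBSCAN marks noise with the label -1; K-Means variants assign every point.
n_dbscan_clusters = len(set(dbscan_labels.tolist()) - {-1})
n_noise = int(np.sum(dbscan_labels == -1))
print("K-Means cluster sizes:", np.bincount(kmeans_labels))
print("Mini-Batch K-Means cluster sizes:", np.bincount(mini_labels))
print("DBSCAN clusters:", n_dbscan_clusters, "noise points:", n_noise)
```

The practical difference is visible in the output: both K-Means variants force the scattered points into one of the two clusters, whereas DBSCAN can set them aside as noise without being told the number of clusters.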
4.7 Evaluation Metric

• Adjusted Rand Index (ARI): ARI is used to validate the clustering accuracy by
comparing predicted cluster assignments to known labels or expected patterns,
offering insights into the quality of segmentation and clustering achieved by the
models. A higher ARI score indicates better model performance in distinguishing crop
clusters based on NDVI data.
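
To make the metric concrete, ARI can be computed from pair counts as (Index - Expected) / (Max - Expected), where Index counts the sample pairs grouped together in both labelings. The pure-Python sketch below is for illustration; in practice `sklearn.metrics.adjusted_rand_score` would normally be used, and the label lists here are made up.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, predicted_labels):
    """Adjusted Rand Index between two labelings of the same samples."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to compare

    # Pairs of samples that share a group in both labelings, in each one alone.
    index = sum(comb(c, 2) for c in Counter(zip(true_labels, predicted_labels)).values())
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(predicted_labels).values())

    expected = sum_true * sum_pred / comb(n, 2)  # value expected by chance
    maximum = 0.5 * (sum_true + sum_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (index - expected) / (maximum - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same grouping, renamed
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # maximally discordant
```

The first call shows that ARI ignores how clusters are numbered, which is why it suits K-Means output; the second shows that the score can fall below zero when agreement is worse than chance.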

Figure (Model Accuracy Diagram)

4.8 Model Training and Evaluation

Once the model is trained, its performance is evaluated using the Adjusted Rand Index
(ARI) score, a metric that is particularly useful for evaluating the clustering quality by
comparing the similarity of the predicted clusters with the true clusters. The evaluation
proceeds as follows:

1. ARI Score for First 10 Iterations:

• Definition: During the first 10 iterations, the ARI score is computed to monitor
the model's clustering performance. The ARI score measures how well the
model's predicted clusters match the true clusters. Its maximum is 1, which
indicates perfect agreement; values near 0 indicate agreement no better than
random assignment, and negative values indicate agreement worse than chance.
• First 10 Iterations Performance: After the 10th iteration, the ARI score was
0.83, indicating a high level of clustering accuracy. This score shows that the
model is effectively learning to group similar data points together.
• Importance: The ARI score improvement from iteration to iteration helps to
track the model’s progress, providing insights into how the clustering accuracy
is evolving over time.
2. Verification with Another Dataset:

• Definition: To further validate the model's generalization capability, the ARI
score is computed using a different dataset that was not part of the training
process. This step is crucial for testing whether the model has learned
generalizable patterns or is overfitting to the training data.
• Performance on New Dataset: After training, the model was applied to a new
dataset, and the ARI score was calculated again, resulting in an ARI score of
0.87. This indicates that the model performs well not only on the original
dataset but also on unseen data, demonstrating its ability to generalize.
• Importance: A higher ARI score on a new dataset suggests that the model is not
overfitting (where it would perform poorly on new data) and is instead learning
robust patterns that hold across different data sets.
3. Model Evaluation for Overfitting and Underfitting:

• Overfitting and Underfitting Explanation:

• Overfitting occurs when a model learns the training data too well,
including noise and outliers, leading to poor performance on new, unseen
data.
• Underfitting occurs when the model is too simplistic and fails to capture
the underlying patterns in the data, leading to poor performance on both
the training and new data.
• Model Behavior:
• The ARI score of 0.83 after 10 iterations and 0.87 on a new dataset
suggest that the model is neither overfitting nor underfitting.
• The improvement in the ARI score from 0.83 to 0.87 between the training
and testing phases indicates that the model has generalized well and is
performing consistently across both datasets.
• Since the ARI score is improving and showing good performance on both
the original and new dataset, the model seems to be in a healthy state,
capturing meaningful patterns without memorizing the training data or
being too simplistic.
4. ARI Score Improvement Diagram:

• Visualization: The evolution of the ARI score over the first 10 iterations can be
shown as a line plot, with the number of iterations (from 1 to 10) on the X-axis
and the ARI score on the Y-axis.
• The line representing the ARI score starts at a lower value and gradually
increases, showing the improvement as the model fine-tunes its clustering.
• Final ARI Score: After the 10th iteration, the final ARI score reaches 0.83, and
after applying the model to a new dataset, it rises to 0.87, indicating that the
model is able to generalize and improve upon its performance during training.

By monitoring the ARI score at each stage of training and testing, and comparing
performance on different datasets, we find no sign that the model is overfitting or
underfitting. The stable improvement in ARI scores suggests that the model is learning
effectively and is suitable for deployment.
4.9 Model Layers and Hyperparameters
This section describes the architecture and key components of the autoencoder model used for plant disease
prediction. Autoencoders are a type of unsupervised learning model that learns to encode the input data into a
compressed form (latent space) and then decodes it back to its original form. In this project, it is used for
anomaly detection, with the idea that diseases cause abnormalities in plant images.

1. Model Architecture:

• Input Layer:
• Shape: (128, 128, 3) — The model takes input images of shape 128x128 pixels
with 3 color channels (RGB).
• Encoding Layers (Encoder):
• Conv2D Layer:
• Number of Filters: 32
• Kernel Size: (3, 3)
• Activation: ReLU
• This layer applies convolutional filters and captures important features
of the image, while reducing spatial dimensions.
• MaxPooling2D Layer:
• Pool Size: (2, 2)
• Reduces spatial dimensions to downsample the feature map, effectively
compressing the data.
• Additional Conv2D and MaxPooling2D layers may follow to further extract
and compress features.
• Latent Space:
• A fully connected layer with a smaller dimension represents the compressed
encoding of the image.
• Decoding Layers (Decoder):
• UpSampling2D Layer:
• Size: (2, 2)
• This layer increases the spatial dimensions of the encoded features to
begin reconstruction.
• Conv2DTranspose (Deconvolutional) Layers:
• These layers reconstruct the image using transpose convolutions.
• Activation: Sigmoid
• Sigmoid is used in the decoder to ensure pixel values are between 0 and
1, which is typical for image data.
• Output Layer:
• The output is a reconstructed image of the same shape as the input (128, 128,
3), using Sigmoid activation to match the range of the input data.
2. Activation Functions:

• ReLU (Rectified Linear Unit):

• Applied in the encoding layers to introduce non-linearity and allow the model
to learn complex patterns.
• Sigmoid:
• Applied in the decoding layers to ensure the output values are between 0 and
1, suitable for image pixel values.
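The two activations named above can be written in a few lines of plain Python. This is a scalar sketch for illustration; a deep learning framework would apply the same functions element-wise to whole tensors.

```python
import math


def relu(x: float) -> float:
    """ReLU: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Sigmoid: squash any real number into the range [0, 1].

    The two branches avoid overflow in math.exp for large |x|.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 3.5):
        print(value, relu(value), round(sigmoid(value), 4))
```

Because sigmoid outputs lie between 0 and 1, the input images would need to be scaled to the same range for the reconstruction loss to be meaningful.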
Chapter 5
Results And Conclusion

5.1 Model Performance Evaluation for NDVI Crop Yield Prediction

5.1.1 Training and Fine-Tuning the NDVI-Based Crop Yield Prediction Model

The NDVI Crop Yield Prediction Model was built using autoencoders for
feature extraction and further processing with other machine learning models.
The process was divided into two phases: initial training without fine-tuning
and fine-tuning to enhance prediction accuracy.

• Phase 1: Initial Training

1. Model Configuration: The autoencoder was trained on NDVI data
to extract meaningful features representing crop health and
conditions. Initially, the encoding layers used ReLU activation for
capturing non-linear relationships in NDVI data, while the decoding
layers used sigmoid activation to reconstruct the feature map.
2. Optimizer and Learning Rate: The Adam optimizer with a
learning rate of 0.001 was used, ensuring efficient learning of
patterns from the dataset.

3. Early Stopping: Early stopping was integrated to halt training if
the validation loss did not improve, avoiding overfitting.
• Phase 2: Fine-Tuning

1. Unfreezing Layers: During fine-tuning, the last few layers were
unfrozen to allow further gradient updates, focusing on refining
crop yield-related features.
2. Learning Rate Adjustment: The learning rate was lowered to 1e-5
during fine-tuning to prevent drastic changes to already learned
features while enhancing the model's ability to predict crop yields.
3. Extended Training: An additional 10 epochs were added to allow
the model to adapt more specifically to crop yield patterns and
anomalies observed in NDVI data.
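The early-stopping rule from Phase 1 can be sketched as a small function over a sequence of validation losses. The learning rates and the 10 extra epochs below come from the text; the patience of 3 epochs and the `min_delta` threshold are assumptions, since the report does not state them.

```python
# Learning rates and extra epochs as reported in the text.
PHASES = (
    {"name": "initial", "learning_rate": 1e-3},
    {"name": "fine_tune", "learning_rate": 1e-5, "extra_epochs": 10},
)


def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch, best_loss) for a list of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs (patience is an assumed value).
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch, best_loss
    return len(val_losses) - 1, best_epoch, best_loss


if __name__ == "__main__":
    # Hypothetical validation losses, for illustration only.
    losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.63, 0.50]
    print(early_stopping(losses))
```

With the hypothetical losses shown, training halts at epoch 5 and the weights from epoch 2 would be the ones to keep, which is why early stopping is usually combined with restoring the best checkpoint.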

5.2 Evaluation Results

After training and fine-tuning, the model was evaluated using the Adjusted
Rand Index (ARI) as the primary metric for clustering accuracy and
consistency.

• Validation ARI: The initial ARI after 10 iterations was 0.83, indicating
good consistency in clustering NDVI data related to crop health.
• Cross-Dataset Evaluation: Upon evaluating with a second dataset, the
ARI improved to 0.87, showing robust generalization to different crop
regions and conditions.
Evaluation Metrics:

• ARI Scores: The ARI scores were calculated to measure the similarity of
the predicted clusters with the ground truth, ensuring the model was
neither overfitting nor underfitting.
• Confusion Matrix: A confusion matrix was plotted to visualize the
misclassification between crop health categories, offering insights into
areas where the model could be improved.
• Performance Stability: The performance remained consistent between
datasets, further confirming that the model had not overfitted to the
initial training data.
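A confusion matrix of the kind described above can be built by counting (true, predicted) label pairs. The category names in this sketch are hypothetical, and it assumes cluster IDs have already been mapped to crop health categories, a step the report does not describe.

```python
from collections import Counter


def confusion_matrix(true_labels, pred_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    counts = Counter(zip(true_labels, pred_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


if __name__ == "__main__":
    # Hypothetical categories and labels, for illustration only.
    classes = ["healthy", "stressed", "bare"]
    true = ["healthy", "healthy", "stressed", "bare", "stressed"]
    pred = ["healthy", "stressed", "stressed", "bare", "healthy"]
    for name, row in zip(classes, confusion_matrix(true, pred, classes)):
        print(f"{name:10s} {row}")
```

Off-diagonal entries show which categories are confused with each other; in the toy data, "healthy" and "stressed" are each mistaken for the other once.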

5.2.1 Training History Analysis

The model's learning progress was analyzed through accuracy and loss plots
over training epochs:

• Accuracy Plot: Both training and validation accuracy showed a steady
increase, confirming effective learning. The model did not show
significant divergence, which is a sign of stable learning and a lack of
overfitting.
• Loss Plot: Both training and validation loss decreased consistently,
supporting the model’s ability to generalize well without overfitting.

5.2.2 Crop Yield Prediction Results

The final model was evaluated on the test set, which involved comparing
predicted crop yields with actual recorded yields from the selected region.

• Model Performance:
• ARI Score for Final Prediction: The final ARI score reached 0.87
after fine-tuning, showcasing the model's excellent ability to
identify the crop areas accurately from NDVI images.
• Prediction Accuracy: Crop yield predictions closely aligned with
historical yield data, with an accuracy of around 90%. This
indicates a high-quality model capable of providing actionable
insights for crop yield forecasting.

The Adjusted Rand Index (ARI) measures the similarity between two
clusterings of the same data while correcting for chance. It compares how
closely the predicted clustering (from the model) agrees with the true or
ground-truth clustering (the real-world categorization).

Key Points:

1. Rand Index: The basic Rand Index measures the fraction of pairs of data
points on which two clusterings agree, that is, pairs placed in the same
cluster by both or in different clusters by both. It produces values between 0
and 1, where:

• 1 means perfect agreement (clusters match exactly).

• 0 means the two clusterings disagree on every pair. Random clusterings
usually score well above 0, which is why a chance correction is needed.
2. Adjusted Rand Index:

• The ARI is a corrected version of the Rand Index that adjusts for the
chance grouping of elements. It accounts for the fact that some
agreements could have happened randomly, especially when there
are many clusters or when the data is very large.
• ARI is at most 1 and can be negative:
• 1 means perfect match between the predicted and true
clusters.
• 0 means the clustering result is no better than random
chance.
• Negative values indicate that the clustering is worse than
random chance.
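The unadjusted Rand Index can be computed directly by pair counting, as in the sketch below. It is written for clarity rather than speed, since it visits every pair of points, and it is not taken from the project code.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two clusterings agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # fewer than two points: nothing to disagree on
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same grouping, renamed labels
    print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # grouping that cuts across
```

The first call shows that the index ignores how clusters are named; the second shows that even a clustering that cuts straight across the reference still scores above 0, which motivates the adjusted version.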

Formula:

The ARI can be calculated using the following formula:

\[ \mathrm{ARI} = \frac{RI - E[RI]}{\max(RI) - E[RI]} \]

Where:

• RI: The Rand Index, which measures the proportion of pairwise
agreements.
• E[RI]: The expected Rand Index under random clustering.
• max(RI): The maximum value that the Rand Index can take.
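For reference, the same quantity is commonly written in terms of a contingency table, which is the form most implementations compute. The symbols below are not used elsewhere in this report and are introduced only for this illustration: n_{ij} is the number of points in true cluster i and predicted cluster j, a_i and b_j are the row and column sums, and n is the total number of points.

```latex
\[
\mathrm{ARI} =
\frac{\displaystyle \sum_{i,j} \binom{n_{ij}}{2}
      - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}}
     {\displaystyle \frac{1}{2} \left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right]
      - \left[ \sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2} \right] \Big/ \binom{n}{2}}
\]
```

Here the first sum plays the role of the index, the bracketed product divided by \binom{n}{2} is its expected value under random labelling, and the averaged term is its maximum.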

Explanation:

• E[RI] is the expected Rand Index for random clustering, so the ARI
normalizes the Rand Index by adjusting for this randomness. This
correction allows the ARI to reflect how much better your clustering is
compared to random groupings.

• Interpretation:

• A higher ARI indicates that the predicted clustering is more similar
to the true clustering.
• A negative ARI suggests that the clustering is worse than random
clustering.
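The ARI itself can be computed from contingency counts, as in the following sketch. It is a standalone illustration using only the standard library, not the project's evaluation code, and the label lists are made up for the example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    """ARI = (index - expected) / (max_index - expected), via pair counts."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points
    cells = Counter(zip(true_labels, pred_labels))  # contingency table cells
    rows = Counter(true_labels)                     # true cluster sizes
    cols = Counter(pred_labels)                     # predicted cluster sizes
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both clusterings have one cluster
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))              # identical grouping
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))              # worse than chance
    print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # partial agreement
```

The three calls cover the cases discussed above: a perfect match regardless of label names, a negative score for a clustering that cuts across the reference, and an intermediate score for partial agreement.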
Example:

• ARI = 1: Your predicted clusters perfectly match the true clusters.

• ARI = 0.85: Your clustering is quite similar to the ground truth, but not
perfect. There may be some misclassifications.
• ARI = -0.1: Your clustering is worse than random; this could happen if
the model is poorly predicting the clusters.

In summary, ARI is a useful metric for evaluating the quality of
cluster assignments, particularly in unsupervised learning when the
model's clustering is to be compared with a known true classification.

Key Achievements
The NDVI-based Crop Yield Prediction Platform has achieved significant milestones by
providing data-driven insights to optimize crop management and health monitoring. The
platform’s accomplishments include:

• Data-Driven Crop Yield Estimation: By leveraging NDVI (Normalized Difference
Vegetation Index) data, the platform provides accurate crop yield predictions tailored to
environmental conditions. These insights enable farmers to make informed decisions,
optimizing crop productivity and sustainability based on satellite imagery and ground
data.

• Effective Crop Monitoring and Disease Detection: Using NDVI data and machine
learning models, the platform allows for early detection of potential issues such as water
stress, pest infestations, or disease outbreaks, enabling timely intervention to protect
crops and maximize yield.

• User-Centric Interface: The platform is designed to be intuitive and accessible for a
wide audience, from small-scale farmers to large agricultural enterprises. It allows users
to visualize NDVI maps, monitor crop health, and receive actionable insights, ensuring a
seamless user experience across different levels of expertise.

5.2.3 Limitations

While the NDVI-based Crop Yield Prediction Platform offers substantial value,
there are areas that could be improved to increase its impact and accuracy:

• Limited NDVI Data Coverage: The current system primarily uses NDVI
data from a few selected satellite sources. Expanding the geographic coverage
and incorporating data from other satellite systems (e.g., Landsat, MODIS)
would enhance the model's applicability to a broader range of regions and
farming contexts.

• Static Data Inputs: The platform’s crop yield predictions and disease
detection capabilities rely heavily on historical NDVI data. Integrating
real-time satellite data or weather APIs would allow the system to adapt to
changing conditions, providing more accurate and dynamic crop
recommendations and health monitoring in response to unforeseen weather
events.

5.2.4 Future Work

There are several avenues for enhancing the NDVI-based Crop Yield Prediction
Platform to increase its effectiveness, accuracy, and usability:
• Expanding NDVI Data Sources: Incorporating a wider range of satellite
data, including real-time imagery, would allow for more precise crop
monitoring across various geographic locations and environmental conditions.
This would help farmers in different regions benefit from tailored insights for
crop yield prediction and health management.

• Real-Time Environmental Data Integration: Integrating real-time data
from IoT sensors (e.g., soil moisture, temperature) and weather stations would
allow the platform to dynamically adjust its crop yield predictions and disease
detection, improving accuracy and providing actionable insights in near
real time.

• Mobile Application Development: Creating a mobile app for the platform
would enhance accessibility, especially for farmers in remote or rural areas
with limited access to computers. The mobile app could offer users on-the-go
access to crop health data, yield predictions, and disease alerts, and it could
support offline functionality for collecting ground data even without internet
connectivity.

• Multi-Language Support: To broaden the platform’s reach, integrating
multi-language support would make it more accessible to farmers across
different linguistic regions. Providing translations for key functionalities
would ensure that farmers in diverse agricultural areas can fully benefit from
the platform’s insights, regardless of their native language.

These future enhancements would significantly improve the platform’s usability,
scalability, and accuracy, making it a powerful tool for farmers and agricultural
practitioners worldwide.

Appendices
Appendix A: Project Code
A.1: Python Code for Machine Learning Model

Appendix B: Testing and Evaluation Results

ARI Scores

Iteration   ARI
1           0.57
2           0.62
3           0.67
4           0.74
5           0.79
6           0.88
7           0.93
8           0.96
9           0.98
10          0.98

Final ARI score: 0.90

Silhouette score: 0.85

Personal Details

Name : Suryansh Pratap Singh

Enrollment Number: 221B403

Branch: Computer Science and Engineering(CSE)

Email Id: [email protected]

Mob No.: 8004881036

Name : Rahul Narendra Sharma

Enrollment Number: 221B291

Branch: Computer Science and Engineering(CSE)

Email Id: [email protected]

Mob No.: 8329363062

Name : Sajal Korde

Enrollment Number: 221B319

Branch: Computer Science and Engineering(CSE)

Email Id: [email protected]

Mob No.: 7471116905
