
Viraj_Project_Documentation

The project report focuses on analyzing agriculture crop production in India, utilizing machine learning and data visualization techniques to uncover trends and predict future productivity. It includes implementation details, experimental setups, and results, highlighting the importance of understanding crop dynamics for food security. The project aims to provide insights that can inform agricultural policy and enhance productivity sustainability.

Uploaded by

shaikhaaqif

A

Project Report on
India Agriculture Crop
Production Analysis
Submitted to
UNIVERSITY OF MUMBAI
In partial fulfillment of the degree of
Master of Computer Science
Project By:
Mr. Viraj Vasudev Pawasakar
Exam Seat No:
1183893
Under the Guidance of
Mrs. Rupali Agavekar
Navkokan Education Society’s
D.B.J College, Chiplun
(2023-2024)
Navkokan Education Society’s

D.B.J. COLLEGE, CHIPLUN


NAAC Reaccredited Grade ‘A’ (CGPA 3.15)
DEPARTMENT OF COMPUTER SCIENCE

CERTIFICATE
This is to certify that Mr. Viraj Vasudev Pawasakar of
MSc. Part-II (Semester IV) Computer Science has
successfully completed the Project in Machine Learning
and has submitted the same to my satisfaction during the
academic year 2023-24 towards partial fulfillment of
MSc. Part-II (Semester IV) Computer Science,
University of Mumbai.

Date:
Guide Signature:

INCHARGE
Department of Computer Science
Acknowledgement

It is my great pleasure to take this opportunity to sincerely
thank all those who showed me the way to a successful project
and helped me throughout its completion.
I am deeply grateful to my Project Guide, Mrs. Rupali
Agavekar, without whom the completion of this project
would not have been possible.
My sincere thanks to the respected Head of the Computer
Science Department, Mr. S. J. Nalawade, for providing all
the facilities, including the availability of the Computer Lab.
I take this opportunity to express my deep gratitude
towards all the members of the Computer Science
Department for helping me complete the project.
My special thanks to my parents, my friends and all
those who encouraged me and helped me to
complete this project successfully and on time.

Mr. Viraj Vasudev Pawasakar


M.Sc. Part-II (Computer Science)
Table of Contents
Sr. No.   Title
1.        Topic
2.        Implementation details
3.        Experimental setups and results
4.        Analysis of the results
5.        Conclusion
6.        Future enhancement

India Agriculture Crop
Production Analysis
Mr. Viraj Vasudev Pawasakar

A dissertation submitted in partial fulfillment of the requirements
of D.B.J. College (Chiplun) for the degree of M.Sc. in Computer Science
(Machine Learning). July 2024

2. Implementation Details

This project analyzes agriculture crop production in India.
The aim of the analysis is to provide insights into crop
production trends, identify high-performing crops and districts,
and use data visualization and machine learning techniques to
understand and predict agricultural productivity.

Project Overview
India is one of the largest agricultural producers in the world,
and understanding the dynamics of crop production is crucial
for ensuring food security and optimizing resource allocation.
This project leverages historical data on crop production to
derive meaningful insights.

Aim of the Project
The primary aim of this project is to analyze and visualize
crop production data in India to uncover patterns and trends
that can inform agricultural policy and decision-making. By
identifying the factors that contribute to high crop yields,
stakeholders can develop strategies to enhance productivity
and sustainability in Indian agriculture.

Libraries and Frameworks Used

• Streamlit:
Streamlit is a framework for creating web applications with
Python. It is used for building interactive and customizable
web-based interfaces for data analysis, machine learning,
and more.

• Pandas:
Pandas is a powerful data manipulation and analysis library.
It provides data structures such as DataFrames and Series, which
are essential for handling structured data.

• NumPy:
NumPy is a fundamental package for numerical computing in
Python. It provides support for large, multi-dimensional
arrays and matrices, along with a collection of mathematical
functions to operate on these arrays.

• Matplotlib:
Matplotlib is a comprehensive library for creating static,
animated, and interactive visualizations in Python. pyplot is a
module in Matplotlib that provides a MATLAB-like interface
for plotting.

• Seaborn:
Seaborn is built on top of Matplotlib and provides a
higher-level interface for drawing attractive and informative
statistical graphics. It simplifies the process of creating
complex visualizations such as heatmaps, violin plots, and
more.

• scikit-learn:
Scikit-learn is a versatile machine learning library for Python.
It includes various tools for supervised and unsupervised
learning, such as regression, classification, clustering, and
dimensionality reduction. LinearRegression is a model class
for fitting linear regression models, and train_test_split is a
function for splitting data into training and testing sets.
mean_squared_error is a function that calculates the mean
squared error between predicted values and actual values,
and is commonly used to evaluate regression models.

Implementation Steps

1. Setting up the Environment
Ensure that Python and the necessary libraries are installed. A
virtual environment can be created for the project and the required
libraries installed into it.

2. Loading the Dataset
Load the Indian Agriculture Crop Production data into a Pandas
DataFrame.

3. Data Overview and Preprocessing
Get an overview of the dataset and preprocess it as necessary.

4. Exploratory Data Analysis (EDA)
Analyze the data using data visualizations to understand trends
and patterns.

5. Trend Analysis
Analyzes the trends in crop production to identify patterns and
seasonal variations.

6. Future Data Prediction using Linear Regression Model
Predicts future crop production based on historical data.

7. Correlation Analysis
Examines the relationships between different variables to
understand their interdependencies.

8. Seasonal Analysis
Analyzes the seasonal patterns in crop production to understand
the impact of seasons.

9. Linear Regression
Applied to predict future crop production based on historical
data.

10. Train-Test Split
Used to validate the performance of the predictive models.

11. Yield Prediction Model (Mean Squared Error)
Evaluates the accuracy of the yield prediction model using the Mean
Squared Error metric.
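
As an illustration of the Seasonal Analysis step, the following is a minimal sketch of how production could be totalled per season with Pandas. The rows are made-up sample values, not figures from the project dataset; only the column names Season, Crop and Production follow the dataset description.

```python
import pandas as pd

# Hypothetical sample rows; values are invented for illustration only.
sample = pd.DataFrame({
    "Season": ["Kharif", "Rabi", "Kharif", "Rabi"],
    "Crop": ["Rice", "Wheat", "Maize", "Wheat"],
    "Production": [100.0, 80.0, 40.0, 50.0],
})

# Total production per season, largest first.
seasonal_totals = (
    sample.groupby("Season")["Production"]
    .sum()
    .sort_values(ascending=False)
)
print(seasonal_totals)
```

The same grouping can be extended with the Year column to compare seasons across years.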

3. Experimental Setup and Results
Microsoft Visual Studio Code:

Visual Studio Code is a source-code editor that can be used with a
variety of programming languages, including Java, JavaScript, Go,
Node.js, Python and C++. It is based on the Electron framework, which
is used to develop Node.js web applications that run on the Blink layout
engine. Visual Studio Code employs the same editor component
(codenamed "Monaco") used in Azure DevOps (formerly called Visual
Studio Online and Visual Studio Team Services).

Instead of a project system, it allows users to open one or more
directories, which can then be saved in workspaces for future reuse.
This allows it to operate as a language-agnostic code editor for any
language. It supports a number of programming languages and a set of
features that differs per language. Unwanted files and folders can be
excluded from the project tree via the settings. Many Visual Studio
Code features are not exposed through menus or the user interface but
can be accessed via the command palette.

Visual Studio Code can be extended via extensions available through a
central repository. This includes additions to the editor and language
support. A notable feature is the ability to create extensions that add
support for new languages, themes, and debuggers, perform static code
analysis, and add code linters using the Language Server Protocol.
CSV

A CSV (Comma-Separated Values) file is a plain text file that stores
tabular data in a simple format, making it easy to import and export data
between different applications. Each line in a CSV file corresponds to a
row in the table, with fields separated by commas. The first line
typically contains headers that describe the fields. CSV files are highly
portable and widely supported, allowing data exchange
across various platforms and software. Their simplicity also makes them
easy to create, read, and edit with any text editor, ensuring accessibility
and flexibility for data handling in projects.

CSV files are especially useful in data analysis and machine learning
projects where large datasets need to be processed efficiently. Their
straightforward structure allows for quick parsing and integration with
numerous data processing libraries in programming languages like
Python, R, and Java. For instance, in Python, libraries such as pandas
provide robust tools for reading, writing, and manipulating CSV data,
facilitating tasks like data cleaning, transformation, and visualization.
Furthermore, the simplicity of CSV files ensures minimal overhead and
compatibility issues, making them an ideal choice for both small-scale
data operations and large-scale data workflows in various domains.
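
To make the CSV layout concrete, here is a small sketch that parses a header line and two rows with pandas. The CSV text is an in-memory, made-up sample (an assumption for illustration), so the snippet runs without the project's data file.

```python
import io

import pandas as pd

# Made-up CSV text: the first line is the header, each later line is one row.
csv_text = """State,Crop,Year,Area,Production
Maharashtra,Rice,2018-19,10.0,25.0
Punjab,Wheat,2018-19,20.0,90.0
"""

# read_csv accepts any file-like object, so StringIO stands in for a file on disk.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

# Writing back to CSV text and re-reading gives the same table.
round_trip = pd.read_csv(io.StringIO(df.to_csv(index=False)))
```

Text columns such as State are read as object, while Area and Production are inferred as float64.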

Methodology

The methodology of this project involves several steps to
analyze Indian agriculture crop production and derive
meaningful insights, as listed below:

Data Collection:
Gather data from reliable sources, including parameters like
crop type, year, area under cultivation, production, yield,
and weather conditions.
Obtain data in CSV format for easy storage and analysis.

Data Preprocessing:
• Data Cleaning: Address missing values, remove
duplicates, and correct inconsistencies.
• Data Transformation: Ensure correct data types and
create derived features as needed.
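
The cleaning and transformation steps above can be sketched as follows. This is an illustrative example on invented rows, not the project's actual preprocessing code; it removes duplicates, drops rows with a missing Production value, and converts a text column to a numeric type.

```python
import numpy as np
import pandas as pd

# Invented rows: one exact duplicate, one missing value, numbers stored as text.
raw = pd.DataFrame({
    "State": ["Punjab", "Punjab", "Kerala", "Kerala"],
    "Crop": ["Wheat", "Wheat", "Rice", "Rice"],
    "Area": ["20", "20", "10", "5"],
    "Production": [90.0, 90.0, np.nan, 12.0],
})

cleaned = (
    raw.drop_duplicates()                      # remove repeated rows
    .dropna(subset=["Production"])             # drop rows with no production value
    .assign(Area=lambda d: pd.to_numeric(d["Area"]))  # text -> numeric
    .reset_index(drop=True)
)
print(cleaned)
```

Whether missing values are dropped or filled is a design choice; dropping is shown here only because it is the simplest option.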

Exploratory Data Analysis (EDA):
• Conduct EDA to understand data distribution, identify
trends, and detect outliers.
• Use visualizations (histograms, box plots, scatter
plots, heatmaps) to explore variable relationships.

Correlation Analysis:
• Calculate correlation coefficients to evaluate relationships
between variables like rainfall, temperature, and crop
yield.
• Identify key factors significantly correlated with
crop production.
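
A correlation matrix of this kind can be computed with pandas, as in the minimal sketch below. The numbers are invented, and the columns used (Area, Production, Yield) are taken from the dataset fields rather than the rainfall or temperature variables mentioned above, which is an assumption made so the example is self-contained.

```python
import pandas as pd

# Invented numeric sample; column names follow the dataset fields.
sample = pd.DataFrame({
    "Area": [10.0, 20.0, 30.0, 40.0],
    "Production": [22.0, 41.0, 65.0, 79.0],
    "Yield": [2.2, 2.05, 2.17, 1.98],
})

# Pairwise Pearson correlation coefficients between the numeric columns.
corr = sample.corr(method="pearson")
print(corr.round(2))
```

Values close to 1 or -1 indicate a strong linear relationship; the matrix can be passed directly to a Seaborn heatmap for display.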

Predictive Modeling:
• Model Selection: Choose machine learning models (e.g.,
Linear Regression) for future crop production prediction.
• Model Training: Split data into training and testing sets,
then train the models.
• Model Evaluation: Use metrics such as Mean Squared
Error (MSE) to assess model accuracy.
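
The three modeling steps above can be illustrated with scikit-learn as follows. The data is synthetic (a noisy linear relation between a hypothetical area feature and production), so the numbers say nothing about the project's real results; the sketch only shows how train_test_split, LinearRegression and mean_squared_error fit together.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: production is roughly 2.5 * area plus random noise.
rng = np.random.default_rng(0)
area = rng.uniform(1.0, 100.0, size=200).reshape(-1, 1)
production = 2.5 * area.ravel() + rng.normal(0.0, 5.0, size=200)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    area, production, test_size=0.2, random_state=42
)

# Fit on the training rows, then score on the held-out rows.
model = LinearRegression().fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"slope={model.coef_[0]:.2f}, test MSE={mse:.2f}")
```

Evaluating on held-out rows rather than the training rows gives a less optimistic estimate of the prediction error.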

Database Description

• Year: The year in which the data was recorded (e.g., 2018-19,
2019-20).
• Crop: The type of crop being analyzed (e.g., rice, wheat, maize).
• Area: The area under cultivation, typically measured in hectares.
• Production: The total production of the crop, usually measured
in tonnes.
• Yield: The yield of the crop, calculated as production per unit
area (e.g., tonnes per hectare).
• Geographical Location: Details about the location of cultivation,
given by state and district.

Field               Data Type
State               object
District            object
Crop                object
Year                object
Season              object
Area                float64
Area Units          object
Production          float64
Production Units    object
Yield               float64
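
Since Yield is described as production per unit area, the stored Yield column can be cross-checked against Production / Area. The sketch below uses invented values, with the second row deliberately inconsistent, and a hypothetical helper column named Yield_check.

```python
import pandas as pd

# Invented values; the Wheat row has a reported yield that does not match.
records = pd.DataFrame({
    "Crop": ["Rice", "Wheat"],
    "Area": [1000.0, 500.0],          # hectares (illustrative)
    "Production": [2500.0, 1600.0],   # tonnes (illustrative)
    "Yield": [2.5, 3.0],              # reported tonnes per hectare
})

# Recompute yield from its definition and flag rows that disagree.
records["Yield_check"] = records["Production"] / records["Area"]
mismatch = records[(records["Yield"] - records["Yield_check"]).abs() > 1e-6]
print(mismatch[["Crop", "Yield", "Yield_check"]])
```

A check like this only makes sense when the Area Units and Production Units columns are consistent across rows.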

4. Analysis of the results
Code:
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


def scroll_to_top():
    # Inject a small script intended to scroll the page back to the top.
    scroll_to_top_js = """
    <script>
    window.scrollTo(0, 0);
    </script>
    """
    st.markdown(scroll_to_top_js, unsafe_allow_html=True)


def main():
    scroll_to_top()

    @st.cache_data
    def load_data():
        # Read the crop production dataset from the working directory.
        data = pd.read_csv('India Agriculture Crop Production.csv')
        return data

    data = load_data()

    # Custom CSS to make the sidebar collapsible
    st.markdown(
        """
        <style>
        .css-1d391kg {
            transition: margin-left 0.3s;
        }
        .css-1d391kg[data-expanded="false"] {
            margin-left: -20rem;
        }
        .css-1d391kg[data-expanded="true"] {
            margin-left: 0;
        }
        </style>
        """,
        unsafe_allow_html=True,
    )

    # Sidebar content: each button selects the page to display
    st.sidebar.title("Navigation")

    if st.sidebar.button("Introduction"):
        st.session_state.page = "Introduction"

    if st.sidebar.button("Analysis of Data"):
        st.session_state.page = "Analysis of Data"

    if st.sidebar.button("Data Cleaning"):
        st.session_state.page = "Data Cleaning"

    if st.sidebar.button("Visual Analysis"):
        st.session_state.page = "Visual Analysis"

    if st.sidebar.button("Trend Analysis"):
        st.session_state.page = "Trend Analysis"

    if st.sidebar.button("Correlation Analysis"):
        st.session_state.page = "Correlation Analysis"

    if st.sidebar.button("Seasonal Analysis"):
        st.session_state.page = "Seasonal Analysis"

    if st.sidebar.button("Yield Prediction Model"):
        st.session_state.page = "Yield Prediction Model"

    # Initialize session state variables if they don't exist.
    # Each flag records whether the corresponding chart is currently shown.
    toggle_keys = [
        'show_crop_production_years',
        'show_crop_production_state',
        'show_area_cultivation_state',
        'show_share_area_cultivation_year',
        'show_production_state_year',
        'show_production_crop_year',
        'show_selected_state_crop_production',
        'show_selected_crop_production_top_states',
        'show_total_production_rice_wheat',
        'show_heat_map_average_yield_by_state_year',
        'show_total_production',
        'show_future_data_prediction',
        'show_seasonal_analysis',
        'show_yield_prediction_model',
    ]
    for key in toggle_keys:
        if key not in st.session_state:
            st.session_state[key] = False

    # Default to the Introduction page on first load.
    if 'page' not in st.session_state:
        st.session_state.page = "Introduction"

    if st.session_state.page == "Introduction":
        st.title("India Agriculture Crop Production Analysis")
        st.write("""
        ## Welcome to the Introduction Tab
        This project is focused on analyzing the agriculture crop production in India. The aim of this analysis is to
        provide insights into crop production trends, identify high-performing crops and districts, and utilize various
        data visualization and machine learning techniques to understand and predict agricultural productivity.

        ### Project Overview
        India is one of the largest agricultural producers in the world, and understanding the dynamics of crop
        production is crucial for ensuring food security and optimizing resource allocation. This project leverages
        historical data on crop production to derive meaningful insights.

        ### Types of Analysis Conducted
        - **Data Cleaning**: Prepares the data for analysis by handling missing values, outliers, and inconsistencies.
        - **Crop-wise Analysis**: Identifies the top crops in terms of production.
        - **District-wise Analysis**: Identifies the top districts in terms of crop production.
        - **Year-wise Analysis**: Analyzes the data with the user-selected year as the main factor.

        ### Data Visualizations Used
        - **Bar Charts**: Used to display the average production of top crops and districts.
        - **Line Charts**: Used to show trends in crop production over time (if applicable).
        - **Scatter Plots**: Used to examine relationships between different variables (if applicable).
        - **Heat Map**: Used to show a graphical representation of data where values are depicted by color.

        ### Machine Learning Algorithms Used
        - **Trend Analysis**: Analyzes the trends in crop production to identify patterns and seasonal variations.
        - **Future Data Prediction using Linear Regression Model**: Predicts future crop production based on historical data.
        - **Correlation Analysis**: Examines the relationships between different variables to understand their interdependencies.
        - **Seasonal Analysis**: Analyzes the seasonal patterns in crop production to understand the impact of seasons.
        - **Linear Regression**: Applied to predict future crop production based on historical data.
        - **Train-Test Split**: Used to validate the performance of the predictive models.
        - **Yield Prediction Model (Mean Squared Error)**: Evaluates the accuracy of the yield prediction model using the Mean Squared Error metric.

        ### Aim of the Project
        The primary aim of this project is to analyze and visualize crop production data in India to uncover patterns
        and trends that can inform agricultural policy and decision-making. By identifying the factors that contribute
        to high crop yields, stakeholders can develop strategies to enhance productivity and sustainability in Indian
        agriculture.

        ### Conclusion
        This project provides a comprehensive analysis of agricultural crop production in India, offering valuable
        insights through data visualization and machine learning techniques. We hope that this analysis will contribute
        to a better understanding of India's agricultural landscape and support efforts to improve crop production
        efficiency and food security.
        """)

    elif st.session_state.page == "Analysis of Data":
        st.title("Analysis of Data")
        st.write("""
        ## Welcome to the Analysis of Data Tab
        In this section, we get a basic understanding of the data used, the columns present
        in the data, their data types, etc.
        """)

        # First Few Rows of the Dataset
        st.write("### First Few Rows of the Dataset")
        st.write("""
        **First Few Rows of the Dataset**: This displays the first few rows of the dataset to give an overview of the data
        structure and contents.
        """)
        st.write(data.head())

        # Summary statistics
        st.write("### Summary Statistics")
        st.write("""
        **Summary Statistics**: Provides basic descriptive statistics such as mean, standard deviation, min, max, and
        quartiles for each numeric column. This helps in understanding the distribution and spread of the data.
        """)
        st.write(data.describe())

        # Data type information
        st.write("### Data Types")
        st.write("""
        **Data Types**: Shows the data types of each column, which is important to ensure that the data types are
        appropriate for analysis (e.g., numeric columns should be of a numeric type).
        """)
        st.write(data.dtypes)

    elif st.session_state.page == "Data Cleaning":
        st.title("Data Cleaning")
        st.write("""
        ## Welcome to the Data Cleaning Tab
        In this section, we perform data cleaning to prepare the dataset for analysis.
        This involves examining the first few rows of the dataset, summarizing statistics,
        checking data types, and identifying any missing values.

        Data cleaning is essential to ensure that our analyses and machine learning models
        are accurate and reliable.
        """)

        # Check for missing values
        st.write("### Missing Values")
        st.write("""
        **Missing Values**: Lists the number of missing values in each column. Identifying missing values is crucial
        as they need to be handled before further analysis.
        """)
        missing_values = data.isnull().sum()
        st.write(missing_values)

        # Drop missing values
        st.write("### Data after Dropping Missing Values")
        st.write("""
        **Data after Dropping Missing Values**: Displays the dataset after removing rows with missing values. This step
        ensures that subsequent analyses are not affected by incomplete data.
        """)
        # Work on a copy so the column assignments below do not touch the original frame.
        data_cleaned = data.dropna().copy()
        st.write(data_cleaned.head())

        # Ensure all columns have compatible data types:
        # try a numeric conversion first and fall back to plain strings.
        for col in data_cleaned.select_dtypes(include=['object']).columns:
            try:
                data_cleaned[col] = pd.to_numeric(data_cleaned[col])
            except ValueError:
                data_cleaned[col] = data_cleaned[col].astype(str)

        st.write("### Data Types after Conversion")
        st.write("""
        **Data Types after Conversion**: Displays the data types after converting object columns to numeric or string
        types, ensuring compatibility with Arrow.
        """)
        st.write(data_cleaned.dtypes)

        # Summary by State and Crop
        st.write("### Summary Statistics by State and Crop")
        st.write("""
        **Summary Statistics by State and Crop**: Provides descriptive statistics for 'Area', 'Production', and 'Yield'
        grouped by 'State' and 'Crop'. This allows for a detailed analysis of these metrics across different states
        and crops.
        """)
        summary_by_state_crop = data_cleaned.groupby(['State', 'Crop'])[['Area', 'Production', 'Yield']].describe()
        st.write(summary_by_state_crop)

    elif st.session_state.page == "Visual Analysis":
        st.title("Learning Data Analysis Through Visualization")
        st.write("Welcome to the Learning Data Analysis Through Visualization tab.")

        # Crop Production Over the Years
        st.write("### Crop Production Over the Years")
        if st.button("Show Crop Production Over the Years"):
            # Toggle the chart on or off.
            st.session_state.show_crop_production_years = not st.session_state.show_crop_production_years

        if st.session_state.show_crop_production_years:
            @st.cache_resource
            def plot_crop_production_years():
                fig = plt.figure(figsize=(12, 6))
                sns.lineplot(data=data, x='Year', y='Production')
                plt.title('Crop Production Over the Years')
                plt.xlabel('Year')
                plt.ylabel('Production')
                plt.xticks(rotation=90)
                st.pyplot(fig)
            plot_crop_production_years()

        # Crop Production by State
        st.write("### Crop Production by State")
        if st.button("Show Crop Production by State"):
            st.session_state.show_crop_production_state = not st.session_state.show_crop_production_state

        if st.session_state.show_crop_production_state:
            @st.cache_resource
            def plot_crop_production_state():
                fig = plt.figure(figsize=(12, 8))
                # estimator=sum plots the total production per state.
                sns.barplot(data=data, x='State', y='Production', estimator=sum)
                plt.title('Crop Production by State')
                plt.xlabel('State')
                plt.ylabel('Total Production')
                plt.xticks(rotation=90)
                st.pyplot(fig)
            plot_crop_production_state()

        # Area under Cultivation by State
        st.write("### Area under Cultivation by State")
        if st.button("Show Area under Cultivation by State"):
            st.session_state.show_area_cultivation_state = not st.session_state.show_area_cultivation_state

        if st.session_state.show_area_cultivation_state:
            year = st.selectbox("Select Year", data['Year'].unique())

            @st.cache_resource
            def plot_area_cultivation_state(year):
                # Exclude coconut, then total the area per state for the chosen year.
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_state = crop_df_Year.groupby('State')
                area_by_state = grouped_state['Area'].sum().sort_values(ascending=False)

                fig = plt.figure(figsize=(12, 4))
                # NOTE: dividing by 1e7 scales to tens of millions; 1e6 would give millions as labelled.
                plt.bar(area_by_state.index, area_by_state / 1e7)
                plt.title(f'Area under Cultivation by State {year} (million hect)')
                plt.ylabel('Area under Cultivation (million hect)')
                plt.xticks(rotation=90)
                st.pyplot(fig)
            plot_area_cultivation_state(year)

        # Share of Area under Cultivation in Year
        st.write("### Share of Area under Cultivation in Year")
        if st.button("Show Share of Area under Cultivation in Year"):
            st.session_state.show_share_area_cultivation_year = (
                not st.session_state.show_share_area_cultivation_year
            )

        if st.session_state.show_share_area_cultivation_year:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="share_area_cultivation_year")

            @st.cache_resource
            def plot_share_area_cultivation_year(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_state = crop_df_Year.groupby('State')
                area_by_state = grouped_state['Area'].sum().sort_values(ascending=False)
                # Top 10 states get their own slice; the remainder is grouped as 'other'.
                pie_break = list(area_by_state.head(10)) + [area_by_state.sum() - area_by_state.head(10).sum()]
                pie_labels = list(area_by_state.head(10).index) + ['other']

                fig = plt.figure(figsize=(10, 6))
                plt.pie(pie_break, labels=pie_labels, autopct='%.2f%%')
                plt.title(f'Share of Area under Cultivation in Year {year}')
                st.pyplot(fig)
            plot_share_area_cultivation_year(year)

        # Production by State in Year
        st.write("### Production by State in Year")
        if st.button("Show Production by State in Year"):
            st.session_state.show_production_state_year = not st.session_state.show_production_state_year

        if st.session_state.show_production_state_year:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="production_state_year")

            @st.cache_resource
            def plot_production_state_year(year):
                # Exclude coconut, then total the production per state for the chosen year.
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_state = crop_df_Year.groupby('State')
                prod_by_state = grouped_state['Production'].sum().sort_values(ascending=False)

                fig = plt.figure(figsize=(18, 4))
                # NOTE: dividing by 1e7 scales to tens of millions; 1e6 would give millions as labelled.
                plt.bar(prod_by_state.index, prod_by_state / 1e7)
                plt.title(f'Production by State in Year {year} (million tonnes)')
                plt.ylabel('Production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(fig)
            plot_production_state_year(year)

        # Production by Crop in Year
        st.write("### Production by Crop in Year")
        if st.button("Show Production by Crop in Year"):
            st.session_state.show_production_crop_year = not st.session_state.show_production_crop_year

        if st.session_state.show_production_crop_year:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="production_crop_year")

            @st.cache_resource
            def plot_production_crop_year(year):
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_Year = crop_df[crop_df.Year == year]
                grouped_crop = crop_df_Year.groupby('Crop')
                percent_crop = grouped_crop['Production'].sum().sort_values(ascending=False)

                fig = plt.figure(figsize=(18, 4))
                # NOTE: values are plotted unscaled although the labels say millions.
                plt.bar(percent_crop.index, percent_crop)
                plt.title(f'Production by Crop in Year {year} (million tonnes)')
                plt.ylabel('Production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(fig)
            plot_production_crop_year(year)

        # Selected State and Crop Production
        st.write("### Selected State and Crop Production")
        if st.button("Show Selected State and Crop Production"):
            st.session_state.show_selected_state_crop_production = (
                not st.session_state.show_selected_state_crop_production
            )

        if st.session_state.show_selected_state_crop_production:
            year = st.selectbox("Select Year", data['Year'].unique(),
                                key="selected_state_crop_year")
            crop = st.selectbox("Select Crop", data['Crop'].unique(),
                                key="selected_state_crop_crop")

            @st.cache_resource
            def plot_selected_state_crop_production(year, crop):
                # Production of the selected crop per state for the chosen year.
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                crop_df_year = crop_df[crop_df.Year == year]
                selected_crop_df = crop_df_year[crop_df_year.Crop == crop]
                production_by_state = (
                    selected_crop_df.groupby('State')['Production'].sum().sort_values(ascending=False)
                )

                fig = plt.figure(figsize=(15, 5))
                plt.bar(production_by_state.index, production_by_state / 1e6)
                plt.title(f'{crop} production by State {year} (million tonnes)')
                plt.ylabel(f'{crop} production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(fig)
            plot_selected_state_crop_production(year, crop)

        # Selected Crop Production Across Top 10 States
        st.write("### Selected Crop Production Across Top 10 States")
        if st.button("Show Selected Crop Production Across Top 10 States"):
            st.session_state.show_selected_crop_production_top_states = (
                not st.session_state.show_selected_crop_production_top_states
            )

        if st.session_state.show_selected_crop_production_top_states:
            crop = st.selectbox("Select Crop", data['Crop'].unique(),
                                key="selected_crop_production_top_states_crop")

            @st.cache_resource
            def plot_selected_crop_production_top_states(crop):
                # Total production of the selected crop across all years; keep the 10 largest states.
                crop_df = pd.DataFrame(data)
                crop_df = crop_df[crop_df.Crop != 'Coconut']
                selected_crop_df = crop_df[crop_df['Crop'] == crop]
                production_by_state = (
                    selected_crop_df.groupby('State')['Production'].sum()
                    .sort_values(ascending=False).head(10)
                )

                fig = plt.figure(figsize=(15, 5))
                plt.bar(production_by_state.index, production_by_state / 1e6)
                plt.title(f'{crop} Production by State (Million Tonnes)')
                plt.ylabel(f'{crop} Production (Million Tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(fig)
            plot_selected_crop_production_top_states(crop)

        # Total Production of Rice & Wheat
        st.write("### Total Production of Rice & Wheat")
        if st.button("Show Total Production of Rice & Wheat"):
            st.session_state.show_total_production_rice_wheat = (
                not st.session_state.show_total_production_rice_wheat
            )

        if st.session_state.show_total_production_rice_wheat:
            @st.cache_resource
            def plot_total_production_rice_wheat():
                rw_years = data[data.Crop.isin(['Rice', 'Wheat'])][['Year', 'Yield', 'Area', 'Production', 'State']]
                # Exclude the 2020-21 rows (filtering avoids an in-place drop on a slice).
                rw_years = rw_years[rw_years.Year != '2020-21']
                rw_group = rw_years.groupby('Year')

                fig = plt.figure(figsize=(14, 8))
                # NOTE: dividing by 1e7 scales to tens of millions; 1e6 would give millions as labelled.
                plt.plot(rw_group['Production'].sum() / 1e7)
                plt.title('Total Production of Rice & Wheat Over the Years')
                plt.xlabel('Year')
                plt.ylabel('Production (million tonnes)')
                plt.xticks(rotation=90)
                st.pyplot(fig)

            plot_total_production_rice_wheat()

        # Heat Map of Average Yield by State and Year
        st.write("### Heat Map Average Yield by State and Year")
        if st.button("Show Heat Map Average Yield by State and Year"):
            st.session_state.show_heat_map_average_yield_by_state_year = (
                not st.session_state.show_heat_map_average_yield_by_state_year
            )

        if st.session_state.show_heat_map_average_yield_by_state_year:
            @st.cache_resource
            def plot_heat_map_average_yield_by_state_year():
                rw_years = data[data.Crop.isin(['Rice', 'Wheat'])][['Year', 'Yield', 'Area', 'Production', 'State']]
                rw_years = rw_years[rw_years.Year != '2020-21']
                # Rows are states, columns are years, cells are the mean yield.
                heatmap_df = rw_years.groupby(['State', 'Year'])['Yield'].mean().unstack(level=-1)

                # Handle missing values if necessary (e.g., fill with 0 or a specific value)
                heatmap_df = heatmap_df.fillna(0)

                # Plot the heatmap
                fig = plt.figure(figsize=(10, 8))
                sns.heatmap(heatmap_df, annot=True, cmap='viridis')
                plt.title('Average Yield by State and Year')
                st.pyplot(fig)

            plot_heat_map_average_yield_by_state_year()

    elif st.session_state.page == "Trend Analysis":
        st.title("Trend Analysis")
        st.write("Welcome to the Learning Data Analysis Through Other Analysis Algorithms tab.")

        # Total Crop Production in India
        st.write("### Total Crop Production in India (1997-2020)")
        if st.button("Total Crop Production in India"):
            st.session_state.show_total_production = not st.session_state.show_total_production

        if st.session_state.show_total_production:
            @st.cache_resource
            def plot_total_production():
                # Drop the 2020-21 rows, then total production per year.
                data.drop(data.index[data.Year == '2020-21'], inplace=True)
                production_trend = data.groupby('Year')['Production'].sum()
                fig = plt.figure(figsize=(12, 6))
                plt.plot(production_trend.index, production_trend.values, marker='o')
                plt.title('Total Crop Production in India (1997-2020)')
                plt.xlabel('Year')
                plt.ylabel('Total Production (Tonnes)')
                plt.grid(True)
                plt.xticks(rotation=90)
                st.pyplot(fig)
            plot_total_production()

        # Future Data Prediction Linear Regression Model
        st.write("## Future Data Prediction Linear Regression Model")
        st.write("### Why use a Linear Regression Model?")
        st.write("""A Linear Regression model is used in this function to identify
            and quantify the trend in historical crop production data. It helps in predicting
            future crop production by extending the linear trend observed in past data. The simplicity
            and interpretability of Linear Regression make it a suitable choice for forecasting future
            values based on historical trends. If the data shows a consistent linear trend, this model provides
            a straightforward method for making future projections.""")
        if st.button("Show Future Data Prediction Graph"):
            st.session_state.show_future_data_prediction = (
                not st.session_state.show_future_data_prediction
            )

        if st.session_state.show_future_data_prediction:
            @st.cache_resource
            def plot_future_data_prediction():
                data.drop(data.index[data.Year == '2020-21'],
                          inplace=True, errors='ignore')

                # Use the starting year of each 'YYYY-YY' label as a number.
                # A separate Series is used so data['Year'] keeps its string
                # labels for the other pages (e.g. Correlation Analysis).
                start_year = data['Year'].apply(
                    lambda x: int(str(x).split('-')[0]))

                production_trend = data.groupby(start_year)['Production'].sum()

                X = production_trend.index.values.reshape(-1, 1)
                y = production_trend.values

                model = LinearRegression()
                model.fit(X, y)

                # Forecast the five years after the last observed year
                last_year = int(X[-1, 0])
                future_years = np.arange(last_year + 1,
                                         last_year + 6).reshape(-1, 1)
                predictions = model.predict(future_years)

                plt.figure(figsize=(12, 6))
                plt.plot(production_trend.index, production_trend.values,
                         marker='o', label='Actual Production')
                plt.plot(future_years, predictions,
                         marker='x', linestyle='--', color='red',
                         label='Predicted Production')
                plt.title('Total Crop Production in India (1997-2025)')
                plt.xlabel('Year')
                plt.ylabel('Total Production (Tonnes)')
                plt.grid(True)
                plt.xticks(rotation=90)
                plt.legend()
                st.pyplot(plt)
            plot_future_data_prediction()

    elif st.session_state.page == "Correlation Analysis":
        st.title("Correlation Analysis")
        st.write(""" **Correlation Analysis**:
            Correlation analysis helps in understanding the relationship between different variables
            related to crop production. For instance, it can reveal how factors like rainfall, temperature,
            soil pH, and fertilizer usage are correlated with crop yield.""")

        # Function to convert 'Year' from '2001-02' format to a numerical format
        def convert_year(year_str):
            start_year = int(year_str.split('-')[0])
            # A crop year spans two consecutive calendar years, so the end year
            # is start_year + 1 (prefixing "20" would read '1997-98' as 2098)
            end_year = start_year + 1
            return (start_year + end_year) / 2

        # Create a temporary column for the numerical year
        data['Temp_Year'] = data['Year'].apply(convert_year)

        # Function to display correlation between two fields
        def display_correlation(data, field1, field2):
            correlation = data[[field1, field2]].corr()
            if field1 == 'Temp_Year':
                field1 = 'Year'
            st.write(f"### Correlation between {field1} and {field2}")
            st.write(correlation)

        # Correlation between Area and Production
        display_correlation(data, 'Area', 'Production')

        # Correlation between Area and Yield
        display_correlation(data, 'Area', 'Yield')

        # Correlation between Production and Yield
        display_correlation(data, 'Production', 'Yield')

        # Correlation between Year and Production using Temp_Year
        display_correlation(data, 'Temp_Year', 'Production')

        # Correlation between Year and Yield using Temp_Year
        display_correlation(data, 'Temp_Year', 'Yield')

        # Correlation between Year and Area using Temp_Year
        display_correlation(data, 'Temp_Year', 'Area')

        # Drop the temporary column after analysis
        data.drop(columns=['Temp_Year'], inplace=True)

    elif st.session_state.page == "Seasonal Analysis":
        st.title("Seasonal Analysis")
        st.write("""Seasonal analysis is used in projects to identify and understand patterns that occur
            at regular intervals over a specific period, such as weeks, months, quarters, or years.
            This analysis helps in forecasting, decision-making, and strategy formulation.""")
        if st.button("Show Seasonal Analysis Graph"):
            st.session_state.show_seasonal_analysis = (
                not st.session_state.show_seasonal_analysis
            )

        if st.session_state.show_seasonal_analysis:
            @st.cache_resource
            def plot_seasonal_analysis():
                # Boxplot of production by season
                data.drop(data.index[data.Season == 'Whole Year'],
                          inplace=True)  # data.Season != 'Whole Season'
                plt.figure(figsize=(12, 6))
                sns.boxplot(x='Season', y='Production', data=data)
                plt.title('Production by Season')
                plt.xlabel('Season')
                plt.ylabel('Production (Tonnes)')
                plt.xticks(rotation=45)
                st.pyplot(plt)

            plot_seasonal_analysis()

    elif st.session_state.page == "Yield Prediction Model":
        st.title("Yield Prediction Model")
        st.write("""A Yield Prediction Model is essential for optimizing resource use, financial planning,
            and risk management in agriculture. It enables accurate forecasting of crop yields, helping
            farmers and businesses make informed decisions. Calculating Mean Squared Error (MSE) is crucial
            as it measures the average squared difference between actual and predicted values, providing a
            clear metric for model accuracy. Lower MSE values indicate better model performance, guiding
            improvements and comparisons between different models.""")
        if st.button("Show Yield Prediction Model Graph"):
            st.session_state.show_yield_prediction_model = (
                not st.session_state.show_yield_prediction_model
            )

        if st.session_state.show_yield_prediction_model:
            @st.cache_resource
            def plot_yield_prediction_model():
                data = pd.read_csv('India Agriculture Crop Production.csv')

                # Convert to numeric first, then drop rows that are missing
                # or could not be parsed (coerced to NaN)
                data[['Area', 'Production', 'Yield']] = data[
                    ['Area', 'Production', 'Yield']
                ].apply(pd.to_numeric, errors='coerce')
                data = data.dropna(subset=['Area', 'Production', 'Yield'])

                X = data[['Area', 'Production']]
                y = data['Yield']

                # Check for any remaining NaNs
                if X.isnull().any().any() or y.isnull().any():
                    st.write("Data contains missing values.")
                    return

                X_train, X_test, y_train, y_test = train_test_split(
                    X, y, test_size=0.2, random_state=42)

                model = LinearRegression()
                model.fit(X_train, y_train)

                y_pred = model.predict(X_test)

                mse = mean_squared_error(y_test, y_pred)
                st.write(f'Mean Squared Error: {mse}')

                # The model predicts Yield, so the chart compares yields
                st.write("Actual vs Predicted Yield")
                comparison = pd.DataFrame({'Actual': y_test,
                                           'Predicted': y_pred})
                st.line_chart(comparison)
            plot_yield_prediction_model()

if __name__ == "__main__":
    main()

Command to run the Project

python -m streamlit run agriculture_app1.py
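
The Correlation Analysis page in the listing above turns crop-year labels such as '2001-02' into numbers before computing correlations. The following is a minimal sketch of one way to do that conversion. The helper name `crop_year_midpoint` is hypothetical and not part of the project code, and the sketch assumes that every label covers two consecutive calendar years. Deriving the end year from the start year, rather than from the two-digit suffix, avoids misreading a suffix such as '98' as 2098.

```python
def crop_year_midpoint(label: str) -> float:
    """Convert a crop-year label like '2001-02' to its numeric midpoint."""
    start_text, _suffix = label.strip().split('-')
    start_year = int(start_text)
    # Assumption: a crop year always ends in the next calendar year,
    # so the two-digit suffix is not needed to find the end year.
    end_year = start_year + 1
    return (start_year + end_year) / 2


if __name__ == "__main__":
    # Labels from both centuries, including the '1999-00' rollover
    for label in ["1997-98", "1999-00", "2001-02"]:
        print(label, crop_year_midpoint(label))
```

Because the result is only used as a numeric stand-in for the year, any monotonic encoding (for example the start year alone) would give the same Pearson correlation; the midpoint is used here only to mirror the listing.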

Screenshots:

5. Conclusion

In conclusion, the India Agriculture Crop Production Analysis
provides insight into the trends, patterns, and influencing
factors of crop yield over the years. By applying techniques
such as correlation analysis and predictive modeling with
linear regression, we can identify the variables that are
most strongly related to production.
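
As a rough illustration of what the correlation analysis measures, the sketch below computes the Pearson correlation coefficient for two columns in plain Python. The numbers are made-up placeholders, not values from the project dataset, and `pearson_r` is a hypothetical helper. In the project itself the same measure comes from pandas' `DataFrame.corr()`, whose default method is Pearson.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder values for illustration only
    area = [1.0, 2.0, 3.0, 4.0]
    production = [2.1, 3.9, 6.2, 7.8]
    print("r(Area, Production) =", pearson_r(area, production))
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear relationship; it does not by itself show that one variable causes the other.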
This analysis not only helps in understanding past
performance but also supports forecasting of future
yields, aiding strategic planning and decision-making.
The integration of data science and machine learning
models, evaluated with metrics such as Mean Squared Error,
makes the accuracy of predictions measurable and can help
optimize agricultural practices.
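
The forecasting and evaluation steps mentioned above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the project's implementation: the project uses scikit-learn's `LinearRegression` and `mean_squared_error`, whereas the hypothetical helpers `fit_line` and `mse` below compute the same quantities by hand, and the yearly values are placeholders rather than real production figures.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Placeholder data for illustration only
    years = [2015, 2016, 2017, 2018, 2019]
    production = [10.0, 12.0, 13.0, 15.0, 16.0]

    slope, intercept = fit_line(years, production)

    # Error of the fitted line on the data it was fitted to
    fitted = [slope * year + intercept for year in years]
    print("MSE on fitted years:", mse(production, fitted))

    # Extend the linear trend five years past the last observation
    for year in range(years[-1] + 1, years[-1] + 6):
        print(year, slope * year + intercept)
```

The error here is measured on the same points used for fitting, which keeps the sketch short; measuring it on held-out data, as the yield model does with a train/test split, gives a more honest picture of how well a model generalises.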
Ultimately, this comprehensive analysis serves as a
valuable tool for policymakers, farmers, and researchers
to improve crop management, ensure food security, and
drive sustainable agricultural growth in India.

6. Future Enhancement
Remote Sensing and Satellite Imagery: Utilize remote sensing
technologies and satellite imagery to monitor crop health, soil moisture,
and other critical parameters in real-time, enabling more precise and
timely interventions.

IoT Integration: Deploy Internet of Things (IoT) devices in fields
to collect real-time data on weather conditions, soil properties, and
crop health. This data can be integrated with predictive models to
enhance decision-making.

Climate Change Impact Analysis: Conduct detailed studies on the
impact of climate change on crop production. Develop adaptive
strategies and models to mitigate adverse effects and ensure resilience in
agricultural practices.

Precision Agriculture: Implement precision agriculture techniques that
use data analytics to optimize the use of inputs like water, fertilizers,
and pesticides, thereby increasing efficiency and reducing
environmental impact.

Mobile Applications for Farmers: Develop user-friendly mobile
applications that provide farmers with real-time data, predictive insights,
and recommendations based on the latest analysis, empowering them to
make informed decisions.

References
https://www.youtube.com/
https://www.kaggle.com/
https://docs.streamlit.io/
