Report 1

The document outlines a capstone project titled 'Crop Predicting System Using Machine Learning' aimed at assisting farmers in selecting optimal crops based on soil and environmental conditions. The project involves data collection, preprocessing, and the application of machine learning algorithms to improve agricultural productivity and economic growth. The system is designed to provide recommendations for crop cultivation, enhancing farmers' decision-making and contributing to better agricultural practices.

Uploaded by

Aditya Bharat

CAPSTONE PROJECT REPORT

(Project Term Aug-May 2025)

CROP PREDICTING SYSTEM USING MACHINE LEARNING

Submitted by

Aditya Bharat Registration Number : 12110715

Nagireddy Shamitha Registration Number : 12112996

Project Group Number KC258

Course Code: CSE439

Under the Guidance of

Dewashish Kumar 32315

School of Computer Science and Engineering


DECLARATION

We hereby declare that the project work entitled “CROP PREDICTING SYSTEM USING MACHINE
LEARNING” is an authentic record of our own work carried out as a requirement of the Capstone Project for
the award of the B.Tech degree in Computer Science & Engineering from Lovely Professional University,
Phagwara, under the guidance of Dewashish Kumar, during January to May 2025. All the information
furnished in this capstone project report is based on our own intensive work and is genuine.

Project Group Number : KC258

Name of Student : Aditya Bharat


Registration Number:12110715

(Signature of Student )
Date: 21-02-25
CERTIFICATE

This is to certify that the declaration statement made by this group of students is correct to the best
of my knowledge and belief. They have completed this Capstone Project under my guidance and
supervision. The present work is the result of their original investigation, effort and study. No part
of the work has ever been submitted for any other degree at any University. The Capstone Project
is fit for the submission and partial fulfillment of the conditions for the award of B.Tech degree in
Computer Science & Engineering (CSE) from Lovely Professional University, Phagwara.

DEWASHISH KUMAR

ASSISTANT PROFESSOR

School of Computer Science and Engineering,


Lovely Professional University, Phagwara, Punjab.

Date :
ACKNOWLEDGEMENT

We would like to express our profound gratitude to our project guide, Mr. Dewashish Kumar, for
his unwavering support and guidance throughout the completion of this report. He provided his
extensive expertise and knowledge, which has been of immense help to us in every step of the
project's development. His insights, suggestions, and corrections have been invaluable in shaping
the report to its final form. We are grateful to have had such a dedicated mentor and advisor in our
project, and his contribution to our success is greatly appreciated. We also want to extend our
heartfelt thanks to our parents for their constant encouragement, love, and support throughout our
academic journey. Their unwavering belief in us has been a significant driving force, and their
support has been a constant source of motivation and inspiration.
Finally, we would like to declare that, to the best of our knowledge and belief, the Project Work
has not been submitted anywhere else. We hope that our work will contribute to the existing
literature and be of value to the academic community. Once again, we express our sincere
gratitude to Mr. Dewashish Kumar and our parents for their support, without which this report
would not have been possible.
ABSTRACT

India is one of the most populous countries in the world, and a large share of its population is
engaged in agriculture. Many farmers, however, have difficulty deciding on the right
crops to plant in their soils. For those who wish to plant protein-rich vegetables, it is important that they
choose crops that are compatible with the specific needs of their soil and climatic conditions. To address
this, we propose a machine learning system that will assist farmers in selecting the optimal crop based on
a variety of soil and environmental conditions. Farmers can utilize such an intelligent system to improve
their margins and produce crops with improved quality and higher nutrition. As data science contributes
increasingly to agriculture, using predictive tools like these has become more important than ever. Crop
data science is a rapidly developing field with vast potential, and the crop predictions generated by our
machine learning model will allow farmers to make more informed decisions based on factors such as
temperature, rainfall, and soil type.
TABLE OF CONTENTS

Inner first Page
PAC form
Declaration
Certificate
Acknowledgement
Table of Contents

1. INTRODUCTION
1.1. CROP PREDICTION
1.2. OBJECTIVE
2. LITERATURE SURVEY
3. EXISTING SYSTEM & PROPOSED SYSTEM
3.1. EXISTING SYSTEM
3.2. PROPOSED SYSTEM
4. IMPLEMENTATION
4.1. DATA COLLECTION
4.2. DATA PREPROCESSING
4.3. MACHINE LEARNING ALGORITHMS
4.4. CROP PREDICTION
5. METHODOLOGY
5.1. BASIC PROCESS
5.2. DATA SETS
5.3. EXPLORATORY DATA ANALYSIS
5.4. EDA PERFORMED
5.5. ALGORITHM USED
5.6. RANDOM FOREST
6. SYSTEM ANALYSIS AND DESIGN
6.1. SYSTEM ARCHITECTURE
6.2. FLOWCHART
7. SYSTEM REQUIREMENTS SPECIFICATIONS
7.1. BASIC REQUIREMENTS
7.2. REQUIREMENT
7.2.1. HARDWARE REQUIREMENTS
7.2.2. SOFTWARE REQUIREMENTS
8. RESULT AND DISCUSSION
8.1. RESULT
9. CONCLUSION & FUTURE WORK
9.1. CONCLUSION
9.2. FUTURE WORK

INTRODUCTION

1.1. CROP PREDICTION

Agriculture has a long history in India and is one of the most significant occupations practiced in
the country. India is a country of several hundred thousand villages, so agriculture engages a large
percentage of the rural population. It is the widest economic sector and plays a most significant role
in the overall development of the country. Over 60% of the nation's land is utilized for farming to meet
the demands of 1.4 billion individuals. Therefore, the adoption of new agricultural technologies is of
utmost importance. Present-day agriculture is heavily technology-dependent and concentrates on
generating high returns from chosen hybrid crops, which degrade the physical and biochemical
properties of the soil in the long term. Adopting new technologies will help the farmers of our nation
move towards profit.
Earlier, crop forecasting was based on the farmer's experience of a specific location. The crop is the
predominant factor influencing agricultural revenue, and it depends on numerous climatic,
geographical, and economic factors. For farmers, it is challenging to determine which crops to plant
and when. Farmers are not sure which crop they should cultivate, or when and where to start, because
of uncertainty over climatic factors. They tend to prefer the crop grown previously, the crop common
in their locality, or the crop currently in fashion in the area, and they do not possess sufficient
knowledge of the soil's nutrient content, such as nitrogen, phosphorus, and potassium.
Keeping all these issues in view, we designed a system that uses machine learning to support
farmers. Machine learning (ML) is a game changer for the agriculture industry. Machine learning, a
subset of artificial intelligence, has emerged along with big data technologies and high-performance
computing to enable new opportunities in data-intensive science in the multidisciplinary
agro-technology field. The planned system suggests the most appropriate crop for a specific piece of
land based on weather parameters and soil content such as rainfall, temperature, humidity, and pH.
The data are gathered from government websites and Kaggle. The system accepts the required inputs,
such as temperature, humidity, and pH, from the farmer or from a dataset. These inputs are fed to
machine learning predictive algorithms such as Logistic Regression and Decision Tree, which recognize
patterns in the data and process them according to the input conditions. The system recommends a
crop to the farmer and also recommends the amount of nutrients to be added for the predicted crop.
1.2. OBJECTIVE

• To achieve the best possible crop growth, development, and production.
• To predict the right crop from the given rainfall, temperature, and soil conditions.
• To improve the economic growth of all stakeholders.
• To improve nutritional standards for better health.
• To contribute to the protection and upgrading of the environment, ensuring ecological
balance, the prevention of global warming, and a healthy life for humans and animals.
• To minimize farmers' economic losses due to cultivation of the wrong crop.
• To guide farmers in discovering new varieties of crops that can be grown in their region. We are
developing this system to enrich all farmers in both farm and wealth.
LITERATURE

Crop yield prediction plays a vital role in enhancing agricultural productivity, and over the years, multiple
approaches have been proposed to improve prediction accuracy. This study explores various machine
learning techniques to model and forecast crop yields in rural regions based on soil parameters (such as
pH, nitrogen, potassium) and environmental factors (like rainfall and humidity).
The analysis focuses on five major crops cultivated in Tamil Nadu—rice, maize, ragi, sugarcane, and
tapioca—spanning a dataset from 2005 to 2010. Parameters such as rainfall, groundwater levels,
cultivated area, and soil types were taken into consideration to optimize crop productivity. For clustering,
the K-Means algorithm was utilized, while classification involved Fuzzy logic, K-Nearest Neighbors
(KNN), and an improved variant called Modified KNN (MKNN). Among these, MKNN yielded the most
accurate predictions.
To support practical implementation, a farmer-centric application can be developed to address key
challenges in the agriculture sector. The app would allow farmers to input details like crop name, season,
and location for single or multiple tests. Based on the inputs, users can choose the desired algorithm and
receive yield predictions. Historical yield data has been cleaned and formatted into compatible datasets
for training purposes. The system employs Naïve Bayes and KNN models for analysis.
Data for model training was sourced from official government platforms, covering over a decade of
agricultural trends. Additionally, a custom IoT device was built to gather real-time environmental data
using sensors like DHT11 for humidity and temperature and soil sensors interfaced with an Arduino Uno
(powered by an Atmega processor). The Naïve Bayes model achieved 97% accuracy and was further
enhanced using boosting techniques, which combine weak learners in an iterative manner to increase
overall system performance.
Advanced regression techniques such as Elastic Net (ENet), Kernel Ridge, and Lasso were also employed
for yield estimation. These models were further improved using Stacking Regression to enhance
prediction accuracy. Comparative studies showed that the proposed system using Random Forest
outperformed existing systems that relied on Naïve Bayes. Being a bagging technique, Random Forest
demonstrated higher reliability, whereas Naïve Bayes, being probabilistic, had lower predictive accuracy.
The study supports several objectives:
 (a) Crop yield prediction using a variety of ML models with accuracy and error rate comparison specific
to geographical zones.
 (b) Development of a mobile application to recommend the most profitable crop.
 (c) Integration of GPS-based location services to accurately estimate rainfall.
 (d) Providing guidance on optimal fertilizer application timing.
Multiple machine learning algorithms such as KNN, Support Vector Machine (SVM), Multiple Linear
Regression (MLR), Random Forest, and Artificial Neural Networks (ANN) were evaluated using datasets
from Karnataka and Maharashtra. Decision Tree emerged as the most effective, showing an accuracy of
99.87% among the basic algorithms tested.
To understand the impact of external factors on crop yield, regression analysis was conducted using three
independent variables: area under cultivation, food price index, and annual rainfall. Crop yield was
considered the dependent variable. The coefficient of determination (R²) indicated that these variables had
a measurable, albeit moderate, influence on yield outcomes.
Data sources included government platforms such as the APMC website and VC Farm Mandya,
containing comprehensive details on climatic conditions and soil nutrients. For rainfall prediction, an
SVM model with a Radial Basis Function (RBF) kernel was used, while crop prediction was handled
using Decision Tree algorithms.
A comparative assessment of various machine learning methods was carried out using regression-based
models—Linear, Decision Tree, Random Forest, Gradient Boosting, Polynomial, and Ridge Regression—
applied to datasets consisting of crop types, geographical states, and seasonal conditions. Performance
was measured using evaluation metrics such as Mean Absolute Error (MAE), Root Mean Square Error
(RMSE), Mean Squared Error (MSE), R² score, and cross-validation. Gradient Boosting showed the
highest yield prediction accuracy at 87.9%, while Random Forest achieved the highest production
prediction accuracy at 98.9%. For live monitoring of temperature and humidity, the DHT22 sensor is
suggested as an optimal choice.
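
The evaluation metrics named above (MAE, MSE, RMSE, and R²) can be illustrated with a small worked example. This is a minimal sketch in plain Python; the function name `regression_metrics` and the yield values are hypothetical and are not taken from any of the surveyed studies.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # Mean Absolute Error
    mse = sum(e * e for e in errors) / n           # Mean Squared Error
    rmse = math.sqrt(mse)                          # Root Mean Square Error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                     # coefficient of determination
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Hypothetical yields (tonnes per hectare), for illustration only.
actual = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.5, 4.0, 5.5]
metrics = regression_metrics(actual, predicted)
```

Lower MAE, MSE, and RMSE indicate smaller prediction errors, while an R² closer to 1 indicates that the model explains more of the variation in the observed values.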
EXISTING SYSTEM & PROPOSED SYSTEM

3.1 Existing System

The rate of crop yield plays a vital role in boosting a nation's economy, especially in agriculture-driven
countries like India. Enhancing crop productivity has become essential to meet the growing food demand.
This challenge has traditionally been addressed through biological methods such as seed quality
improvement, crop hybridization, and the use of effective pesticides, alongside chemical approaches like
the application of fertilizers, urea, and potash.

One notable solution introduced is the Crop Selection Method (CSM), which aims to optimize crop
yield during a specific season. CSM demonstrates how thoughtful selection and sequencing of crops can
support farmers in maximizing land usage and increasing net yields.

3.2 Types of Crops (Based on Growth Periods)

Crops can be classified into four major categories based on their growing duration:

Seasonal Crops: These are cultivated only in specific seasons, e.g., wheat, cotton.

Year-Round Crops: Can be grown throughout the year, such as paddy, toor.

Short-Duration Crops: Require a brief growing period, e.g., potatoes, some leafy vegetables.

Long-Duration Crops: Take a longer time to mature, e.g., sugarcane, onion.

An optimal combination and rotation of these crops can lead to higher overall yield within a season,
promoting better land utilization and reusability through strategic planning using the CSM approach.

3.3 Farming Practices in India

India encompasses a variety of agricultural systems, including industrial, organic, and subsistence
farming. Specific regions also engage in practices like agroforestry, horticulture, and ley farming,
depending on climatic and geographical conditions.

Previous research has shown promising results by applying machine learning (ML) with single-attribute
models. Our goal is to improve these models by incorporating additional features, which can enhance
yield predictions and uncover deeper patterns related to crop suitability. This system aims to provide
intelligent crop recommendations tailored to specific regions and conditions.

3.4 Proposed System

The proposed system aims to predict the most suitable crop for a given piece of land by analyzing both
soil characteristics and climatic parameters such as temperature, humidity, pH level, and rainfall. The
system’s architecture consists of several functional components as outlined below.

3.5 Data Collection

The first step involves collecting accurate and relevant agricultural data from trusted sources like
government portals, VC Farm Mandya, and the APMC website. The dataset includes critical features such
as:
Soil pH

Temperature

Humidity

Rainfall

Crop history and types

Nutrient values (Nitrogen, Phosphorus, Potassium - NPK)

This dataset forms the foundation of the predictive model.

3.6 Data Preprocessing

After data collection, preprocessing is essential to ensure data quality. The steps include:

Reading and consolidating datasets from multiple sources

Removing duplicates and irrelevant attributes

Handling missing values by either removing incomplete records or imputing them using

Clean and well-structured data is crucial for improving model accuracy during training.
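
The preprocessing steps listed above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the two source tables, their column names, and the `preprocess` helper are hypothetical, and median imputation is our assumption here, since the report does not fix a particular imputation method.

```python
import pandas as pd

# Two hypothetical source tables; column names and values are made up.
source_a = pd.DataFrame({
    "ph": [6.5, 7.0, None],
    "rainfall": [200.0, 90.0, 150.0],
    "notes": ["a", "b", "c"],          # irrelevant attribute
    "crop": ["rice", "maize", "rice"],
})
source_b = pd.DataFrame({
    "ph": [6.5, 5.8],
    "rainfall": [200.0, None],
    "notes": ["a", "d"],
    "crop": ["rice", "ragi"],
})

def preprocess(frames, drop_columns):
    # Read and consolidate datasets from multiple sources.
    data = pd.concat(frames, ignore_index=True)
    # Remove irrelevant attributes, then duplicate records.
    data = data.drop(columns=drop_columns)
    data = data.drop_duplicates().reset_index(drop=True)
    # Impute missing numeric values with the column median (an assumption).
    numeric = data.select_dtypes(include="number").columns
    data[numeric] = data[numeric].fillna(data[numeric].median())
    return data

clean = preprocess([source_a, source_b], drop_columns=["notes"])
```

Removing incomplete records instead of imputing them would amount to replacing the `fillna` step with `data.dropna()`.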

3.7 Machine Learning Algorithm for Prediction

The system leverages machine learning algorithms that are trained to produce optimized predictions
based on input data. The chosen algorithm—Decision Tree Classifier—is known for its interpretability
and high accuracy in classification problems.

3.8 Crop Recommendation

Based on real-time and forecasted parameters like rainfall, soil pH, temperature, and humidity, the
system recommends the most viable crop for cultivation. The recommendation process follows these
steps:

Input parameters are either manually entered or automatically sensed through hardware sensors.

The processed data is then passed through the trained Decision Tree model.

The algorithm uses these values to determine and suggest the most suitable crop.

This recommendation model can be deployed as a smart application to support farmers in making
informed decisions, improving crop planning, and ultimately enhancing yield efficiency.
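
The recommendation steps above can be sketched with scikit-learn's `DecisionTreeClassifier`. This is a minimal illustration, not the deployed system: the training rows, the crop labels, and the `recommend_crop` helper are hypothetical.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training rows: [rainfall_mm, soil_ph, temperature_c, humidity_pct].
X_train = [
    [220.0, 6.5, 26.0, 82.0],
    [240.0, 6.2, 25.0, 85.0],
    [80.0, 6.8, 22.0, 60.0],
    [70.0, 7.0, 21.0, 55.0],
    [40.0, 7.5, 30.0, 30.0],
    [35.0, 7.8, 32.0, 25.0],
]
y_train = ["rice", "rice", "maize", "maize", "millet", "millet"]

# Train the Decision Tree model on the (hypothetical) processed data.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

def recommend_crop(rainfall_mm, soil_ph, temperature_c, humidity_pct):
    """Pass manually entered or sensed values through the trained tree."""
    features = [[rainfall_mm, soil_ph, temperature_c, humidity_pct]]
    return model.predict(features)[0]

suggestion = recommend_crop(230.0, 6.4, 26.0, 84.0)
```

In a deployed application the four arguments would come from a form or from hardware sensors rather than from hard-coded values.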
Implementation

The system predicts the most suitable crop for a particular piece of land based on soil and
weather parameters such as temperature, humidity, soil pH, potassium (K), phosphorus (P),
nitrogen (N), and rainfall.

Fig.4.1: Architecture of crop prediction

4.1. Data Collection:


Data collection, or data loading, is the process of gathering and measuring data from
various sources such as government websites and Kaggle in order to obtain an appropriate dataset
for the system.
This dataset should have the following attributes: i) soil pH, ii) temperature, iii) humidity,
iv) rainfall, v) crop data, and vi) NPK values. These parameters are considered for crop prediction.

4.2. Data Preprocessing:


After the datasets are gathered from diverse sources, they should be preprocessed before the model
is trained. Preprocessing proceeds in stages: it starts with reading the gathered dataset and continues
through data cleaning. The datasets contain some redundant attributes that are not taken into account
for crop prediction, so we drop these unwanted attributes. Where the datasets have missing values, we
either drop the affected records or fill the unwanted NaN values in order to achieve better accuracy.
We then set the target for the model. After data cleaning, the dataset is divided into training and
test sets.
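
The cleaning, target-setting, and splitting steps described above might look as follows. This is a hedged sketch: the `record_id` column, the `label` target name, the values, and the 80/20 split ratio are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical raw dataset: "record_id" is a redundant attribute and one
# row has a missing rainfall value. All values are illustrative only.
raw = pd.DataFrame({
    "record_id": list(range(11)),
    "temperature": [26.0, 25.0, 22.0, 21.0, 30.0, 32.0, 27.0, 23.0, 31.0, 24.0, 29.0],
    "humidity": [82.0, 85.0, 60.0, 55.0, 30.0, 25.0, 80.0, 58.0, 28.0, 62.0, 35.0],
    "ph": [6.5, 6.2, 6.8, 7.0, 7.5, 7.8, 6.4, 6.9, 7.6, 6.7, 7.4],
    "rainfall": [220.0, 240.0, 80.0, 70.0, 40.0, 35.0, 230.0, 75.0, 38.0, None, 45.0],
    "label": ["rice", "rice", "maize", "maize", "millet", "millet",
              "rice", "maize", "millet", "maize", "millet"],
})

cleaned = raw.drop(columns=["record_id"])    # drop the unwanted attribute
cleaned = cleaned.dropna()                    # drop records with missing values

features = cleaned.drop(columns=["label"])   # independent variables
target = cleaned["label"]                    # the target set for the model

# Divide the cleaned data into training and test sets (80/20 is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42
)
```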
4.3. Machine Learning Algorithm for Prediction:
Machine learning predictive algorithms produce a highly optimized estimate of the most probable
outcome based on the training data. Predictive analytics refers to using data, statistical algorithms,
and machine learning algorithms to determine the probability of future outcomes based on past data.
The aim is to move beyond knowing what has occurred to giving a best guess about what will occur in
the future. In our system, we first compare the accuracy of several algorithms: Logistic Regression,
Decision Tree, and Random Forest.
4.3.1 Logistic Regression:
Logistic regression is one of the most common machine learning algorithms and belongs to the
supervised learning family. It predicts a categorical dependent variable on the basis of a given
set of independent variables.

4.3.2 Decision Tree:

It is a supervised learning algorithm in which attributes and class labels are represented by a
tree. Starting at the root, the node's attribute is matched against the record's attribute and,
based on the match, a new node is accessed. This matching is repeated until a leaf node with a
predicted class value is reached. Thus, a modeled decision tree is highly efficient for prediction
purposes.

Fig. 4.2: Structure of decision tree
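
The root-to-leaf matching described above can be sketched in plain Python. The tree below is hand-built for illustration; its attributes, thresholds, and crop labels are hypothetical and were not learned from our data.

```python
# A hand-built tree: internal nodes test one attribute against a threshold,
# leaf nodes carry the predicted class label.
tree = {
    "attribute": "rainfall_mm", "threshold": 150.0,
    "below": {
        "attribute": "soil_ph", "threshold": 7.2,
        "below": {"label": "maize"},
        "above": {"label": "millet"},
    },
    "above": {"label": "rice"},
}

def predict(node, record):
    """Walk from the root to a leaf, matching one attribute per node."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["below"] if value <= node["threshold"] else node["above"]
    return node["label"]

sample = {"rainfall_mm": 80.0, "soil_ph": 6.8}
result = predict(tree, sample)
```

Prediction only needs one comparison per level of the tree, which is why a trained decision tree is fast to evaluate.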


4.3.3 Random Forest:
Random Forest is a supervised learning algorithm that combines many decision trees. Within each
tree, attributes and class labels are specified via the tree structure: root attributes are compared
with the attribute of the record and, depending upon the comparison, a new node is visited. This
comparison continues until a leaf node with an estimated class value is reached. The forest then
combines the predictions of its individual trees.

Fig.4.3: Working Of Random Forest

The accuracy levels of the above three algorithms are discussed in the Result section.
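
A comparison of the three algorithms can be set up along the following lines with scikit-learn. This is a sketch on synthetic, well-separated data generated by a hypothetical `make_rows` helper; the scores it produces say nothing about the accuracy obtained on the real dataset.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def make_rows():
    """Synthetic [rainfall_mm, soil_ph] rows with three clearly separated crops."""
    rows, labels = [], []
    for i in range(20):
        rows.append([200.0 + i, 6.0 + 0.01 * i]); labels.append("rice")
        rows.append([80.0 + i, 6.8 + 0.01 * i]); labels.append("maize")
        rows.append([20.0 + i, 7.5 + 0.01 * i]); labels.append("millet")
    return rows, labels

X, y = make_rows()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Fit each candidate on the same split and record its test accuracy.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
```

Using the same train/test split for every candidate keeps the comparison fair; cross-validation would give a more stable estimate on small datasets.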

4.4. Crop Prediction:

Random Forest is a bagging method of learning that finds extensive use for both classification and
regression. To make a prediction with the trained model, the test features are passed through the
rules of each randomly created tree. As a result, each tree in the forest predicts a target for the
same test features. Votes are then counted for each predicted target, and the final prediction of
the algorithm is the target with the most votes. The ability of the random forest algorithm to cope
with missing values and its resistance to over-fitting are major benefits of applying this
algorithm.
Fig.4.7: Created Web Page

METHODOLOGY
5.1 Basic Process

i. Data Collection: Collect the data the algorithm is going to learn from.

ii. Preparation of Data: Organize and preprocess the data into the optimal form, selecting the primary
features and reducing dimensionality.

iii. Training: Also known as the fitting stage, this is where the machine learning
algorithm is taught by giving it the collected and prepared data.

iv. Evaluation: Test the model to see how well it performs.

v. Tuning: Fine-tune the model to get optimal performance.


Fig.5.1: General Process

5.2 Datasets

Machine learning is highly dependent upon data; it is the most important aspect that makes training
an algorithm possible. The algorithm takes previous data and information and learns through experience.
The better the quality of the dataset, the better the accuracy will be.
Data collection is the first step. We need two datasets for this project: one to train the crop
prediction algorithm and another to predict the weather, i.e. average rainfall and average temperature.
These two parameters are predicted in order to use them as inputs for predicting the suitable crop.
The data used by the crop prediction module should have the following columns: State,
District, Crop, Season, Average Temperature, Average Rainfall, Soil Type, Area, and Production,
because these are the key factors upon which crops depend. 'Production' is the class (dependent)
variable, so there are eight independent variables and one dependent variable. We obtained this set
of columns by joining the datasets, with location as the common attribute in both.
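
Joining the two datasets on location can be sketched with pandas `merge`. The rows below are hypothetical, and we assume here that location is identified by the State and District columns and that Soil Type arrives with the weather table.

```python
import pandas as pd

# Hypothetical crop table; values are illustrative only.
crops = pd.DataFrame({
    "State": ["Punjab", "Punjab", "Kerala"],
    "District": ["Ludhiana", "Ludhiana", "Wayanad"],
    "Crop": ["wheat", "rice", "tapioca"],
    "Season": ["Rabi", "Kharif", "Whole Year"],
    "Area": [100.0, 80.0, 20.0],
    "Production": [450.0, 300.0, 60.0],
})

# Hypothetical weather and soil table keyed by the same location columns.
weather = pd.DataFrame({
    "State": ["Punjab", "Kerala"],
    "District": ["Ludhiana", "Wayanad"],
    "Average Temperature": [24.0, 27.0],
    "Average Rainfall": [650.0, 2300.0],
    "Soil Type": ["alluvial", "laterite"],
})

# Join on location, the attribute common to both datasets.
merged = crops.merge(weather, on=["State", "District"], how="inner")

X = merged.drop(columns=["Production"])   # eight independent variables
y = merged["Production"]                  # dependent variable
```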
Fig.5.2: Soil and Crop data sample

5.3 Exploratory Data Analysis

Exploratory data analysis (EDA) is a technique for examining datasets to summarize their main
characteristics, often with the help of visualization. It is also about getting to know your data
and achieving some comfort level with it before extracting insights. The philosophy is to spend less
time coding and more time on the data analysis itself. Once data is gathered, a little processing is
performed prior to data cleaning and EDA. After EDA, we come back to processing and cleaning the
data, i.e., it can be an iterative process. We then use the cleaned dataset and the knowledge gained
from EDA to perform modelling and reporting.
EDA techniques are generally cross-classified in two ways. First, each technique is either
graphical or non-graphical. Second, each technique is either univariate or multivariate (usually just
bivariate). It makes sense to know the data beforehand and try to find out as much as possible from it.
EDA is all about interpreting available data. EDA can give us the following:

• Preview data
• Check total number of entries and column types by employing in-built functions. Being aware
of Columns and their respective data types is a good habit.
• Check for any null values.
• Check duplicate entries
• Plot numeric data distribution (univariate and pairwise joint distribution)
• Plot count distribution of categorical data.

With the assistance of various built-in functions, one can find the number of values in each
column, which reveals null values and duplicate data. One can also find the
mean, standard deviation, minimum value, and maximum value. This is the basic process of EDA.
For a better understanding of the data we are working with, we can create graphs such as the
correlation matrix, one of the most important tools, which gives us a great deal of
information about how the variables (columns) are related to each other and how much effect
each of them has on another.
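
The non-graphical EDA checks described above map onto a handful of pandas built-ins. The small table below is hypothetical; plotting is left out of this sketch, although the correlation matrix it computes is what a heatmap would display.

```python
import pandas as pd

# Hypothetical sample with one missing value and one duplicated row.
df = pd.DataFrame({
    "temperature": [20.0, 25.0, 30.0, 25.0, 35.0],
    "rainfall": [100.0, 150.0, 200.0, 150.0, None],
    "crop": ["maize", "rice", "rice", "rice", "millet"],
})

preview = df.head()                          # preview the data
n_rows, n_cols = df.shape                    # total number of entries and columns
column_types = df.dtypes                     # data type of each column
null_counts = df.isnull().sum()              # null values per column
duplicate_count = int(df.duplicated().sum()) # fully duplicated rows
summary = df.describe()                      # mean, std, min, max of numeric columns
crop_counts = df["crop"].value_counts()      # count distribution of categorical data

# Correlation matrix of the numeric columns only.
correlation = df.select_dtypes(include="number").corr()
```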
5.5 Algorithms Used

Machine Learning Algorithms Overview


Machine learning provides a wide variety of algorithms that can be broadly grouped into four categories:
classification, regression, clustering, and association. These algorithms are primarily classified into two
types of learning—supervised learning and unsupervised learning.

 Classification: In classification tasks, the goal is to predict a categorical label or class such as "yes" or
"no", "disease" or "no disease", or "red" vs "blue". A commonly used algorithm for classification is the
Decision Tree.

 Regression: Regression is concerned with predicting a continuous numerical value, like predicting
weight, temperature, or price. Linear Regression is one of the foundational algorithms used for such
problems.

 Clustering: This unsupervised learning technique is used to group data points based on similarity without
predefined labels. An example is segmenting customers by purchasing patterns using K-Means
Clustering.

 Association: Association rule mining is used to uncover interesting relationships or patterns among
variables in large datasets. For instance, discovering that customers who buy item X often also buy item
Y.

5.6 Random Forest

Random Forest is a highly effective and widely used machine learning algorithm known for its accuracy
and adaptability. It functions well even without intensive hyperparameter tuning and is capable of
handling both classification and regression tasks efficiently.
This algorithm builds a "forest" by generating multiple decision trees during the training phase. It then
combines the predictions of each individual tree to improve overall model accuracy and stability. One of
its key advantages is its ability to automatically measure the significance of input features during
training, providing insights into which factors most influence the outcome.
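
The feature-significance property mentioned above is exposed in scikit-learn through the `feature_importances_` attribute of a fitted forest. The sketch below uses synthetic data in which a hypothetical `rainfall_mm` column separates the classes and a `noise` column carries no information.

```python
from sklearn.ensemble import RandomForestClassifier

feature_names = ["rainfall_mm", "noise"]

# Synthetic rows: rainfall determines the crop, "noise" is unrelated to it.
X, y = [], []
for i in range(30):
    X.append([200.0 + i, float(i % 3)]); y.append("rice")
    X.append([30.0 + i, float(i % 3)]); y.append("millet")

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# Impurity-based importances, one value per input feature, summing to 1.
importances = dict(zip(feature_names, forest.feature_importances_))
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
```

On real agricultural data the ranking would indicate which of the soil and weather parameters the forest relied on most.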

Working Steps of Random Forest:


1. Sampling: Begin by creating multiple random samples from the original dataset (bootstrapping).
2. Tree Construction: A separate decision tree is built for each sample using a random subset of features.
3. Prediction Aggregation: Each tree makes its own prediction.
4. Final Output: For classification, the result with the most votes is selected; for regression, the average of
all tree outputs is used.
This ensemble-based technique reduces the risk of overfitting that a single decision tree might face and
increases the model's robustness and reliability across varied datasets.
Fig.5.6 : Random Forest Flow
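
The four working steps can be sketched from scratch in plain Python. This is a deliberately simplified illustration, not a full Random Forest: each "tree" is reduced to a single-split stump, and the data, the function names, and the parameter defaults are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Find the single threshold split with the fewest training errors."""
    best = None
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # every sampled row has the same value: predict the majority
        label = majority(labels)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1, sampling: draw a bootstrap sample with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Step 2, tree construction on a random subset of features.
        feature_ids = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump(sample_rows, sample_labels, feature_ids))
    return forest

def predict_forest(forest, row):
    # Step 3: each tree votes. Step 4: the most voted class is the output.
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)

# Hypothetical [rainfall_mm, soil_ph] rows, for illustration only.
rows = [[220.0, 6.2], [240.0, 6.4], [210.0, 6.0], [230.0, 6.5],
        [40.0, 7.6], [35.0, 7.8], [50.0, 7.5], [45.0, 7.9]]
labels = ["rice"] * 4 + ["millet"] * 4

forest = fit_forest(rows, labels)
wet_field = predict_forest(forest, [225.0, 6.3])
dry_field = predict_forest(forest, [42.0, 7.7])
```

For regression, the final step would average the numeric outputs of the trees instead of counting votes.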

5.6.1 Justification for Using Random Forest

To validate the effectiveness of the Random Forest algorithm, data related to specific crops was
processed using both Random Forest and another algorithm identified as optimal for each crop type in
existing studies. The accuracy of the predictions made by both methods was compared. The selected
crops for evaluation were rice and groundnut, based on previous research findings.

 Rice: According to earlier studies, Linear Regression was considered the best fit for predicting rice
yields. However, when we implemented both Linear Regression and Random Forest on our dataset, we
observed significant discrepancies between the predicted and actual yield values in the Linear
Regression model. In contrast, Random Forest consistently produced results with over 90% accuracy,
indicating greater reliability.

 Groundnut: For groundnut yield prediction, research indicated that K-Nearest Neighbors (KNN) was
the most appropriate algorithm. Upon testing, the outcomes from KNN and Random Forest were quite
comparable, with minimal variance in prediction accuracy. This comparison suggests that Random
Forest can serve as a versatile and dependable default model for crop yield prediction, even when it is
not the algorithm traditionally favored for a specific crop.

System Analysis and Design

System Development follows a structured life cycle comprising several key phases: planning,
analysis, design, implementation, and maintenance. Each stage is crucial to building a reliable and
effective system.

 System Analysis: This is the process of understanding and evaluating the existing system or
identifying requirements for a new one. It involves gathering relevant data, identifying problems, and
breaking the system into manageable parts to determine what the system needs to accomplish. The
goal of analysis is to define what the system should do and ensure each component is aligned with the
overall objective. It plays a critical role in ensuring that system requirements are clearly understood
and documented.
 System Design: Once analysis defines the system's needs, the design phase focuses on how to meet
those needs. This involves creating detailed architecture and modules that address functional and
performance requirements. Whether designing a new system or upgrading an existing one, the goal is to
optimize computer resources and improve operational efficiency. System design ensures that the
implementation will fulfill user expectations, maintain scalability, and operate seamlessly within the
given constraints.

6.1 System Architecture

Fig.6.1: System Architecture.


6.2 Flowchart

A flowchart is a visual tool that represents a sequence of steps in a process. It illustrates the order of
actions using various types of boxes connected by arrows, making it easy to follow the workflow or
logic of a system. Originally developed in the field of computer science to outline algorithms and
programming logic, flowcharts have since found applications in many other disciplines. Today, they are
crucial for conveying information clearly and supporting logical thinking. Flowcharts simplify the
understanding of complex procedures and help structure tasks or problems effectively. They are also
valuable for planning and defining processes or projects before implementation.

SYSTEM REQUIREMENTS SPECIFICATIONS

A Software Requirements Specification (SRS) is a foundational document that clearly defines the
software system to be developed. It encompasses both functional and non-functional requirements,
serving as a crucial reference throughout the development lifecycle. This document also includes use
case scenarios that help illustrate how users will interact with the application, providing clarity for both
developers and stakeholders.
Defining requirements in the early stages of the project offers a strategic advantage. It helps development
teams understand exactly what needs to be built, minimizing ambiguity and reducing the risk of rework.
Moreover, the SRS helps identify possible constraints or risks beforehand, enabling proactive planning
and better resource management.
An SRS offers a detailed and structured overview of what the software is expected to achieve. It not
only defines the core functionality (i.e., what the system should do) but also sets expectations for
performance, usability, reliability, and other quality attributes (i.e., how well the system should perform).
By including practical examples and usage scenarios, the document aligns all stakeholders—developers,
testers, designers, and clients—on shared goals and expectations.

Gathering accurate and complete requirements demands ongoing collaboration with end-users or
clients to avoid oversights and ensure all critical aspects are captured. An SRS should also outline how
the software interacts with other modules, hardware, third-party tools, and end-users under various
conditions. This clarity is especially important for quality assurance teams, as it directly influences the
accuracy of test case design and the reliability of the testing process.

Key Characteristics of an Effective SRS:

• Accurate
• Clear and Unambiguous
• Thorough and Complete
• Logically Consistent
• Prioritized by importance and stability
• Testable/Verifiable
• Easy to Modify
• Traceable to requirements and design

RESULT & DISCUSSION

8.1 Result
We first identified the most accurate algorithm among Logistic Regression, Decision Tree, and
Random Forest. Table 8.1 shows the accuracy of each algorithm.

Table 8.1: Accuracy Levels

No.  Algorithm            Accuracy
1    Logistic Regression  93.63%
2    Decision Tree        95.45%
3    Random Forest        97.87%
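
The comparison above amounts to scoring each model on the same held-out labels and keeping the one with the highest accuracy. The following is a minimal sketch of that selection step only. The helper names `accuracy` and `best_model` are hypothetical, and the labels and predictions are toy values chosen for illustration, not the project's dataset or results.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def best_model(
    y_true: Sequence[str], predictions: Dict[str, Sequence[str]]
) -> Tuple[str, Dict[str, float]]:
    """Score every model on the same labels and return the top scorer."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    winner = max(scores, key=scores.get)
    return winner, scores


# Toy held-out labels and per-model predictions (hypothetical values,
# NOT taken from the project's dataset).
Y_TRUE = ["rice", "maize", "rice", "cotton", "maize"]
PREDICTIONS = {
    "Logistic Regression": ["rice", "rice", "rice", "cotton", "rice"],
    "Decision Tree": ["rice", "maize", "rice", "maize", "maize"],
    "Random Forest": ["rice", "maize", "rice", "cotton", "maize"],
}

winner, scores = best_model(Y_TRUE, PREDICTIONS)

if __name__ == "__main__":
    for name, score in scores.items():
        print(f"{name}: {score:.2%}")
    print("Selected model:", winner)
```

In practice the predictions would come from models trained on the project's data. Accuracy is only one possible criterion, and a per-crop breakdown may be more informative when classes are imbalanced.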


Fig.8.1: Input Screen

The system predicts the output using the Random Forest algorithm, which achieved the highest
accuracy in Table 8.1, and displays it as shown in Fig. 8.2. The required inputs are Temperature,
Humidity, pH, Phosphorus, Nitrogen, Potassium, Rainfall, and Water Level, as shown in Fig. 8.1.
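
As an illustration of how the eight inputs could be collected and checked before being passed to a trained model, the sketch below groups them into one record. The class name `CropInput`, the field order, and the validation rules are assumptions made for this example: humidity is taken to be a relative humidity in percent, pH is limited to the standard 0 to 14 scale, and the remaining quantities are only required to be non-negative because the report does not state their units.

```python
import math
from dataclasses import astuple, dataclass
from typing import List


@dataclass(frozen=True)
class CropInput:
    """Hypothetical container for the eight values entered on the input screen."""

    temperature: float
    humidity: float      # assumed to be relative humidity in percent
    ph: float
    phosphorus: float
    nitrogen: float
    potassium: float
    rainfall: float
    water_level: float

    def validate(self) -> None:
        """Raise ValueError if any value is clearly out of range."""
        if any(not math.isfinite(v) for v in astuple(self)):
            raise ValueError("all inputs must be finite numbers")
        if not 0.0 <= self.humidity <= 100.0:
            raise ValueError("humidity must be between 0 and 100")
        if not 0.0 <= self.ph <= 14.0:
            raise ValueError("pH must be between 0 and 14")
        # Units are not given in the report, so only the sign is checked here.
        for name in ("phosphorus", "nitrogen", "potassium", "rainfall", "water_level"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")

    def to_features(self) -> List[float]:
        """Return the validated values as a feature vector in field order."""
        self.validate()
        return [float(v) for v in astuple(self)]


if __name__ == "__main__":
    # Illustrative values only; not taken from the project's dataset.
    sample = CropInput(25.0, 80.0, 6.5, 40.0, 90.0, 40.0, 200.0, 3.0)
    print(sample.to_features())
```

The resulting list would then be handed to the trained classifier in the same column order that was used during training.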

Fig.8.2: Output Screen

Hence, the system returns the most suitable crop, as shown in Fig. 8.2.
CONCLUSION & FUTURE WORK
9.1 Conclusion

Currently, many farmers do not make effective use of technology and data analysis, so they may
select the wrong crop for cultivation, which lowers their income. To minimize such losses, we have
designed a farmer-friendly system with a GUI that predicts the most suitable crop for a specific plot of
land. The system also provides information on the nutrients to be added, the seeds required for
planting, the expected yield, and the market price. This helps farmers make the right decision when
choosing a crop for cultivation, which in turn can help improve the agricultural sector through an
innovative approach.

9.2 Future Work

Future work will be to update the datasets periodically so that predictions remain accurate, and to
automate these processes. By collecting the necessary data from the GPS location of a plot of land, and
by obtaining permission to use a government rainfall forecasting system, we could predict crops from
the GPS location alone. We could also train the model to help avoid both surpluses and shortages of
food. Finally, we plan to build an app that farmers can use, with the entire system translated into their
local language.
