
Soil Nutrient Analysis (1)

The document discusses the importance of nutrient-based soil analysis and predictive crop modeling in enhancing agricultural productivity in India, which is crucial for the economy. It emphasizes the use of machine learning algorithms, such as K-Nearest Neighbors, Support Vector Classification, and Gradient Boosting, to analyze soil properties and recommend suitable crops. The research aims to improve decision-making for farmers by leveraging technology and data analytics to address challenges like food scarcity and climate change.

Uploaded by

Prince Yadav

Nutrient-Based Soil Analysis and Predictive Crop Modeling
Raji Gupta, Dept. of Information Technology, Ajay Kumar Garg Engineering College, Ghaziabad, India, rajigupta1003@gmail.com
Shiv Prakash Singh, Dept. of Information Technology, Ajay Kumar Garg Engineering College, Ghaziabad, India, sp.shiv.2202@gmail.com
Rishabh Gupta, Dept. of Information Technology, Ajay Kumar Garg Engineering College, Ghaziabad, India, gupta.rish2501@gmail.com
Shikha Agarwal, Dept. of Information Technology, Ajay Kumar Garg Engineering College, Ghaziabad, India, shikhaagl03@gmail.com
Saurabh, Dept. of Information Technology, Ajay Kumar Garg Engineering College, Ghaziabad, India, saurabh2113076@gmail.com

Abstract— Agriculture is a crucial part of India's economy, contributing over 17% to the nation's GDP (Gross Domestic Product) and engaging over 60% of the workforce, yet the effort-to-yield ratio of the sector remains significantly low. Analyzing soil properties such as temperature, pH, humidity, and water retention capacity helps determine soil quality, which is crucial for optimal agricultural productivity. This assessment is then used to recommend crops that align with soil conditions and the external environment, maximizing yield potential and supporting sustainable land management, thereby preventing degradation.
Similar to most studies aimed at advancing agriculture, the main objective of this research is to drive development by implementing ML-based models for crop classification. A model trained using algorithms such as K-Nearest Neighbors (KNN), Support Vector Classification (SVC), and more can accurately analyze soil characteristics. By leveraging the strengths of each algorithm, such as Gradient Boosting's ability to reduce bias and Random Forest's robustness, this ensemble model improves the prediction of suitable crops.

Keywords— prediction, crop, nutrient, soil analysis, modeling, agriculture

I. INTRODUCTION

In a nation where agriculture is regarded as the main bread-winning occupation and a major contributor to the nation's GDP, technological advancement in the field becomes imperative. The goal is to use modern technical and logistic approaches to address the sector's shortcomings. Considering the vital role of agriculture in the life of the masses, both in the broader framework of the economy and at the level of individuals, it becomes a duty to ensure that cultivators make the right decisions regarding crop choice, soil choice, and the techniques involved.

The world currently faces critical global challenges, namely food scarcity and hunger, even in developing countries, driven by climate change and population growth. These challenges necessitate the promotion of intelligent agriculture. This approach uses technology, data analytics, and sustainable practices to enhance productivity and resource management. By employing precise analysis algorithms and acting on their results, farmers can improve crop yields and reduce waste. Advancing smart agriculture is essential for safeguarding food security and supporting sustainable development, ultimately contributing to a more efficient agricultural system that meets the needs of a growing population.

Switching to more technical and logic-based agricultural practices to address the current challenges has great potential to improve crop yield. Agriculture is influenced by a multitude of factors, but it depends mainly on environmental factors such as soil quality, soil pH, seasonal rainfall, and temperature, which are almost impossible to regulate manually. Machine learning and data analytics are therefore well suited to analyzing and predicting these variables. Machine learning models identify patterns and correlations among the parameters, enabling predictive analysis that supports decision-making.

Machine learning draws on several fields, including statistics, computer science, and artificial intelligence. It incorporates algorithms that allow systems to gain insights from data, identify patterns, and make reliable predictions. It produces more well-rounded conclusions by taking into account various factors, such as soil pH, moisture levels, and surrounding humidity, and normalizing them to a single parameter, which becomes the basis for further processing. This single parameter guides the subsequent algorithms toward more efficient and reliable results. By successfully applying these techniques, we seek to improve decision-making and foster sustainable agricultural practices.

The rest of this paper is organized as follows. Section II reviews the related literature, Section III outlines the methodology and details the algorithms employed in the machine learning models for prediction, Section IV presents the results with a comparative analysis of these models, supported by diagrams to improve clarity, and Section V concludes the paper.

II. LITERATURE REVIEW

Recent research studies highlight the implementation of machine learning and artificial intelligence to improve crop quality. These studies have helped place the agriculture sector in a better position, both for individuals and for the nation. In this paper, we compare various algorithms to identify the most efficient one for improved decision-making.

In [3], Vidhya et al. propose a multi-model approach combining Random Forest and Support Vector Machine (SVM) algorithms to analyze soil nutrients and assess the best crop. In their model, various degrees of mineral characterization, ranging from levels 3 to 13, are utilized to obtain more accurate nutritional information. Anusha et al. [2] review the application of machine learning algorithms in agriculture, focusing on crop recommendation and yield prediction. The review relates soil properties such as type and nutrient content to crop growth and process optimization. In [1], Mondal and Banerjee advocate a deep-learning-based model for crop prediction using a feed-forward network with both backward and forward propagation. The model considers key elements including nitrogen, phosphorus, and potassium, along with environmental factors. Nishant et al. [4] forecast yield using simpler parameters than others, such as state, district, city, and season, allowing users to analyze the soil based on region rather than particular soil qualities. The study implements advanced regression techniques, including Kernel Ridge and Lasso, to enhance prediction accuracy. In [6], Gowd et al. propose the application of machine learning techniques to foster better decision-making about harvesting. The study aims to confront climate changes that negatively impact yield, and focuses on educating cultivators to use technology for everyday agricultural purposes. Thapaswini and Gunasekaran [7] address the use of models such as decision trees and neuro-evolutionary algorithms to cultivate enhancements in both productivity and pricing estimates. Later, the study plans to develop an automated price recommendation system using genetic problems. In [8], Aruna Devi et al. present a model that takes weather parameters as input to determine suitable crops. The research employs the Random Forest algorithm for classification and prediction and compares its performance with the Support Vector Machine (SVM) algorithm, which achieves an average accuracy of 90%. In [9], Yasaswy et al. propose a model using Gradient Boosting to predict crop yield rates in agriculture, comparing its effectiveness to the Random Forest algorithm. The study reports an accuracy of 96.7% with Gradient Boosting, significantly higher than the 86.4% accuracy of Random Forest, indicating that Gradient Boosting is the superior approach. Kumar et al. [10] put forward a crop recommendation system using the K-Nearest Neighbours (KNN) algorithm to enhance precision agriculture. Addressing the limitations of existing models, the KNN-based system achieves 96% accuracy based on crop-related factors such as soil type, climate, and temperature of the area (here, Tamil Nadu).

III. METHODOLOGY

A. Gathering and Extraction of Data
For this research, we used an open-source dataset covering a multitude of factors affecting soil analysis. The dataset contains detailed information on the chemical properties of each sample, such as the nutrients Nitrogen, Potassium, and Phosphorus, along with environmental factors such as rainfall, temperature, and pH of the sample. The dataset also includes labels identifying the crops we work with, such as potato, papaya, watermelon, ragi, and chickpea. The dataset is a collection of approximately 2200 rows, rich enough to build and train models.

S. No.  Features          S. No.  Features
1       Nitrogen (N)      5       pH
2       Phosphorus (P)    6       Rainfall
3       Potassium (K)     7       Temperature
4       Label/Crop

B. Data Preprocessing
Before training the model, the data undergoes several preprocessing steps. Initially, missing and redundant data is handled, with missing values filled in using the mean, median, or other techniques. Next, the dataset is normalized so that no single factor disproportionately influences the output. Factors with different ranges are scaled up or down to lie within a common range, usually between 0 and 1. Lastly, outliers, if any, are removed to avoid overfitting the model.
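As a minimal sketch of the first two preprocessing steps described above (mean imputation of missing values, then scaling to the range 0 to 1), the following uses only the Python standard library. The function names and the rainfall readings are hypothetical illustrations, not the paper's actual pipeline, and outlier removal is left out for brevity.

```python
from statistics import mean

def fill_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max_scale(column):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical rainfall readings (mm) with one missing value.
rainfall = [100.0, None, 200.0, 300.0]
filled = fill_missing(rainfall)   # None becomes the mean of 100, 200, 300
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```

Scaling each feature separately in this way keeps a wide-ranged feature such as rainfall from dominating a narrow-ranged one such as pH in distance-based models like KNN.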

C. Model Training
After the major preprocessing is done, the entire dataset is split into training and testing sets, so that separate values are used to train and to test the model. The train-to-test ratio is usually 80:20, so approximately 80% of the dataset is used for training. Testing on the held-out 20% shows how well the model performs on unseen data. Later, cross-validation techniques are used to avoid overfitting of the model.

D. Models Used

Model 1 : KNN Algorithm
The KNN (K-Nearest Neighbour) algorithm works on the principle of "feature proximity": it calculates the Euclidean distance between a data point (from the testing dataset) and candidate neighbors to determine the most similar or relevant neighbors, on which the prediction or classification is based.
First, to find the neighbors of a point, we calculate the Euclidean distance $d(x, x_i)$, where i = 1, 2, 3, …, n, between the current point and all the other points in the dataset.

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

These distances are arranged in ascending order. Then the algorithm selects the k closest points, where k is declared beforehand. The algorithm reviews the labels of these k selected neighbors and chooses the crop whose label appears most often among them.

Model 2 : SVC Algorithm
The Support Vector Classifier (SVC) algorithm operates on the idea of maximizing the separation between different classes by identifying an optimal hyperplane that divides the data points with the largest possible margin between the closest points (called support vectors) of each class.
The following steps are taken to implement the SVC algorithm in a crop recommendation system:
● A vector is used to denote the features, here a collection of variables such as N, P, K, pH, etc. Each of these vectors points to a specific label, here chickpea, apple, ragi, watermelon, etc., which will later be the outcome of the prediction.
● The vectors are placed in the n-dimensional space and we find a hyperplane that separates the support vectors of the different classes. SVC does not draw just any line, but the line that creates the largest gap between the two.

$$w_1 f_1 + w_2 f_2 + \dots + w_6 f_6 + b = 0$$

where $w_1, w_2, \dots$ are the weights of the parameters, $f_1, f_2, \dots$ are the values of the features in the data point (vector), and $b$ is the bias.
● Once such an optimal line is found, the features of a new sample vector are plugged into the equation to check which side it falls on. For example, if the expression is greater than 0, it is Crop A; if it is less than 0, it is Crop B.

Model 3 : Decision Tree Algorithm
The decision tree algorithm is based on the principle of recursive partitioning, i.e., splitting data into subsets based on some decisive feature value. The data is recursively split until distinct or pure subsets are achieved, forming a tree-like structure as classification continues.
Initially, the algorithm evaluates all possible split points of the dataset based on the features and selects the best one. This is done by calculating two measures:
● Gini Impurity: This value checks how "mixed" the dataset is, i.e., the lower the Gini value, the purer the groups formed.

$$\text{Gini} = 1 - \sum_{i=1}^{c} (p_i)^2$$

where $p_i$ is the proportion of items in a group belonging to class i.
● Entropy: It is a measure of the degree of randomness in the dataset. It determines how mixed or impure the groups are. The higher the entropy value, the greater the uncertainty in the dataset.

$$\text{Entropy} = -\sum_{i=1}^{c} p_i \log_2(p_i)$$

where $p_i$ is the proportion of items in class i. The summation runs from i = 1 to c, where c is the total number of classes.

This splitting continues on the basis of features such as Nitrogen level (N), pH, moisture, etc., until a stopping condition is met, e.g., a maximum depth or sufficiently pure subsets. The model is trained on the training dataset and can later be used to predict crops according to the features encountered.
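The core computations of Model Training and of Models 1 to 3 can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the helper names, the two-feature samples (N, pH), and the weights passed to `linear_decision` are hypothetical, and the SVC part shows only the final sign test, not how the maximum-margin weights are learned.

```python
import math
import random
from collections import Counter

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of rows and hold out test_ratio of them (80:20 by default)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, point, k=3):
    """Model 1: majority label among the k training rows closest to point."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], point))
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

def linear_decision(weights, bias, features):
    """Model 2: the sign of w.f + b tells which side of the hyperplane we are on."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return "Crop A" if score > 0 else "Crop B"

def gini(labels):
    """Model 3: Gini impurity, 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Model 3: entropy, minus the sum of p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical samples: ((N, pH), crop label).
samples = [((90, 6.5), "rice"), ((85, 6.0), "rice"), ((88, 6.8), "rice"),
           ((20, 7.5), "chickpea"), ((25, 7.2), "chickpea"), ((22, 7.8), "chickpea")]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))
print(knn_predict(samples, (87, 6.4), k=3))
print(linear_decision([1.0, -1.0], 0.0, [2.0, 1.0]))
print(gini(["rice", "rice", "ragi", "ragi"]), entropy(["rice", "rice", "ragi", "ragi"]))
```

A perfectly mixed two-class group gives the largest values (Gini 0.5, entropy 1.0), while a pure group gives 0 for both, which is why a decision tree prefers splits that lower these measures.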

Model 4: Gradient Boosting


The Gradient Boosting model is an ensemble model that combines the predictions of multiple simpler models (usually decision trees) to build a stronger model capable of accurate and efficient predictions. At each stage, it looks at the errors in the previous results and tries to rectify them by adding a new model trained to correct those errors.
To incorporate the Gradient Boosting algorithm into a crop prediction system, the following steps need to be taken:
● A basic model is designed to start the implementation. Such a model can predict the most frequent crop, or the average of the yield, etc. This is the baseline model from which the gradient boosting algorithm starts.
● After each prediction, the model calculates the error, or difference between the actual answer and the predicted answer. This is done to identify the areas of error and design a better model that covers those errors. This process continues to decrease the difference iteratively.

$$E_i = y_i - F_0(x_i)$$

where $E_i$ = error residue at the $i^{th}$ stage, $y_i$ = true value (here, the crop label), and $F_0(x_i)$ = initial predicted value.

● A new model (called the weak model) is built that corrects the existing errors. It is trained to fit the training dataset with fewer errors. Each iteration adds a weak model that focuses on new errors, leading to a robust model that deals with all kinds of errors encountered.
● After repeated training and testing, the final model is the combination of the models generated so far. The model's ability to predict crops is refined with each iteration, as it progressively corrects the mistakes encountered in the prediction.

Model 5 : Random Forest

The main principle of the Random Forest algorithm is to combine the predictions of numerous decision trees to improve recommendation and reduce overfitting, where each tree is independent of the others. It is based on the idea of "ensemble learning", where multiple independent models are combined to build a stronger and more effective resulting model.
The major difference between the Gradient Boosting algorithm and the Random Forest algorithm is that the trees in Gradient Boosting are built sequentially, i.e., one after the other, whereas the trees in Random Forest are built independently of each other.
The following steps need to be taken to implement Random Forest in a crop prediction system:

● The features of the soil analysis, i.e., N, P, K, pH, temperature, etc., are represented as a vector, and each vector corresponds to a label or crop, i.e., watermelon, ragi, chickpea, etc.
● The algorithm then creates multiple independent decision trees, each trained on different features. Each of these trees is trained on a random sample of the data. This bootstrapping makes the model less prone to overfitting and more fault tolerant, therefore making a stronger model.
● Next, feature selection is done at each node of the tree: instead of using all the features to make a decision, a random subset is chosen for each node. This random selection ensures each tree is unique and less prone to overfitting the training dataset.
● For final prediction and testing, a different approach is used for regression and classification:
1. Classification: In classification, where the prediction of each tree in the forest is a category, here rice, chickpea, ragi, etc., the mode of the predictions is selected. The label produced the maximum number of times is chosen as the final prediction.
2. Regression: In regression, where each tree in the forest makes a numerical prediction rather than a category, such as expected crop yield or expected change in crop production, the average of all predictions is taken. This ensures that every tree's output contributes equally to the final prediction.
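The contrast between the two ensembles (sequential residual fitting in Gradient Boosting, independent bootstrapped trees with averaging or voting in Random Forest) can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes one-dimensional inputs, uses depth-1 regression trees ("stumps") as the weak models, omits per-node feature sub-sampling, and the yield figures, learning rate, and function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def fit_stump(xs, ys):
    """Weak model: the single threshold on x that minimises squared error."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not right:  # threshold at the maximum gives no real split
            continue
        lm, rm = mean(left), mean(right)
        err = sum((y - (lm if x <= t else rm)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x identical: fall back to predicting the mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, rounds=20, lr=0.5):
    """Sequential ensemble: each stump is fitted to the current residuals."""
    base = mean(ys)  # baseline model: the average yield
    stumps = []
    def predict(x):
        return base + lr * sum(s(x) for s in stumps)
    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]  # E_i = y_i - F(x_i)
        stumps.append(fit_stump(xs, residuals))
    return predict

def random_forest(xs, ys, trees=25, seed=0):
    """Independent ensemble: each stump sees its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(tree(x) for tree in forest)  # regression: average

def majority_vote(labels):
    """Classification: the label predicted by the most trees wins."""
    return Counter(labels).most_common(1)[0][0]

def squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: one soil feature against crop yield.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 10, 10, 30, 30, 50]
baseline = lambda x: mean(ys)
boosted = gradient_boost(xs, ys)
forest = random_forest(xs, ys)
print(squared_error(baseline, xs, ys), squared_error(boosted, xs, ys))
print(forest(1), majority_vote(["rice", "ragi", "rice"]))
```

In the boosting loop each stump depends on the predictions of all earlier stumps, so the rounds cannot be reordered, whereas the forest's stumps never look at one another and could be trained in parallel.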

IV. RESULT

The confusion matrices shown in Figure 1, Figure 2, Figure 3, Figure 4, and Figure 5 are the outcomes of the KNN (K-Nearest Neighbor), SVC (Support Vector Classifier), Decision Tree, Gradient Boosting, and Random Forest algorithms, respectively. The x-axis of each confusion matrix represents the predicted crop labels and the y-axis represents the true labels. The values on the leading diagonal are the correctly classified instances, i.e., those whose predicted label matches the true label, whereas the values off the diagonal are misclassified instances arising from false positives and false negatives.
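How a multi-class confusion matrix and the accuracy figure are derived from true and predicted labels can be sketched as follows. The labels below are hypothetical and are not the paper's test data; rows correspond to true labels and columns to predicted labels, matching the axis convention described above.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Return (classes, matrix) with matrix[i][j] = count of true i predicted as j."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = [[counts[(t, p)] for p in classes] for t in classes]
    return classes, matrix

def accuracy(matrix):
    """Share of instances on the leading diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical true and predicted crop labels for six test samples.
y_true = ["rice", "rice", "ragi", "ragi", "chickpea", "chickpea"]
y_pred = ["rice", "ragi", "ragi", "ragi", "chickpea", "rice"]

classes, cm = confusion_matrix(y_true, y_pred)
print(classes)
for row in cm:
    print(row)
print(accuracy(cm))
```

Every off-diagonal entry counts at once as a false negative for its row's crop and a false positive for its column's crop.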

Fig 1. Confusion matrix for KNN
Fig 2. Confusion matrix for SVC
Fig 3. Confusion matrix for Decision Tree
Fig 4. Confusion matrix for Gradient Boosting
Fig 5. Confusion matrix for Random Forest

Taking the crop label (apple, banana, chickpea, moth beans, mung beans, ragi, etc.) as the target and the environmental and soil composition (pH, rainfall, temperature, moisture, etc.) as the input features, our KNN algorithm gives an accuracy of 97.05%, whereas the linear SVC algorithm reaches an accuracy of 96.36%. The decision tree and gradient boosting models were closely matched, reaching accuracy rates of 98.64% and 98.33%, respectively. The best results were achieved by the random forest algorithm, with 99.24% accuracy. As already stated, the random forest algorithm is widely regarded as effective because it combines the interpretability of Decision Trees with the robustness of ensemble learning, mitigating overfitting through bagging. Unlike KNN and SVC, it is well-suited for high-dimensional data and large datasets, offering scalability and efficiency. Additionally, compared to Gradient Boosting, it is less dependent on meticulous hyperparameter tuning and exhibits lower computational complexity, making it a practical choice for diverse applications.

V. CONCLUSION

This paper aims to create awareness in the agriculture sector of the food yield crisis and to point toward a technical approach for dealing with it. We aspire to contribute to smart agriculture by using machine learning algorithms. Working on a large dataset with more than 20 labels and 2000 rows made this closer to a real-life problem. Factors affecting a healthy yield rate were considered and prioritized. Comparing the results of different machine learning algorithms helps analyze the efficiency and accuracy of each algorithm in detail and select the best. Thus, this comparison can help farmers understand and apply the technology to maximize the quantity of crop production while keeping quality at its best.
Maintaining healthy crop production has always been a great challenge for farmers, but through this research paper we aim to bridge the gap between actual farming practices and technology-backed farming practices. The practices and models proposed in the paper aim to improve the situation of farmers and boost the economy in the long run. The accuracy and precision of the predicted results have been improved from the last paper. In the future, the model can be equipped with IoT sensors to collect real-time data and make predictions. Other major factors that influence crop production can also be added to the model training to produce more accurate results. Lastly, suggestions from local farmers, suitable crops for mixed farming, crop rotation cycles, etc. can also be considered while recommending crops.

VI. REFERENCES

[1] A. Mondal and S. Banerjee, "Effective Crop Prediction Using Deep Learning," 2021 International Conference on Smart Generation Computing, Communication and Networking (SMART GENCON), Pune, India, 2021.
[2] S. S. B, Anusha, A. Shetty, R. R. Shetty, B. A. D. Alva and A. D. Shetty, "Machine Learning Techniques in Crop Recommendation based on Soil and Crop Yield Prediction System – Review," 2022 International Conference on Artificial Intelligence and Data Engineering (AIDE), Karkala, India, 2022, pp. 230-235, doi: 10.1109/AIDE57180.2022.10078849.

[3] S. Vidhya, R. Rajalakshmi, M. Mahalingam Muthuram, C. Mukesh and K. Mohit, "Soil Nutrient Analysis for the Cultivation of Plants using Machine Learning Algorithms," 2023 7th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 2023, pp. 485-489, doi: 10.1109/ICICCS56967.2023.10142689.

[4] P. S. Nishant, P. Sai Venkat, B. L. Avinash and B. Jabber, "Crop Yield Prediction based on Indian Agriculture using Machine Learning," 2020 International Conference for Emerging Technology (INCET), Belgaum, India, 2020.

[5] P. Malik, S. Sengupta and J. S. Jadon, "Comparative Analysis of Soil Properties to Predict Fertility and Crop Yield using Machine Learning Algorithms," 2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 2021, pp. 1004-1007, doi: 10.1109/Confluence51648.2021.9377147.

[6] R. B. R. Gowd, S. A. N, S. N, N. L, K. Ezhilarasan and S. S. Varun, "A Novel Based Crop Prediction using Machine Learning and Internet of Things," 2023 International Conference on Smart Systems for applications in Electrical Sciences (ICSSES), Tumakuru, India, 2023.

[7] G. Thapaswini and M. Gunasekaran, "A Methodology for Crop Price Prediction Using Machine Learning," 2022 IEEE 2nd International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, Karnataka, India, 2022.

[8] M. Aruna Devi, D. Suresh, D. Jeyakumar, D. Swamydoss and M. Lilly Florence, "Agriculture Crop Selection and Yield Prediction using Machine Learning Algorithms," 2022 Second International Conference on Artificial Intelligence and Smart Energy (ICAIS), Coimbatore, India, 2022.

[9] M. K. Yasaswy, T. Manimegalai and J. Somasundaram, "Crop Yield Prediction in Agriculture Using Gradient Boosting Algorithm Compared with Random Forest," 2022 International Conference on Cyber Resilience (ICCR), Dubai, United Arab Emirates, 2022.

[10] R. Kumar, M. Gupta and U. Singh, "Precision Agriculture Crop Recommendation System Using KNN Algorithm," 2023 International Conference on IoT, Communication and Automation Technology (ICICAT), Gorakhpur, India, 2023.
