Crop Yield Prediction Using Machine Learning A Pra
Crop Yield Prediction Using Machine Learning A Pra
Pragmatic Approach
Rajswee Surana
Research Article
DOI: https://doi.org/10.21203/rs.3.rs-4575893/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Read Full License
Abstract: The agricultural sector is the major driver of revenues in India. or-
ganic, economic, and seasonal factors all have got a bearing on an agricultural
producer`s manufacturing.
Accurate crop yield prediction (CYP) is required due to the agricultural indus-
try`s rapid innovation and liberalized market economy. Accurate prediction is
greatly aided by the chosen characteristics and machine learning (ML) techniques
Any ML Algorithm’s performance may be enhanced by using a unique set of
features from the same training dataset. This study assesses the key characteris-
tics of an accurate crop yield prediction. For greater accuracy, it uses machine
learning (ML) methods like Random Forest (RF), Adaboost, Gradient Boost, and
Support Vector Machine (SVM). The agriculture dataset has 2201 instances in it.
80% of them are randomly chosen for the model`s training, while 20% are used
to test the model`s predictive power. The results show that the Random Forest
approach gets the highest level of accuracy. The goal of this paper is to predict
crop yields. Using different Machine Learning Algorithms so that farmers can
make their yields higher.
1. INTRODUCTION
Horticulture is vital to the economic growth of a nation such as India. Given that it
supplies an avenue of money and work for the rural poor. It's an essential resource of
nourishment to people worldwide. Production of crop needs is additionally rising with
time. It makes up around 20% of India's GDP. They have accumulated understanding
through the years. Due to obsolete techniques, producers are ignorant of their stake in
the contemporary horticulture business. As a result of this, agricultural businesses
endure. Crop selection is an important aspect of farming planning. Producers'
expenditures decrease when they're given the ability to obtain reliable data on the best
products for their land and season. Developing a model for prediction entails collecting
knowledge from trustworthy sources, as this impacts the model's predictability Artificial
intelligence (AI) has a subfield called machine learning that enables computers to learn
from data without explicit programming. Because it can find patterns and correlations in
2
enormous volumes of data and base forecasts on these relationships, it is the perfect tool
for predicting agricultural productivity.
The use of machine learning enables entities to generate educated decisions through the
examination of future trends and features. The dichotomy is established by recognizing,
assessing, and organizing items and opinions into particular categories or "subpopula-
tions." Using pre-categorized instructional datasets, machine learning programs employ
various methods to recognize upcoming data. Classification algorithms in machine
learning utilize input data to foresee the chance that the subsequent information will fall
within one of the predetermined classes.
The suggested model is more accurate than the current ones, as evidenced by the fact
that the forecast accuracy of earlier accuracy models was lower than that of the model
now being presented in this paper.
2. LITERATURE REVIEW
The Multiple Linear Regression strategies and the Density-based clustering tech-
nique were employed in the analysis phase of the region-specific crop production study,
as stated by the researcher in the study's publication [1]. even though these designs have
been evaluated in all of Andhra Pradesh's districts, only the East Godavari district was
consulted in the evaluation process. Concerning the East Godavari District sample data,
the exact amount as well as the associated calculated value were determined via the
Multiple Linear Regression technique.
Data analysis has numerous applications in farming. The scientists [2] looked at the
efficacy of information in the classification algorithms J48, Random Forest, and Ada-
Boost, and Using a variety of rating criteria, such as the false positive rate, actual positive
rate, precision, ROC area, and F-Measure, Naive Bayes is used. It facilitates the creation
of an effective crop forecast system.. The results show that J48 is the best prediction
technique for the decision tree, with an accuracy of 89.33% on the data that has been
provided set, while Random Forest and Adaboost yield reliability with values of 85.66%
and 82.66%, correspondingly.
Researchers [3] uncovered that owing to the rapid expansion of science and technology
and associated fields, quasi-technological fields have become simpler. Machine learning
and agriculture are two of the most stimulating study fields that are gaining popularity
among researchers. As a consequence, limitations on machine learning applications for
farming are also tackled. This limitation causes the identification of more potential ac-
complishments for the field in question's swift development.
Researchers [4] aimed to discover further the random forest method's handling of region-
specific agricultural yield predictions. In that particular endeavour , 80 percent of the
data serves for instruction, whereas the rest, or twenty percent, is utilized for assessment.
Following effective testing and training, a model's preciseness gets assessed.
3
Employing prior statistics, the authors [5] constructed a robust harvest projection
framework. Datasets undergo evaluation utilizing Xarray functions to figure out the crop
depending on geography and seasonality utilizing a trend-matching strategy. Producers
may employ the advised methodology for choosing crops that are suitable depending on
the time of the year and region. Ranchers are going to succeed by minimizing waste and
broadening the net production of crops.
Considering these scientists [6], all of the machine learning (ML) methods beneath eval-
uation could potentially utilize for assessing the productivity of agriculture. Naive Bayes
models had the poorest precision, 72.33%, whereas the kernel neural network while ran-
dom forest modelling had the highest, 88.67% and 94.13%, correspondingly. Regarding
exactness, ANN anticipated the highest benefit—99.94%—while Logistic Regression
anticipated the weakest—24.17%. Apart for the naive Bayes model, each classification
system considered predicts a recalled admire of more than 90%. Logistic regression had
the highest rate of false positives and the smallest actual negative percentage, while Na-
ive Bayes got the overall greatest incorrect negative incidence. ANN and KNN had the
greatest f-scores, having characteristics around 99.78% & 80.72%, correspondingly,
along with the strongest clarity.
Linear Regression surpassed K Nearest Neighbour concerning predictive power and ac-
curacy, as reported by the researchers [7]. The regression technique's capacity utilization
for foreseeing yields was investigated. The data were utilized as model inputs. fortu-
nately substitute elements including the environment, used for farming practices, or soil
characteristics were taken into consideration in the equation's creation, specifications
like as year, crop, area, and quantity produced (in tonnes) might have offered deeper
estimation influence, as well as linear regression techniques continued to offer appropri-
ate forecasting preciseness.
3. PROPOSED FRAMEWORK
The advocated mechanism in Fig.1 intends to help farmers nurture harvests for higher
volumes. The crops designated under the endeavour were essential in the stipulated
locale. Crop preference is an important aspect of agrarian strategy. Farmers' declines
diminish when they have access to precise details on the best harvest for their land and
season. throughout the past few decades, a yield from agriculture data set has been
compiled from a variety of evidence.
4
Decision Tree
A decision tree is an overarching learning technique that can be used for both
classification and regression problems, however, it is most frequently used for
classification. It is a tree-structured classification where each leaf node denotes the
outcome and nodes within it contain data properties, decision-making rules, and
pathways.
The expected value of each result can be determined as follows:
Expected value (EV) = (First possible outcome x Likelihood of outcome) + (Second
possible outcome x Likelihood of outcome) - Cost
Adaboost Algorithm
It integrates multiple classifiers to increase the precision of classifiers. AdaBoost is an
incremental collection composition approach. The AdaBoost classifier builds an
efficient predictor by combining many negligible-performing classification algorithms,
resulting in a hot-accuracy strong classifier.
The percentage of correctly identified positive cases among all projected positive cases
is the implicit definition of precision. It concerns how well your model can forecast the
actual positives. It emphasises Type I mistake.
Recall = TP / (TP + FN)
It's the proportion of accurately detected positive cases among all positive instances.
wherein TP is interpreted as when each instance is presupposed to be advantageous but
attains the age of out to be optimistic. When scenarios show up to be either advanta-
geous or detrimental in reality, FP emerges. TN develops when events are predicted to
be unfavorable but inevitably turn out to be unfavorable FN arises when domains that
were portrayed to be negative ultimately prove to be affirmative.
4.1 Findings
A. When we assessed the data set via a cross-validation model with 5 folds of numbers
as shown in Fig.5, the precision, AUC, F1, CA, and recall were as follows:
B. When we reviewed the dataset using a random sampling model as shown in Fig.6, the
AUC, F1, CA, and Recall were as follows:
8
C. the values of different parameters in the dataset , how they are related to each other
are shown below:
5. CONCLUSION
The suggested structure is being created using machine learning to help ranchers analyze
agricultural losses brought on by a lack of knowledge about farms under varied soil and
weather conditions. In our study, we used a range of assessment metrics, such as false
positive rate, true positive rate, recall, precision, ROC area, and F-1 score, to compare
the performance of the classification algorithms such as Decision Tree, Adaboost, SVM,
Gradient Boost, and Random Forest.
It supports the creation of an effective crop forecasting system. The generated findings
show that Random Forest is the best predictor method, with a 99.93% accuracy rate on
the supplied data. An extensive dataset, 5-fold cross-validation, and a Random
Sampling model are utilized in this comparative crop prediction study to demonstrate
the predictive ability of the employed data mining techniques. The exactness has been
verified to be 99%. It has plenty of room for expansion and can be put into practice and
coupled with a diversified and multifaceted application. Ranchers should be enlightened.
The improvement in the field of agriculture is going to be extremely helpful, which will
assist producers in producing crops.
REFERENCES
[1] Ramesh, D., & Vardhan, B. V. (2015). Analysis of crop yield prediction using data mining
techniques. International Journal of Research in Engineering and Technology, 4(1), 47-473.
[2] Patel, H., & Patel, D. (2016). A comparative study on various data mining algorithms with
special reference to crop yield prediction. Indian Journal of Science and Technology, 9(22), 18.
[3] Madhubhashini, D. A. P. Data Mining for Smart Agriculture research, 64(2), 394-398.
12
[4] Kamath, P., Patil, P., Shrilatha, S., & Sowmya, S. (2021). Crop yield forecasting using
data mining. Global Transitions Proceedings, 2(2), 402-407.
[5] Kedlaya, A., Sana, A., Bhat, B. A., Kumar, S., & Bhat, N. (2021). An efficient algorithm
for predicting crops using historical data and pattern-matching techniques Global Transitions
Proceedings, 2(2), 294-298.
[6] Pandith, V., Kour, H., Singh, S., Minhas, J., & Sharma, V.(2020). Performance evaluation
of machine learning techniques for mustard crop yield prediction from soil analysis. Journal of
scientific research, 64(2), 394-398.
[7] Van Kloppenburg, T., Kassahun, A., & Catal, C. (2020). Crop yield prediction using ma-
chine learning: A systematic literature review. Computers and Electronics in Agriculture, 177,
105709
[8] Reddy, D. J., & Kumar, M. R. (2021, May). Crop yield prediction using a machine learning
algorithm.
In 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS)
(pp. 1466-1470). IEEE
[9] Mishra, S., Paygude, P., Chaudhary, S., & Idate, S. (2018, January). Use of data mining in
crop yield prediction. In 2018 2nd International Conference on Inventive Systems and Control
(ICISC) (pp. 796-802). IEEE.
[10] Surya, P., & Aroquiaraj, I. L. (2018). Crop yield prediction in agriculture using data
mining predictive analytic techniques. International Journal of Research and Analytical Re-
views, 5(4), 783-787.
[11] Bondre, D. A., & Mahagaonkar, S. (2019). Prediction of crop yield and fertilizer recom-
mendation using machine learning algorithms. International Journal of Engineering Applied Sci-
ences and Technology, 4(5), 371-376.
[12] Pant, J., Pant, R. P., Singh, M. K., Singh, D. P., & Pant, H. (2021). Analysis of agricul-
tural crop yield prediction using statistical techniques of machine learning. Materials Today:
Proceedings, 46, 10922-10926.
[13] Kale, S. S., & Patil, P. S. (2019, December). A machine learning approach to predict crop
yield and success rate. In 2019 IEEE Pune Section International Conference (PuneCon) (pp. 1-
5). IEEE.
[14] Oikonomidis, A., Catal, C., & Kassahun, A. (2023). Deep learning for crop yield predic-
tion: a systematic literature review. New Zealand Journal of Crop and Horticultural Sci-
ence, 51(1), 1-26.
[15] Wu, J., & Zhao, F. (2023). Machine learning: An effective technical method for future
use in assessing the effectiveness of phosphorus-dissolving microbial agroremediation. Frontiers
in Bioengineering and Biotechnology, 11.