Yield Prediction Using Machine Learning
Department of Computer Science and Engineering, Meenakshi Sundararajan Engineering College, Anna University,
Chennai, India.
Publication history: Received on 13 May 2022; revised on 16 June 2022; accepted on 18 June 2022
Abstract
Agriculture is the foundation of many economies, particularly in India and in the state of Tamil Nadu. Young people
who are new to farming may not know which crop to sow or which crop will be profitable. This paper addresses that
problem. Predicting a suitable crop and its expected production helps farmers make better decisions, reduce losses, and
manage the risk of price fluctuations. Unlike existing systems, which are not deployed, our system is deployed and
applies classification and regression algorithms to produce crop type recommendations and yield predictions. Machine
learning algorithms allow the agricultural sector to anticipate the crop from a given dataset. A supervised machine
learning technique is used to analyse the dataset through variable identification, univariate, bivariate, and
multivariate analysis, missing value treatment, and related steps. Several machine learning algorithms were compared
to identify which one predicts the best harvest most accurately. The results show that the proposed technique achieves
the best accuracy when the algorithms are compared on precision, recall, F1 score, sensitivity, specificity, and
entropy.
The proposed system projects the yield of practically all types of crops grown in Tamil Nadu, relieving new farmers of
some of the burden as they enter the business.
Keywords: Supervised Machine Learning Approach; Classification and Regression Models; Precision; Linear
Regression; Logistic Regression; Decision Tree and Random Forest
1. Introduction
Agriculture is an important source of income for many people in underdeveloped countries. Technologies, conditions,
practices, and civilizations have all influenced agricultural expansion in recent years. The use of information
technology can also improve decision-making, allowing farmers to achieve the best results. Data mining, the process of
extracting the most important and relevant information from large datasets, is one technique applied to agricultural
decision-making. As agriculture involves a variety of data, such as soil data, crop data, and weather data, we employ
a machine learning approach designed for crop yield prediction. Machine learning techniques are used to build a crop
recommendation and yield prediction system.
Corresponding author: Divya Lakshmi R
Department of Computer Science and Engineering, Meenakshi Sundararajan Engineering College (Affiliated to Anna University)
Chennai, India.
Copyright © 2022 Author(s) retain the copyright of this article. This article is published under the terms of the Creative Commons Attribution License 4.0.
World Journal of Advanced Research and Reviews, 2022, 14(03), 452–459
2. Related work
3. Proposed system
In our proposed crop recommendation and yield prediction system, we employed various machine learning techniques.
Machine learning is a branch of artificial intelligence that focuses on the use of data and algorithms to imitate the
way humans learn, gradually improving its accuracy. From this field we used classification algorithms to predict the
crop type and regression algorithms to predict the yield in quintals per hectare [1]. The inputs are state name,
district name, season, area, rainfall, average humidity, and mean temperature. After data preprocessing and data
visualization, a separate dataset of past data, distinct from the user input, is used for training and testing.
Three classification algorithms and three regression algorithms are used, and the most accurate of each group is
selected. The classification algorithms (logistic regression, decision tree, and random forest) weigh the input
features so that the output separates one class from another, while the regression algorithms (linear regression,
decision tree, and random forest) predict a continuous output value from the input features fed into the system. The
algorithms are ranked by the accuracy each one achieves, and the results of the best one are displayed to the user.
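
The ranking step described above can be sketched as follows. This is a minimal illustration that assumes scikit-learn and uses synthetic data in place of the crop dataset; the helper name rank_models is hypothetical, and the use of R² as the score for the regressors is an assumption, since the paper reports only "accuracy".

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor


def rank_models(models, X, y, score_fn, seed=0):
    """Fit each model on a 70/30 split and return (name, score) pairs, best first."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    scores = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores.append((name, score_fn(y_test, model.predict(X_test))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Synthetic stand-ins for the crop-type and yield datasets (assumption).
X_cls, y_cls = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_classes=3, random_state=0
)
X_reg, y_reg = make_regression(n_samples=300, n_features=7, noise=10.0, random_state=0)

classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}
regressors = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(random_state=0),
}

classifier_ranking = rank_models(classifiers, X_cls, y_cls, accuracy_score)
regressor_ranking = rank_models(regressors, X_reg, y_reg, r2_score)
print("best classifier:", classifier_ranking[0][0])
print("best regressor:", regressor_ranking[0][0])
```

The first entry of each ranking is the model whose predictions would be shown to the user.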
The output is then displayed using Flask, a small and lightweight Python web framework that provides the essential
tools for web application development. We prepared separate model files for the algorithms that produced the best
results for crop type and yield prediction, and used those files to display the results in Flask.
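
A minimal sketch of serving a saved model through Flask is given below. It assumes the model is serialised with pickle; the route name /predict, the two toy features, and the JSON field names are hypothetical and are not taken from the authors' implementation.

```python
import pickle

from flask import Flask, jsonify, request
from sklearn.tree import DecisionTreeClassifier

# Train a toy model and serialise it, standing in for a saved model file (assumption).
toy_model = DecisionTreeClassifier(random_state=0).fit(
    [[100.0, 25.0], [900.0, 30.0]],  # columns: rainfall, temperature
    ["millet", "rice"],
)
model_bytes = pickle.dumps(toy_model)

app = Flask(__name__)
# A deployed app would instead read the bytes from the stored model file.
crop_model = pickle.loads(model_bytes)


@app.route("/predict", methods=["POST"])
def predict():
    """Read the input features from the JSON body and return the predicted crop."""
    payload = request.get_json()
    features = [[float(payload["rainfall"]), float(payload["temperature"])]]
    return jsonify({"crop": str(crop_model.predict(features)[0])})
```

Loading the model once at start-up, rather than inside the view function, avoids deserialising it on every request.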
4. Module description
Python and a variety of libraries are used to build the system. The Flask library is used to create the frontend.
Several machine learning models are trained and tested. We use a supervised machine learning approach because we have
a labeled dataset with a set of crop cultivation features.
The dataset is first gathered and then preprocessed to make it suitable for training. The data is divided into two
sets: training and testing. Each machine learning model is trained on the training data. The accuracy of each model is
then calculated on the testing data and compared to find the best model for the dataset. A user interface is built to
display predictions from the stored trained model.
The first step in data preprocessing is data cleaning, which involves locating duplicate values in the dataset, deleting
null values, and removing unwanted values. The categorical data is then encoded as numbers so that the model can
interpret it and extract useful information, as machine learning algorithms only accept numeric variables. Label
encoding is the categorical encoding approach used for this step.
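
The cleaning and label-encoding steps can be illustrated with pandas and scikit-learn. The small table and its column names below are invented for the example and do not come from the authors' dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy records with one duplicate row and one missing value (illustrative only).
raw = pd.DataFrame(
    {
        "season": ["Kharif", "Rabi", "Kharif", None, "Rabi"],
        "rainfall": [850.0, 420.0, 850.0, 610.0, 390.0],
        "crop": ["rice", "wheat", "rice", "maize", "wheat"],
    }
)

# Data cleaning: drop duplicate rows, then rows that contain null values.
clean = raw.drop_duplicates().dropna().reset_index(drop=True)

# Label encoding: map each category to an integer, keeping the encoder
# so that predictions can later be mapped back to crop names.
encoders = {}
for column in ["season", "crop"]:
    encoders[column] = LabelEncoder()
    clean[column] = encoders[column].fit_transform(clean[column])

print(clean)
```

Keeping the fitted encoders is a design choice of this sketch: the same mapping has to be applied to user input at prediction time.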
A test harness is created before an appropriate machine learning model is selected. The test harness consists of the
data on which an algorithm is trained and tested, together with the performance metric used to evaluate it. For this
purpose the dataset is split into training and testing sets: 70% of the data is used for training and 30% for testing.
The model is trained on both the inputs and the outputs of the training data. For the testing data, the model receives
only the inputs, and its predictions are compared with the known outputs.
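
A 70/30 split of this kind could be produced with scikit-learn's train_test_split, as in the short sketch below. The arrays are placeholders, and the fixed random_state is an assumption added only to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix (100 rows, 2 features) and label vector.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 4

# 70% of the rows go to training, 30% to testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```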
For crop recommendation, supervised classification algorithms are used because the output can be categorized into
classes. A classification model approximates the mapping function from input variables to discrete output variables;
the goal is to identify which class or category new data falls into. For crop yield prediction, supervised regression
algorithms are used because the predicted output is a continuous value [4]. Regression is a supervised learning
technique that models the relationship between variables and predicts a continuous output variable from one or more
predictor variables.
Resampling methods such as cross-validation are used to estimate how accurate each model is likely to be on unseen
data. The performance of each algorithm is then evaluated using several performance metrics. A classification report is
generated from the precision, recall, and F1 score [2]. Comparing these results identifies the best algorithm for the
system.
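
The evaluation step can be sketched with scikit-learn's cross-validation utilities. The example assumes five folds and a decision tree on synthetic three-class data; neither the fold count nor the data comes from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the crop dataset (assumption).
X, y = make_classification(
    n_samples=300, n_features=7, n_informative=5, n_classes=3, random_state=0
)
model = DecisionTreeClassifier(random_state=0)

# Accuracy on each of five held-out folds estimates performance on unseen data.
fold_accuracy = cross_val_score(model, X, y, cv=5, scoring="accuracy")

# Out-of-fold predictions feed the classification report.
y_pred = cross_val_predict(model, X, y, cv=5)
report = classification_report(y, y_pred, output_dict=True)

print("mean accuracy:", fold_accuracy.mean())
print("macro F1:", report["macro avg"]["f1-score"])
```

Running the same code for each candidate model and comparing the reports gives the basis for choosing between them.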
6. Results
7. Conclusion
A crop recommendation and yield prediction system has been developed successfully using a supervised machine learning
approach. The analysis began with data preprocessing and cleaning, followed by exploratory analysis of an agricultural
dataset. We then used the dataset to train multiple machine learning models and evaluated them to find the best
algorithm. Based on this evaluation, we use a decision tree classifier for crop recommendation and a random forest
regressor for yield prediction, as they give the best accuracies. The trained models can predict the crop, as well as
the crop yield rate, from the specified characteristics. As this system covers the widest range of crops, farmers can
learn about crops they have not farmed before and can see a list of all possible crops, which helps them decide which
crop to plant. We intend to help people who plan to invest in farming without prior knowledge of it to estimate how
much profit they can make. The system is also meant to help people who are new to farming and are learning practices
that have been followed for generations. Thus, our approach can assist farmers in Tamil Nadu, particularly newcomers,
in deciding which crop to grow by predicting crop and yield based on local climatic conditions.
The proposed system is implemented as a website. In future, we may develop it as a mobile application to make it more
convenient for users. As a further enhancement, we may train the model using neural networks. The current system does
not use neural networks because the existing machine learning algorithms already give good accuracy on the present
dataset. We expect that, as the number of input features and the size of the dataset increase, training the model with
neural networks would produce even better results.
Acknowledgments
This research work was carried out by Anusree M, Divya Lakshmi R and Swetha U under the supervision of Sundari V, in the
Department of Computer Science and Engineering, Meenakshi Sundararajan Engineering College, Chennai, India.
References
[1] Jeevan Nagendra Kumar Y, V Spandana, VS Vaishnavi, K Neha, VGRR Devi. Supervised Machine learning Approach
for Crop Yield Prediction in Agriculture Sector. 5th International Conference on Communication and Electronics
Systems. 2020;736-741.
[2] Mariammal G, A Suruliandi, SP Raja, E Poongothai. Prediction of Land Suitability for Crop Cultivation Based on
Soil and Environmental Characteristics Using Modified Recursive Feature Elimination Technique With Various
Classifiers. IEEE Transactions on Computational Social Systems. 2021;8(5):1132-1142.
[3] Mehedi Hasan Md, Muslima Tuz Zahara, Mahamudunnobi Sykot, Arafat Ullah Nur, Mohd Saifuzzaman, Rubaiya
Hafiz. Ascertaining the Fluctuation of Rice Price in Bangladesh Using Machine Learning Approach. 11th
International Conference on Computing, Communication and Networking Technologies. 2020;1-5.
[4] Rakesh Kumar, MP Singh, Prabhat Kumar, JP Singh. Crop Selection Method to maximize crop yield rate using
machine learning technique. International Conference on Smart Technologies and Management for Computing,
Communication, Controls, Energy and Materials. 2015;138-145.
[5] Shilpa Mangesh Pande, Prem Kumar Ramesh, Anmol Anmol, BR Aishwarya, Karuna Rohilla, Kumar Shaurya. Crop
Recommender System Using Machine Learning Approach. 5th International Conference on Computing
Methodologies and Communication. 2021;1066-1071.