2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA)

Galgotias University, Greater Noida, UP, India. Oct 30-31, 2020

Crop Yield Estimation in India Using Machine Learning
Ms Kavita
Information Technology
Manipal University Jaipur
Jaipur, India

Pratistha Mathur
Information Technology
Manipal University Jaipur
Jaipur, India

Abstract— Agriculture is a significant economic support in India. Population growth is the major challenge for food security: it raises demand, which requires farmers to produce more from the same agricultural land to increase supply. Technology can help farmers produce more through crop yield prediction. The main aim of this paper is to predict crop yield using area, yield, production, and area under irrigation. Four machine learning techniques (Decision Tree, Linear Regression, Lasso Regression, and Ridge Regression) have been applied to estimate the crop yield. Cross-validation was used for validation, with mean absolute error, mean squared error, and root mean squared error as the evaluation metrics. The decision tree outperforms the other machine learning techniques.

Keywords— crop yield, decision tree, LASSO regression, linear regression, machine learning.

I. INTRODUCTION

In India, agriculture is the primary source of food for the large population as well as a significant economic support. Given the rapid increase in India's population and its marked climate variations, the food supply and demand chain must be maintained. Many scientific techniques have emerged in agriculture to maintain the balance between the demand for and supply of food. The notable variation in the climate makes it difficult for farmers to decide how to be more dynamic and sustainable [2]. Agriculture requires more production from fewer inputs, with the support of new technology, new farming methods, and time-saving products [3]. Thus, crop yield estimation plays a significant role in addressing food security problems [4].

Crop yield estimation can help farmers reduce production losses under unsuitable conditions and increase production under suitable, favourable conditions. Crop yield prediction is affected by many factors, including farmers' practices and decisions, pesticides, fertilizers, weather conditions, and market values [5]. Crop yield estimation can be done using statistical data on previous years' yields along with rainfall, weather, and area-wise production [6].

Machine learning has recently evolved in many fields, including the agricultural domain. Many machine learning techniques have been applied to predict crop yield, including support vector machines [2] [7], decision trees [8], artificial neural networks [9] [6] [10], and deep learning [11] [12]. This research estimates the crop yield for India using data from 1950 to 2018. The prediction is made for six crops (rice, wheat, jowar, bajra, tobacco, and maize) using parameters including the area used for crop sowing, production, yield, and area under irrigation. The prediction is attained using decision tree, linear regression, Lasso regression, and ridge regression.

II. RELATED WORK

Crop yield estimation can be attained by implementing machine learning techniques. A multiple linear regression model was used to predict crop yield using a dataset that includes total cultivation area, water resources for irrigation (tanks and wells), canal length, and average maximum temperature [3]. Another study designed its computational model with a deep neural network approach and reported that it produced better results than Lasso, shallow neural networks, and regression trees; when validated with predicted weather data, its root mean square error was 12% of the average yield and 50% of the standard deviation of the dataset [11]. The authors of [13] pursued four objectives: first, to study an artificial neural network model for predicting corn and soybean yields under unfavourable climatic conditions; second, to check the estimation capabilities of the model at state, regional, and local levels; third, to evaluate the performance of the artificial neural network with respect to parameter variation; and fourth, to compare the developed artificial neural network model with multiple linear regression models. The study in [10] used artificial neural networks to estimate rice production in various districts of Maharashtra, India, with data for 27 districts collected from the Indian Government's public records. Using rainfall, minimum, maximum, and average temperature, area, production, yield, and reference crop evapotranspiration for the Kharif seasons of 1998 to 2002 as parameters, it obtained an accuracy of 97.5%. The research in [6] focuses on crop yield estimation for Kharif crops in Andhra Pradesh's Vishakhapatnam district. Because rainfall plays a considerable role in Kharif crop production, the authors first predicted rainfall using modular artificial neural networks and then predicted crop yield from the rainfall and area data using support vector regression. These two methodologies were applied to increase the crop yield. The

978-1-7281-6324-6/20/$31.00 ©2020 IEEE 220

Authorized licensed use limited to: Carleton University. Downloaded on June 01,2021 at 14:33:51 UTC from IEEE Xplore. Restrictions apply.
research work uses machine learning algorithms, namely artificial neural network, support vector regression, k-nearest neighbor, and random forest, to estimate crop yield with better accuracy. The data used in the research consists of 745 instances, of which 70% are randomly assigned for training the model and the remaining 30% are used for testing and evaluating the final model performance. The final result indicates that the random forest algorithm achieves the highest accuracy [14]. The research in [15] proposes a novel model to predict soybean yield in southern Brazil using Long Short-Term Memory (LSTM) neural networks on satellite and weather data. Its primary goals are to (i) compare multivariate OLS linear regression, random forest, and LSTM neural networks on the basis of their performance in forecasting soybean yield, with vegetation indices, land surface temperature, and rainfall as independent variables, and (ii) estimate how early the model can predict the yield with reasonable accuracy. Among all the algorithms, LSTM performs better for all forecasts except DOY 16 (day of year 16), for which multivariate OLS linear regression performs better. The paper [2] analyses the results attained by implementing a Sequential Minimal Optimization (SMO) classifier. The WEKA tool was used to perform the experiment on data from 27 districts of Maharashtra, India. The results obtained on the same dataset indicated that other techniques performed better than SMO. Validation was done using mean absolute error, root mean squared error, relative absolute error, and root relative squared error. In terms of accuracy and quality, BayesNet and Multilayer Perceptron showed the highest accuracy and better quality, whereas SMO showed the lowest accuracy and poor quality.

III. MATERIAL AND METHODS

A. Study Region

In this study, the authors focus on India because of its climatic variation, from humid to dry in the south to temperate alpine in the north. India has a land area of 297,319 thousand ha and an agricultural area of 179,721 thousand ha. Six major crops grown in India (rice, jowar, maize, bajra, tobacco, and wheat) have been selected for this study. India is the top exporter of rice in the world, with an export quantity of 12,060,844 tonnes [16].

B. Data sources

The dataset used for the experiments in this research was originally collected from www.mospi.gov.in and https://data.gov.in, which are made public by government authorities. The obtained dataset has the following features: rainfall, area, area under irrigation, crop names, seasons, production, and yield for the years 1950 to 2018. Figure 1 shows the crops' production over the last 68 years.

Fig. 1. Crop production mean for 1950-2018

This study applies decision tree, linear regression, Lasso regression, and ridge regression to predict crop yield for India. For validation, it uses mean absolute error [1], mean squared error [17], and root mean squared error [18].

C. Methods

1) Decision Tree

A decision tree is a non-parametric machine learning algorithm used for regression and classification problems. The decision tree algorithm fulfils two major tasks: first, identifying the features that are appropriate for each decision, and second, concluding which choice to make based on the chosen features. The algorithm assigns a probability distribution to the plausible choices [19]. In a decision tree, every internal node represents a feature, every branch leads to a decision, and every leaf node indicates the final result. To construct a decision tree, one feature is selected as the root node to start the tree, and the data are then split repeatedly to complete the tree.

2) Linear Regression

Linear regression is both a machine learning and a statistical algorithm. The main aim of linear regression is to generate mathematical models that describe the relationship between two variables. The model assumes a statistical relationship between the input (independent) variable x and the output (dependent) variable y; the dependent, or response, variable is estimated from the independent variable. Another aim of regression is to examine hypotheses through prediction and mathematical explanation [20].

3) Lasso regression

Lasso stands for Least Absolute Shrinkage and Selection Operator. In Lasso regression, the value of the regularization parameter λ controls both the size and the number of the coefficients: larger values of λ shrink more coefficients to exactly 0, so fewer covariates remain in the linear model, while smaller values allow more covariates to be included. In other words, the model shrinks some coefficients and sets others to 0, and hence tries to retain only the useful features [21].
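The shrinkage behaviour described above can be illustrated with a small sketch. It assumes scikit-learn's `Lasso` estimator, whose `alpha` parameter plays the role of λ, and uses hypothetical synthetic data (not the paper's dataset) in which only the first two of five predictors influence the target. As `alpha` grows, more coefficients are set exactly to 0.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical synthetic data (not the paper's dataset): 200 samples,
# 5 candidate predictors, but only the first two actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)


def nonzero_coefficients(alpha: float) -> int:
    """Fit Lasso with penalty strength `alpha` (the lambda of the text) and
    count how many coefficients were NOT shrunk exactly to 0."""
    model = Lasso(alpha=alpha).fit(X, y)
    return int(np.count_nonzero(model.coef_))


# Larger penalties zero out more coefficients, i.e. fewer covariates survive.
for alpha in (0.01, 0.1, 1.0, 5.0):
    print(f"alpha={alpha:<5} non-zero coefficients: {nonzero_coefficients(alpha)}")
```

With a strong enough penalty every coefficient is zeroed; with a moderate one, only the two informative predictors are kept, which is the feature-selection effect the text attributes to Lasso.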

4) Ridge Regression

Ridge regression is advantageous when the predictor variables are highly collinear. The method is used to analyse multiple regression data that suffer from multicollinearity. It is most suitable when a dataset contains more predictor variables than observations [22].
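To make the comparison of the four methods concrete, the following is a minimal sketch of how they could be fitted and scored with k-fold cross-validation. It assumes scikit-learn; the paper does not report its number of folds or hyperparameters, so 5 folds, `max_depth=5`, and the `alpha` values are illustrative assumptions. The data are hypothetical synthetic placeholders for the area, irrigation, rainfall, and yield columns, not the MOSPI/data.gov.in dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic stand-in for the crop table (NOT the paper's data):
# columns mimic sown area, area under irrigation and rainfall; target mimics yield.
rng = np.random.default_rng(42)
n = 300
area = rng.uniform(10, 100, n)
irrigated = area * rng.uniform(0.2, 0.9, n)
rainfall = rng.uniform(500, 1500, n)
X = np.column_stack([area, irrigated, rainfall])
y = 0.5 * irrigated + 0.01 * rainfall + rng.normal(scale=2.0, size=n)

# The four model families from the text; hyperparameters are illustrative guesses.
models = {
    "Decision tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Linear regression": LinearRegression(),
    "Lasso regression": Lasso(alpha=0.1),
    "Ridge regression": Ridge(alpha=1.0),
}

# 5-fold cross-validation; scikit-learn reports errors as negated scores.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scoring = {"mae": "neg_mean_absolute_error", "mse": "neg_mean_squared_error"}

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    mae = float(np.mean(-scores["test_mae"]))
    rmse = float(np.mean(np.sqrt(-scores["test_mse"])))  # RMSE per fold, then averaged
    results[name] = (mae, rmse)
    print(f"{name:<18} MAE={mae:.2f}  RMSE={rmse:.2f}")
```

Because the synthetic target here is linear in the features, the linear models may well score better than the tree on it; the ranking says nothing about the paper's dataset, where the tree performed best.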

IV. RESULTS
Because the dataset features are related to one another, the authors considered crop as the major feature and plotted its relationship with the other features in Fig. 2. The plots indicate that rice has the highest production and the largest allotted area, which is consistent with India being the world's top exporter of rice. Rice and wheat take up the largest proportions of the area under cultivation and account for 75% of food grain production in the country.
Fig. 2. Relationship between crop and features: (a) crop and production, (b) area allotted to crops, (c) area under irrigation and crop, (d) rainfall and crop mapping over the years

The prediction results are shown in Table I. They indicate that the decision tree achieves better accuracy than the other machine learning algorithms.

TABLE I. Model comparison among decision tree, linear regression, Lasso regression, and ridge regression

Model               Accuracy   MAE    RMSE
Decision tree       98.62      1.45   2.11
Linear regression   89.38      5.42   6.27
Lasso regression    86.33      6.25   8.85
Ridge regression    89.53      5.49   6.53

MAE: mean absolute error; RMSE: root mean squared error.
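The error columns of Table I follow standard definitions. For n observations with true yields y_i and predictions ŷ_i, MAE = (1/n) Σ |y_i − ŷ_i|, MSE = (1/n) Σ (y_i − ŷ_i)², and RMSE = √MSE. The sketch below implements these definitions without dependencies; the function names and toy values are hypothetical and are not taken from the paper. Because RMSE squares the errors before averaging, it is never smaller than MAE, which is consistent with every row of Table I.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate inputs and return the number of observations n."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: (1/n) * sum |y_i - yhat_i|."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: (1/n) * sum (y_i - yhat_i)^2."""
    n = _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


# Toy values (hypothetical, not taken from Table I).
observed = [2.0, 3.0, 5.0, 4.0]
predicted = [2.5, 3.0, 4.0, 4.5]
print(f"MAE={mae(observed, predicted):.3f} "
      f"MSE={mse(observed, predicted):.3f} "
      f"RMSE={rmse(observed, predicted):.3f}")
# MAE=0.500 MSE=0.375 RMSE=0.612
```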

At the country level, the decision tree achieves MAE = 1.45 and RMSE = 2.11 (Table I). Scatter plots of the prediction errors of the decision tree, linear regression, Lasso regression, and ridge regression are shown in Fig. 3.

Fig. 3. Scatter plot of prediction errors for (a) decision tree, (b) linear regression, (c) Lasso regression, (d) ridge regression

Linear regression and ridge regression show better accuracy than Lasso regression (86.33), with accuracies of 89.38 and 89.53, respectively. The decision tree provides the most accurate estimation of crop yield for India using statistical data. Machine learning models offer limited interpretability, which makes them black-box techniques. In this study, machine learning techniques are applied to estimate crop yield for India, and the decision tree performed better than the other three regression methods.

V. CONCLUSION

As the population grows rapidly, maintaining the food demand and supply chain has become challenging. Over many years, researchers have made considerable efforts to predict crop yield to help farmers. India is a country of villages and farmers, and technology can help farmers by estimating crop yield. This research implements machine learning techniques to predict the crop yield for India. The predictions so far reveal that the decision tree performs better for country-level data. The study highlights the benefits of these evolving techniques when they are applied to agricultural datasets. This is beneficial for smallholder farmers: from the prediction, a farmer can estimate the crop yield for the upcoming year and choose which crop to grow accordingly. The study can be taken forward by integrating remote sensing data with the statistical data.

REFERENCES

[1] E. J. Coyle, “Rank order operators and the mean absolute error criterion,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 1, pp. 63–76, Jan. 1988.
[2] N. Gandhi, L. J. Armstrong, O. Petkar, and A. K. Tripathy, “Rice crop yield prediction in India using support vector machines,” in 2016 13th International Joint Conference on Computer Science and Software Engineering (JCSSE), Khon Kaen, Jul. 2016, pp. 1–5.
[3] P. S. Maya Gopal and R. Bhargavi, “Optimum Feature Subset for Optimizing Crop Yield Prediction Using Filter and Wrapper Approaches,” Applied Engineering in Agriculture, vol. 35, no. 1, pp. 9–14, 2019.
[4] A. Gonzalez-Sanchez, J. Frausto-Solis, and W. Ojeda-Bustamante, “Predictive ability of machine learning methods for massive crop yield prediction,” Span J Agric Res, vol. 12, no. 2, p. 313, Apr. 2014.
[5] R. Kumar, M. P. Singh, P. Kumar, and J. P. Singh, “Crop Selection Method to maximize crop yield rate using machine learning technique,” in 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), Avadi, Chennai, India, May 2015, pp. 138–145.
[6] E. Khosla, R. Dharavath, and R. Priya, “Crop yield prediction using aggregated rainfall-based modular artificial neural networks and support vector regression,” Environ Dev Sustain, Aug. 2019.
[7] A. Mathur and G. M. Foody, “Crop classification by support vector machine with intelligently selected training data for an operational application,” International Journal of Remote Sensing, vol. 29, no. 8, pp. 2227–2240, Apr. 2008.
[8] N. Kim and Y.-W. Lee, “Machine Learning Approaches to Corn Yield Estimation Using Satellite Images and Climate Data: A Case of Iowa State,” vol. 34, no. 4, pp. 383–390, Aug. 2016.
[9] I. Nitze, U. Schulthess, and H. Asche, “Comparison of Machine Learning Algorithms Random forest, Artificial neural network and support vector machine to maximum likelihood for supervised crop type classification,” Proc. of the 4th GEOBIA, 35, May 2012.

[10] N. Gandhi, O. Petkar, and L. J. Armstrong, “Rice crop yield prediction using artificial neural networks,” in 2016 IEEE Technological Innovations in ICT for Agriculture and Rural Development (TIAR), Chennai, India, Jul. 2016, pp. 105–110.
[11] S. Khaki and L. Wang, “Crop Yield Prediction Using Deep Neural Networks,” Front. Plant Sci., vol. 10, p. 621, May 2019.
[12] A. Crane-Droesch, “Machine learning methods for crop yield prediction and climate change impact assessment in agriculture,” Environ. Res. Lett., vol. 13, no. 11, p. 114003, Oct. 2018.
[13] M. Kaul, R. L. Hill, and C. Walthall, “Artificial neural networks for corn and soybean yield prediction,” Agricultural Systems, vol. 85, no. 1, pp. 1–18, Jul. 2005.
[14] M. G. P. S. and B. R., “Performance Evaluation of Best Feature Subsets for Crop Yield Prediction Using Machine Learning Algorithms,” Applied Artificial Intelligence, vol. 33, no. 7, pp. 621–642, Jun. 2019.
[15] R. A. Schwalbert, T. Amado, G. Corassa, L. P. Pott, P. V. V. Prasad, and I. A. Ciampitti, “Satellite-based soybean yield forecast: Integrating machine learning and weather data for improving crop yield prediction in southern Brazil,” Agricultural and Forest Meteorology, vol. 284, p. 107886, Apr. 2020.
[16] “FAOSTAT,” 2017. http://www.fao.org/faostat/en/#rankings/countries_by_commodity_exports.
[17] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it? A new look at Signal Fidelity Measures,” IEEE Signal Process. Mag., vol. 26, no. 1, pp. 98–117, Jan. 2009.
[18] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature,” Geosci. Model Dev., vol. 7, no. 3, pp. 1247–1250, Jun. 2014.
[19] D. M. Magerman, “Statistical decision-tree models for parsing,” in Proceedings of the 33rd annual meeting on Association for Computational Linguistics, Cambridge, Massachusetts, 1995, pp. 276–283.
[20] G. A. F. Seber and A. J. Lee, Linear Regression Analysis. John Wiley & Sons, 2012.
[21] M. J. A. Chan-Lau, Lasso Regressions and Forecasting Models in Applied Stress Testing. International Monetary Fund, 2017.
[22] S. Chatterjee and A. S. Hadi, Regression Analysis by Example. John Wiley & Sons, 2006.
