Development of Machine Learning Based Models For Multivariate Prediction of Wheat Crop Yield in Uttar Pradesh, India
Journal of Geomatics Vol. 17, No. 2, October 2023
Sukirti¹, Kamal Pandey²*, Abhishek Danodia³, Harish Chandra Karnatak²
¹Indian Institute of Remote Sensing (IIRS), Dehradun
²Department of Geoweb Services, IT and Distance Learning, IIRS, ISRO, Dehradun
³Agriculture and Soils Department, IIRS, Dehradun
DOI: https://doi.org/10.58825/jog.2023.17.2.70
Abstract: Climate change has a substantial impact on agricultural crop production and management. Predicting crop yields well in advance would help farmers, agricultural corporations and government agencies manage risk and design suitable crop insurance plans. Ground surveys are the traditional way of determining yield, but they are subjective, time-consuming and expensive. Machine Learning (ML) techniques make yield prediction cheaper, faster and more efficient. In this study, thirteen years (2001–2013) of meteorological parameters and wheat yield data for Uttar Pradesh were used to train and evaluate three machine learning regression models: Support Vector Regression, Ordinary Least Squares and Random Forest. Each model's performance was assessed using Mean Absolute Error (MAE), Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). The Random Forest model had an MAE of 0.258 t/ha, an MSE of 0.096 (t/ha)² and an RMSE of 0.311 t/ha. It gave the lowest error of the three models on every metric and was therefore the best model for predicting wheat yield. Researchers and decision-makers can use these findings to estimate pre-harvest yields and to support food security.
Keywords: Meteorological parameters, Wheat, Multivariate, Yield prediction, Machine Learning, Random Forest.
…forecasting due to the need for remote sensing data across the farm. This study concentrated on the quantification of machine learning algorithms and their practical application. The method also accounted for unpredictable rainfall and temperature in order to obtain a steady trend. The results of the different algorithms were compared on the basis of mean absolute error, and the Random Forest Regressor was found to be the most accurate regressor for predicting yield. Among the sequential models, a Simple Recurrent Neural Network performed better at forecasting rainfall, whereas the LSTM performed better at forecasting temperature (Nigam et al., 2019).

In the field of crop yield analysis, machine learning (ML) is a new topic of research. ML can provide inputs for growing the best possible crop and can anticipate yields. In this way it has the potential to take agriculture to a new dimension (Nigam et al., 2019). To help farmers select a suitable crop for cultivation and obtain maximum yield, ML models take into consideration parameters such as temperature, rainfall and area. ML-based approaches have the potential to improve the expanding agricultural industry of countries like India and, taken together, to raise the living standard of farmers. In line with this argument, the objectives of the present study are:
(i) to develop machine learning models to predict the yield of wheat for the state of Uttar Pradesh;
(ii) to compare the performance of the different models and identify the best one for multivariate yield prediction.
The study aims to promote the widespread use of ML models in decision making in the farming sector. This matters for countries like India, where agriculture has a major share in the economy.

2. Materials and Methods

2.1 Study Area
Uttar Pradesh (UP) is the fourth largest state of India, with an area of 240,928 km², and it is the most populous state of India. It is located between latitudes 24° and 31° N and longitudes 77° and 84° E (Figure 1).

Approximately 47% of the population depends directly on agriculture for its livelihood, and climate is the primary factor affecting production. The state has a large geographic area and access to the fertile Indo-Gangetic plains, so UP makes a considerable contribution to the country's food security. The state produces about 12% of India's rice and 28% of its wheat. It also produces a significant amount of sugarcane, making up 44% of the nation's total output (Gulati et al., 2021). The major crops of the state are rice, wheat, maize, sugarcane, chickpea and pigeon pea. About 24% of the state's agricultural area is used to grow wheat. On average, the area under wheat is 9730.60 thousand ha, production is 32799.71 thousand tonnes and yield is 3371 kg/ha.
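The average yield quoted for Uttar Pradesh is simply production divided by area. Both figures carry the same scale factor, so it cancels and the ratio comes out directly in t/ha. The short sketch below checks that the three numbers are mutually consistent. The constant and function names are hypothetical, chosen only for illustration.

```python
# Average yield = production / area, using the Section 2.1 figures.
# Both values share the same scale factor, so the ratio is already in t/ha.
AREA = 9730.60          # area under wheat (same scale as PRODUCTION)
PRODUCTION = 32799.71   # wheat production in tonnes (same scale as AREA)


def average_yield_kg_per_ha(production_t: float, area_ha: float) -> float:
    """Return average yield in kg/ha from production (t) and area (ha)."""
    if area_ha <= 0:
        raise ValueError("area must be positive")
    return production_t / area_ha * 1000.0  # convert t/ha to kg/ha


if __name__ == "__main__":
    print(f"{average_yield_kg_per_ha(PRODUCTION, AREA):.0f} kg/ha")  # prints "3371 kg/ha"
```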
Journal of Geomatics Vol. 17, No. 2, October 2023
2.4.3 Support Vector Regression
Support Vector Machine (SVM) is a well-known machine learning approach that is widely used for both classification and regression. Most regression models aim to minimize the difference between the actual and predicted values. Support Vector Regression (SVR) instead aims to fit the best line within a threshold value (ε) of the observations. Equation (2) is the generic representation of the line in support vector regression:

y = wx + b     (2)

where w is the weight vector, x is the input vector (here, the meteorological predictors) and b is the bias.

…R² value associated with the SVR model indicated a weaker correlation and less accurate predictions. These findings highlight the superior performance of the RF model in predicting crop yield, and they underscore the limitations of SVR in this particular context.
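The threshold idea behind Equation (2) can be made concrete with the primal form of linear SVR. Residuals that fall inside a ±ε tube cost nothing, and the objective trades the ½w² regularizer against the remaining losses, weighted by C. The sketch below is not the authors' implementation. It assumes a single input feature, ε = 0.1, C = 1.0 and noise-free toy data. A brute-force grid search stands in for the dual quadratic-programming solvers that real SVR libraries use. All function names are hypothetical.

```python
# Minimal sketch of linear Support Vector Regression for y = w*x + b (Equation 2).
# Assumptions (not from the paper): one input feature, eps = 0.1, C = 1.0, and a
# brute-force grid search instead of a real QP/SMO solver.


def eps_insensitive(residual: float, eps: float) -> float:
    """Epsilon-insensitive loss: residuals inside the +/-eps tube cost nothing."""
    return max(0.0, abs(residual) - eps)


def svr_objective(w, b, xs, ys, eps=0.1, c=1.0):
    """Primal objective 0.5*w**2 + C * sum of epsilon-insensitive losses."""
    loss = sum(eps_insensitive(y - (w * x + b), eps) for x, y in zip(xs, ys))
    return 0.5 * w * w + c * loss


def fit_svr_grid(xs, ys, eps=0.1, c=1.0, w_lim=(0.0, 3.0), b_lim=(-1.0, 2.0), step=0.01):
    """Return the grid point (w, b) with the smallest SVR objective."""
    n_w = int(round((w_lim[1] - w_lim[0]) / step)) + 1
    n_b = int(round((b_lim[1] - b_lim[0]) / step)) + 1
    best = (float("inf"), 0.0, 0.0)  # (objective, w, b)
    for i in range(n_w):
        w = w_lim[0] + i * step
        for j in range(n_b):
            b = b_lim[0] + j * step
            obj = svr_objective(w, b, xs, ys, eps, c)
            if obj < best[0]:
                best = (obj, w, b)
    return best[1], best[2]


if __name__ == "__main__":
    xs = list(range(10))
    ys = [2.0 * x + 1.0 for x in xs]  # noise-free toy data on the line y = 2x + 1
    w, b = fit_svr_grid(xs, ys)
    print(f"w = {w:.2f}, b = {b:.2f}")
```

On this toy data, the fitted slope comes out near 1.98 rather than 2, with b close to 1.1. The ½w² term favours the flattest line that still keeps every point inside the ±ε tube. In practice one would use a library solver, for example scikit-learn's `SVR(kernel="linear", epsilon=..., C=...)`, rather than a grid search.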
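The models are ranked using MAE, MSE and RMSE, as reported in Table 1, and the RF/SVR comparison also refers to R². The sketch below shows how these are computed from paired observed and predicted yields. It uses the coefficient of determination (1 − SS_res/SS_tot) for R². The paper does not state which R² definition it used, and the squared Pearson correlation from a scatter plot can differ. The function name and the sample yields are hypothetical and are not study data.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, MSE, RMSE and R^2 (coefficient of determination) for paired yields."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and the same length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n   # same unit as yield, t/ha
    mse = sum(e * e for e in errors) / n    # squared unit, (t/ha)^2
    rmse = math.sqrt(mse)                   # back to t/ha
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


if __name__ == "__main__":
    # Hypothetical observed/predicted wheat yields in t/ha (illustrative only).
    obs = [3.0, 3.5, 4.0]
    pred = [3.2, 3.4, 4.3]
    for name, value in regression_metrics(obs, pred).items():
        print(f"{name}: {value:.3f}")
```

Because RMSE is the square root of MSE, the MSE column is in (t/ha)² while MAE and RMSE are in t/ha. For example, for Ordinary Least Squares in Table 1, √0.156 ≈ 0.395.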
Figure 5. Scatter plots of predicted against observed yield for (top) Random Forest, (middle) Ordinary Least Squares and (bottom) Support Vector Regression.

Table 1. Prediction performance of the different algorithms
Algorithm                    MAE (t/ha)   MSE (t/ha)²   RMSE (t/ha)
Random Forest                0.258        0.096         0.311
Support Vector Regression    0.439        0.304         0.552
Ordinary Least Squares       0.312        0.156         0.395

Cabas et al. (2010) concluded that non-climatic variables had a relatively minor impact on the yield distribution, which indicates that climatic factors dominate this relationship. Bondre and Mahagaonkar (2019) have also proposed a system to forecast agricultural yield based on historical data. They used agriculture data and…

4. Conclusion

Acknowledgements:
The authors express their sincere thanks to Copernicus (https://scihub.copernicus.eu/) for providing satellite data at no cost. The authors are also thankful to the Indian Institute of Remote Sensing for providing the laboratory facilities for the completion of this work.

Competing Interests: We, the authors, declare that we have no competing interests.

References
Arora N.K. (2019). “Impact of climate change on agriculture production and its sustainable solutions,” Environ. Sustain., vol. 2, no. 2, pp. 95–96, doi: 10.1007/s42398-019-00078-w.
Barman S. (2020). “The political economy of food security in India: Evolution and performance,” International Journal of Management, vol. 11, no. 12, pp. 1156–1162.
Bloomer C. and G. Rehm (2014). “Using principal component analysis to find correlations and patterns at Diamond Light Source,” IPAC 2014 Proc. 5th Int. Part. Accel. Conf., pp. 3719–3721.
Bondre D. and S. Mahagaonkar (2019). “Prediction of crop yield and fertilizer recommendation using machine learning algorithms,” International Journal of Engineering, Applied Science and Technology, vol. 4, no. 5, pp. 371–376.
Brinkhoff J. and A. J. Robson (2021). “Block-level macadamia yield forecasting using spatio-temporal datasets,” Agric. For. Meteorol., vol. 303, doi: 10.1016/j.agrformet.2021.108369.
Cabas J., A. Weersink and E. Olale (2010). “Crop yield response to economic, site and climatic variables,” Clim. Change, vol. 101, no. 3, pp. 599–616, doi: 10.1007/s10584-009-9754-4.
Chatziantoniou A., G. P. Petropoulos and E. Psomiadis (2017). “Co-orbital Sentinel 1 and 2 for LULC mapping with emphasis on wetlands in a Mediterranean setting based on machine learning,” Remote Sens., vol. 9, no. 12.
Cunha R.L.F., B. Silva and M. A. S. Netto (2018). “A scalable machine learning system for pre-season agriculture yield forecast,” Proc. IEEE 14th Int. Conf. eScience (e-Science), pp. 423–430, doi: 10.1109/eScience.2018.00131.
Feizizadeh B., D. Omarzadeh, M. Kazemi Garajeh, T. Lakes and T. Blaschke (2023). “Machine learning data-driven approaches for land use/cover mapping and trend analysis using Google Earth Engine,” J. Environ. Plan. Manag., vol. 66, no. 3, pp. 665–697, doi: 10.1080/09640568.2021.2001317.
Gulati A., P. Terway and S. Hussain (2021). “Performance of agriculture in Uttar Pradesh,” in A. Gulati, R. Roy and S. Saini (eds.), Revitalizing Indian Agriculture and Boosting Farmer Income, India Studies in Business and Economics, Springer.
Haque F.F., A. Abdelgawad, V. P. Yanambaka and K. Yelamarthi (2020). “Crop yield analysis using machine learning algorithms,” IEEE World Forum Internet Things (WF-IoT 2020), Symp. Proc., doi: 10.1109/WF-IoT48130.2020.9221459.
Ienco D., R. Interdonato, R. Gaetano and D. Ho Tong Minh (2019). “Combining Sentinel-1 and Sentinel-2 satellite image time series for land cover mapping via a multi-source deep learning architecture,” ISPRS J. Photogramm. Remote Sens., vol. 158, pp. 11–22, doi: 10.1016/j.isprsjprs.2019.09.016.
Jaafar H. and R. Mourad (2021). “GYMEE: A global field-scale crop yield and ET mapper in Google Earth Engine based on Landsat, weather, and soil data,” Remote Sens., vol. 13, no. 4, pp. 1–30, doi: 10.3390/rs13040773.
Jamali A. (2019). “Evaluation and comparison of eight machine learning models in land use/land cover mapping using Landsat 8 OLI: a case study of the northern region of Iran,” SN Appl. Sci., vol. 1, no. 11, doi: 10.1007/s42452-019-1527-8.
Jeong J.H., J.P. Resop, N.D. Mueller, D.H. Fleisher, K. Yun, E.E. Butler, D.J. Timlin, K. Shim, J.S. Gerber, V.R. Reddy and S. Kim (2016). “Random forests for global and regional crop yield predictions,” PLoS One, vol. 11, no. 6, doi: 10.1371/journal.pone.0156571.
Jørgensen S.E. (1994). “Models as instruments for combination of ecological theory and environmental practice,” Ecol. Modell., vol. 75–76, pp. 5–20, doi: 10.1016/0304-3800(94)90003-5.
Karthikeyan L., I. Chawla and A.K. Mishra (2020). “A review of remote sensing applications in agriculture for food security: Crop growth and yield, irrigation, and crop losses,” J. Hydrol., vol. 586, doi: 10.1016/j.jhydrol.124905.
Khosla E., R. Dharavath and R. Priya (2020). “Crop yield prediction using aggregated rainfall-based modular artificial neural networks and support vector regression,” Environ. Dev. Sustain., vol. 22, no. 6, pp. 5687–5708, doi: 10.1007/s10668-019-00445-x.
Madhusudan L. (2015). “Agriculture role on Indian economy,” Business and Economics Journal, vol. 6, no. 4.
Nigam A., S. Garg, A. Agrawal and P. Agrawal (2019). “Crop yield prediction using machine learning algorithms,” Proc. Fifth International Conference on Image Information Processing, Shimla, pp. 125–130.
Nihar A., N. R. Patel and A. Danodia (2022). “Machine-learning-based regional yield forecasting for sugarcane crop in Uttar Pradesh, India,” J. Indian Soc. Remote Sens., vol. 50, no. 8, pp. 1519–1530, doi: 10.1007/S12524-022-01549-0.
Pantazi X.E., D. Moshou, T. Alexandridis, R. L. Whetton and A. M. Mouazen (2016). “Wheat yield prediction using machine learning and advanced sensing techniques,” Comput. Electron. Agric., vol. 121, pp. 57–65, doi: 10.1016/j.compag.2015.11.018.
Poudel S. and R. Shaw (2016). “The relationships between climate variability and crop yield in a mountainous environment: A case study in Lamjung District, Nepal,” Climate, vol. 4, no. 1, doi: 10.3390/cli4010013.
Schwalbert R.A., T. Amado, G. Corassa, L.P. Pott, P.V.V. Prasad and I.A. Ciampitti (2020). “Satellite-based soybean yield forecast: Integrating machine learning and weather data for improving crop yield prediction in southern Brazil,” Agric. For. Meteorol., vol. 284, doi: 10.1016/j.agrformet.2019.107886.
Sharma V., D. R. Rudnick and S. Irmak (2013). “Development and evaluation of ordinary least squares regression models for predicting irrigated and rainfed maize and soybean yields,” Trans. ASABE, vol. 56, no. 4, pp. 1361–1378, doi: 10.13031/trans.56.9973.
Shetty S., P.K. Gupta, M. Belgiu and S. K. Srivastav (2021). “Assessing the effect of training sampling design on the performance of machine learning classifiers for land cover mapping using multi-temporal remote sensing data and Google Earth Engine,” Remote Sens., vol. 13, no. 8, doi: 10.3390/rs13081433.
Virnodkar S.S., V. K. Pachghare, V. C. Patil and S. K. Jha (2020). “Remote sensing and machine learning for crop water stress determination in various crops: a critical review,” Precis. Agric., vol. 21, no. 5, pp. 1121–1155, doi: 10.1007/s11119-020-09711-9.