Machine Learning for Large-Scale Crop Yield Forecasting
Agricultural Systems
Keywords: Crop yield prediction; Machine learning; Modularity; Reusability; Large-scale crop yield forecasting

Abstract

Many studies have applied machine learning to crop yield prediction with a focus on specific case studies. The data and methods they used may not be transferable to other crops and locations. On the other hand, operational large-scale systems, such as the European Commission's MARS Crop Yield Forecasting System (MCYFS), do not use machine learning. Machine learning is a promising method especially when large amounts of data are being collected and published. We combined agronomic principles of crop modeling with machine learning to build a machine learning baseline for large-scale crop yield forecasting. The baseline is a workflow emphasizing correctness, modularity and reusability. For correctness, we focused on designing explainable predictors or features (in relation to crop growth and development) and applying machine learning without information leakage. We created features using crop simulation outputs and weather, remote sensing and soil data from the MCYFS database. We emphasized a modular and reusable workflow to support different crops and countries with small configuration changes. The workflow can be used to run repeatable experiments (e.g. early season or end of season predictions) using standard input data to obtain reproducible results. The results serve as a starting point for further optimizations. In our case studies, we predicted yield at regional level for five crops (soft wheat, spring barley, sunflower, sugar beet, potatoes) and three countries (the Netherlands (NL), Germany (DE), France (FR)). We compared the performance with a simple method with no prediction skill, which either predicted a linear yield trend or the average of the training set. We also aggregated the predictions to the national level and compared them with past MCYFS forecasts. The normalized RMSEs (NRMSE) for early season predictions (30 days after planting) were comparable for NL (all crops), DE (all except soft wheat) and FR (soft wheat, spring barley, sunflower). For example, NRMSE was 7.87 for soft wheat (NL) (6.32 for MCYFS) and 8.21 for sugar beet (DE) (8.79 for MCYFS). In contrast, NRMSEs for soft wheat (DE), sugar beet (FR) and potatoes (FR) were about twice those of MCYFS. NRMSEs for end of season predictions were still comparable to MCYFS for NL, but worse for DE and FR. The baseline can be improved by adding new data sources, designing more predictive features and evaluating different machine learning algorithms. The baseline will motivate the use of machine learning in large-scale crop yield forecasting.
* Corresponding author.
E-mail addresses: [email protected] (D. Paudel), [email protected] (H. Boogaard), [email protected] (A. de Wit), [email protected]
(S. Janssen), [email protected] (S. Osinga), [email protected] (C. Pylianidis), [email protected] (I.N. Athanasiadis).
https://doi.org/10.1016/j.agsy.2020.103016
Received 9 June 2020; Received in revised form 27 October 2020; Accepted 4 December 2020
Available online 14 December 2020
0308-521X/© 2020 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
agronomic principles of plant, environment and management interactions (Basso et al. 2013; Chipanshi et al. 2015). Remote sensing methods rely on satellite imagery to capture the current state of crops and then to estimate the final yield (Lopez-Lozano et al., 2015). Statistical models use weather variables and the outputs of the three previous methods as predictors to derive linear relationships between the predictors and crop yield (e.g. Bussay et al. 2015). Recent studies have combined different methods in innovative ways to build yield forecasting models. For example, Lobell et al. (2015) and Zhao et al. (2020) used high-resolution remote sensing data and crop modeling to build statistical models to forecast the actual yield. Similarly, Newlands et al. (2014) developed a probabilistic yield forecasting framework for Canada using remote sensing, crop modeling, Bayesian inference and statistical models.

Machine learning takes a data-driven or empirical modeling approach to learn useful patterns and relationships from input data (Willcock et al., 2018) and provides a promising avenue for improving crop yield predictions. Machine learning algorithms approximate a function that relates features or predictors to labels, such as crop yield. Similar to statistical models, machine learning algorithms can utilize the outputs of other methods as features. In addition, machine learning algorithms have some distinct benefits: they can model non-linear relationships between multiple data sources (Chlingaryan et al. 2018); their performance generally improves when more training data is available (Goodfellow et al. 2016); and they can become robust to noisy data by using regularization techniques that help decrease the variance and the generalization error (James et al. 2013; Goodfellow et al. 2016). Therefore, machine learning could combine the benefits of other methods, such as crop growth models and remote sensing, with data-driven modeling to make reliable crop yield predictions.

Many studies have applied machine learning to predict yields of certain crops in specific locations, but it is unclear whether their data and methods are transferable to other crops and locations. Some of them used empirical data collected for specific purposes that may not be available for other crops or locations (e.g. Pantazi et al. (2016)). Some others used generally available climate and satellite data, but made crop and location-specific design choices that limit their reusability (e.g. Cai et al. (2019)). In this paper, we seek to address the need for modular and reusable workflows that would help understand the usefulness of various data sources, predictors or features and machine learning algorithms for different crops across spatial and temporal settings. Reusable workflows would allow researchers to run repeatable experiments, such as early season or end of season predictions, for different crops and countries with standard input data and obtain reproducible results. The models could be improved for specific crops and locations using new data sources, more advanced features and other optimizations.

Large-scale crop yield forecasting systems, such as the MARS Crop Yield Forecasting System (MCYFS) of the European Commission's Joint Research Centre (JRC) and the National Agricultural Statistics Service (NASS) of the US Department of Agriculture (USDA), have the infrastructure and historical data to build and assess crop yield prediction models for different crops and locations. However, the operational systems we know of do not use machine learning. They build statistical models from weather observations, field survey results, crop growth model outputs, remote sensing indicators and yield statistics (MARSWiki, 2020; USDA-NASS, 2012). Van der Velde and Nisini (2019) evaluated the performance of MCYFS from 1993 to 2015 and found that there is no significant improvement in MCYFS performance from 2006 onwards. Machine learning is a promising method especially when a large amount of data is being collected and made public (Lokers et al. 2016; GODAN 2020; EC-JRC 2020). A reusable and extensible workflow based on inputs similar to MCYFS would motivate the adoption of machine learning in large-scale crop yield forecasting.

We present a machine learning baseline for large-scale early and end of season crop yield forecasts. The baseline is a general machine learning workflow emphasizing three principles: (i) correctness, (ii) modularity, and (iii) reusability. First, our methodology focuses on how to create features that can explain crop growth and development based on agronomic principles of crop modeling, and how to apply machine learning without leaking information from the test set. Second, a modular design permits the workflow to be improved or extended by adding new data sources, designing more advanced features and evaluating different machine learning methods. Third, reusability addresses the transferability of the workflow to different crops and countries with small configuration changes. The results obtained can be a starting point for further optimizations.

We tested the machine learning baseline on three countries (the Netherlands (NL), Germany (DE), France (FR)) and five crops (soft wheat, spring barley, sunflower, sugar beet, potatoes) using MCYFS (MARSWiki, 2020; EC-JRC, 2020) and Eurostat data (Eurostat, 2020a; Eurostat, 2020b). We ran experiments to predict early season and end of season crop yield at NUTS2 or NUTS3 level (see Eurostat (2016), Section E of Supplement 1). We compared the regional predictions with a simple method with no prediction skill, which we call the "null" method. The null method either predicted a linear yield trend or the average of the training set. We also aggregated the predictions to the national (NUTS0) level and compared the results with past MCYFS forecasts.

The remainder of the paper is organized as follows: Section 2 reviews related work in the field; Section 3 describes the methodology and the case studies; Section 4 presents the results; Section 5 discusses our findings and areas for further research; and Section 6 summarizes our conclusions. Supplement 1 provides a brief introduction to MCYFS and machine learning, and the workflow details not included in Section 3 (Methodology). Supplement 2 includes a Jupyter notebook implementation (see https://jupyter.org/) of the machine learning baseline, a sample data set and supporting materials for Section 4 (Results) and Section 5 (Discussion).

2. Related work

Four methods or combinations thereof have been commonly used to predict crop yield: (i) field surveys, (ii) crop growth models, (iii) remote sensing, and (iv) statistical models. These methods have their strengths and weaknesses. Field surveys try to capture the ground truth by means of grower-reported surveys and objective measurement surveys (USDA-NASS, 2012). These surveys suffer from declining responses (Schnepf 2017), resource restrictions and reliability concerns due to sampling and non-sampling errors (Chipanshi et al. 2015). Process-based crop models simulate crop growth and development by using crop parameters, environmental conditions and management practices as input. They apply agronomic principles of crop growth and development that apply across space and time (Basso and Liu 2019). However, they do not account for all yield-reducing factors and have considerable data and calibration requirements (De Wit et al., 2019). Remote sensing tries to capture current information about crops by using satellite images. Remote sensing data are globally available under open data policies and they do not suffer from human errors (Chipanshi et al. 2015). However, remote sensing observations only provide indirect measurements of crop yield, namely observed radiance (Dorigo et al., 2007; Jones and Vaughan, 2010), and therefore rely on biophysical or statistical models to convert satellite observations into a yield prediction (e.g. Lopez-Lozano et al., 2015). Statistical models use meteorological indicators and the outputs of the three previous methods as predictors. These models estimate the yield trend attributable to technological advancements in genetics and management (Basso et al. 2013) and fit linear models between predictors and yield residuals (e.g. Bussay et al. 2015). They provide reasonable accuracy and explainability but cannot be extrapolated to other spatial and temporal contexts (Basso et al. 2013).

Machine learning has gained popularity in agricultural applications due to its success in other fields, such as medicine (e.g. Kang et al. (2015)), bioinformatics (e.g. Mackowiak et al. (2015)) and natural language processing (e.g. Socher et al. (2012)). Recent reviews
(Chlingaryan et al., 2018; Kamilaris and Prenafeta-Boldu, 2018; Liakos et al., 2018) have looked at the applications of machine learning in agriculture. Many studies (included in the reviews and others) have applied traditional (or shallow) machine learning and deep learning to crop yield prediction. Among applications of shallow methods, Shahhosseini et al. (2019) built machine learning metamodels from outputs of the APSIM crop model (Holzworth et al., 2015) to predict maize yield and nitrogen loss in the US; Jeong et al. (2016) applied Random Forests (Breiman 2001) to predict wheat yield globally and maize and potato yield in the US; and Gonzalez Sanchez et al. (2014) compared the performance of four machine learning algorithms on ten crops in Mexico. Among applications of deep learning, Crane-Droesch (2018) applied semiparametric deep neural networks to predict corn yield in the US; You et al. (2017) leveraged representation learning ideas to predict soybean yield in the US; and Pantazi et al. (2016) used self-organizing maps (Von der Malsburg 1973; Kohonen 2001) to predict within-field variation of wheat yield in the UK. These examples show that both shallow and deep methods can predict crop yield. However, they focus on optimizing performance for specific case studies. Some studies (e.g. Pantazi et al. (2016)) use empirical data collected for a specific location. Others use generally available data (e.g. You et al. (2017)), but focus on novel methods to improve performance. Some of them cover different crops (e.g. Jeong et al. (2016); Gonzalez Sanchez et al. (2014)) and locations (e.g. Jeong et al. (2016)), but their emphasis is again on performance compared to statistical methods, not on reusable methods. Therefore, it is unclear whether their data and methods are transferable to other crops and locations.

Large-scale crop yield forecasting systems, such as MCYFS, NASS and Statistics Canada, have historical data, infrastructure, expertise, evaluation frameworks and dissemination channels to build and assess crop yield prediction models for different crops and locations (see Section A of Supplement 1; USDA-NASS (2012); Statistics Canada 2019). To our knowledge, these systems do not use machine learning. They build statistical models using weather observations, field survey results, crop growth model outputs, remote sensing indicators and yield statistics. NASS uses survey results and linear statistical models to forecast crop yields (USDA-NASS, 2012). MCYFS provides a control board for human experts to run analyses and to build crop yield prediction models using two methods. The first method estimates the trend related to technological improvements and applies a simple or multiple linear regression on the yield residuals using crop growth model outputs and meteorological indicators (MARSWiki, 2020; Lecerf et al. 2019). The second method applies principal component analysis (Wold et al., 1987) and cluster analysis to identify similar years and forecast the yield based on similarities (MARSWiki, 2020; Lecerf et al. 2019). In addition, MCYFS experts use their judgment based on information from other sources, such as farming magazines. No previous work has applied machine learning to MCYFS data. A generic workflow based on MCYFS data would motivate the use of machine learning in large-scale crop yield forecasting.

Common applications of statistical models estimate the yield trend and detrend yield values before building regression models between predictors and yield residuals (e.g. Lecerf et al. 2019; Bussay et al. 2015). The yield trend for later years includes information from the earlier years. Evaluating such models by including earlier years in the test set and later years in the training set would cause information leakage. Some applications of machine learning to crop yield prediction have also used yield trend or other information from previous year(s). However, not all of them have avoided information leakage. For instance, Cai et al. (2017) ran cross-validation to train and optimize their prediction models. During cross-validation, the test fold can be in a bin earlier than the training folds, thus leading to information leakage. To avoid this leakage, Shahhosseini et al. (2019) adopted a time-based look-forward validation that always put the training data before the test data. We designed a machine learning workflow for crop yield prediction emphasizing the application of machine learning without information leakage.

The need for modularity and reusability in agricultural modeling has been stressed by Janssen et al. (2017) and Holzworth et al. (2014). In the case of crop yield prediction, modular design makes it possible to run experiments to test alternative configurations, such as early or end of season prediction. Similarly, modularity is crucial to minimize and diagnose unexpected outcomes when one part of the workflow is updated (Janssen et al. 2017). Reusability has not been a design goal in agricultural system modeling; more emphasis has been placed on the underlying science (Holzworth et al., 2014). Example applications of machine learning to crop yield prediction show a similar pattern. Reusability or transferability of methods has not been emphasized. We have designed the machine learning baseline focusing on modularity and reusability.

3. Methodology

We designed a machine learning workflow for crop yield prediction using MCYFS data. We evaluated the workflow by predicting crop yield at NUTS2 or NUTS3 levels for five crops and three countries. For each crop and country, we ran experiments to predict early season (30 days after planting) and end of season crop yield with and without using the estimated yield trend from previous years. For each experiment, we compared the regional predictions with a simple method with no prediction skill (the "null" method) and also aggregated the predictions to national (NUTS0) level and compared them with past MCYFS forecasts.

The overall workflow has two parts (Fig. 1). The first part consists of preprocessing and feature design, which are specific to data sources, and splitting data into training and test sets. The second part, focusing on machine learning, is independent of data sources. Data from various sources, such as crop growth simulation outputs, weather observations and yield statistics, were homogenized and aligned to the same spatial and temporal resolutions. The data was split into training and test sets before designing features (see Section 3.1.2). Some data sources required feature design; others were directly used as features. Once we had features and labels, machine learning algorithms were trained and optimized on the training set and evaluated on the test set.

We designed the workflow emphasizing three principles: correctness, modularity and reusability.

Fig. 1. The overall workflow has two parts. The first part includes preprocessing and feature design. The second part includes machine learning.

3.1. Workflow design: correctness

For correctness, we focused on how to design explainable features and how to apply machine learning without information leakage.

3.1.1. Explainable feature design

We incorporated agronomic principles from crop modeling to design features with physical meaning in terms of their impact on crop growth and development. Based on the outputs of the WOFOST crop model (Supit et al., 1994; Van Diepen et al., 1989), we selected 3 dekads (10-day periods) when significant changes occur in the crop's development stage (DVS): (i) START_DVS (DVS ≥ 0) is when the crop emerges from the soil, (ii) START_DVS1 (DVS ≥ 100) is the middle of the flowering phase, and (iii) START_DVS2 (DVS ≥ 200) is when the crop becomes ripe. (See De Wit et al. (2019) for a summary of how DVS is calculated.) Using these 3 dekads, we divided the crop season into 6 periods: (i) pre-planting window, (ii) planting window, (iii) vegetative phase, (iv) flowering phase, (v) yield formation phase, and (vi) harvest window (Table 1).
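As a minimal sketch of this step (assuming a pandas Series of dekadal DVS values per region and season; the structure and names are illustrative, not the authors' implementation), the three key dekads are simply the first dekads at which DVS reaches 0, 100 and 200; together with the planting and harvest windows they delimit the six periods of Table 1:

import pandas as pd

def find_key_dekads(dvs: pd.Series) -> dict:
    """Locate the dekads where the WOFOST development stage (DVS) first
    reaches 0, 100 and 200. `dvs` is ordered in time and indexed by the
    dekad number within the season."""
    return {
        "START_DVS": dvs.index[(dvs >= 0).to_numpy()][0],     # crop emergence
        "START_DVS1": dvs.index[(dvs >= 100).to_numpy()][0],  # middle of flowering
        "START_DVS2": dvs.index[(dvs >= 200).to_numpy()][0],  # crop becomes ripe
    }

# Example: select the dekads between emergence and mid-flowering (roughly
# the vegetative phase of Table 1) from another dekadal indicator series.
# keys = find_key_dekads(dvs)
# vegetative_tavg = tavg.loc[keys["START_DVS"]:keys["START_DVS1"]]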
For each period of the crop calendar, we identified the weather indicators, crop growth model outputs and remote sensing indicators that affect or capture the state of crop growth and development (Table 2). Using these indicators, we designed 3 types of features: (i) maximum
Fig. 2. Training, validation and test splits when using yield trend.
(a) For each region, we split the full dataset into training and test sets.
(b) We further divided the training set into validation training and test sets for feature selection and hyperparameter optimization using a time-based 5-fold
sliding validation.
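To make the splits of Fig. 2 and the leakage-safe pipeline described in the next paragraph concrete, the sketch below chains feature scaling, feature selection and one of the four regressors (Ridge) in a scikit-learn pipeline and tunes it with a time-based 5-fold sliding validation. The estimator choices, the parameter values and the expanding-window interpretation of the sliding validation are illustrative assumptions, not the exact configuration used in the paper.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

def sliding_time_splits(years, n_folds=5):
    """One interpretation of a time-based sliding validation: every
    validation block contains only years later than its training years."""
    ordered = np.sort(np.unique(years))
    blocks = np.array_split(ordered, n_folds + 1)
    for k in range(1, n_folds + 1):
        train_years = np.concatenate(blocks[:k])
        val_years = blocks[k]
        yield (np.where(np.isin(years, train_years))[0],
               np.where(np.isin(years, val_years))[0])

# X, y: training-set features and yields; years: the year of each row.
# X, y, years = ...  (data loading is not shown)
pipeline = Pipeline([
    ("scale", StandardScaler()),            # mean/std learned from training folds only
    ("select", SelectKBest(f_regression)),  # number of features to select is optimized
    ("model", Ridge()),                     # one of the four algorithms evaluated
])
param_grid = {"select__k": [5, 10, 20], "model__alpha": [0.1, 1.0, 10.0]}
# search = GridSearchCV(pipeline, param_grid, cv=list(sliding_time_splits(years)))
# search.fit(X, y)   # refits the whole pipeline on each training fold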
We used pipelines to chain the feature scaling, feature selection and training stages (see Section C.2 of Supplement 1) to avoid information leakage during feature selection and training (Muller and Guido 2016). The pipelines ensured each stage of training and optimization used only the training data. In effect, the parameters for scaling features (e.g. mean and standard deviation), the number of features to select and the feature weights for the trained model were learned from the training set. Furthermore, we optimized the hyperparameters using only the training set. When optimizing the hyperparameters, the pipeline was run for each iteration of 5-fold sliding validation or 5-fold cross-validation. Therefore, all stages of the pipeline (feature scaling, feature selection and training) were run using the training folds and the trained model was evaluated using the corresponding test fold.

3.2. Workflow design: Modularity

For modularity, we focused on making the baseline relatively easy to improve and extend. We minimized the dependencies between successive stages of the workflow. We chose extensible data structures to allow the indicators selected for feature design to change without affecting the workflow (Fig. 3). The goal was to simplify the process of designing new features or improving existing features with new data. For example, features for extreme conditions count days or dekads with values ±1 standard deviation and ±2 standard deviations from the average. The use of the averages and standard deviations of indicators makes the workflow generic and reusable. However, when crop-specific thresholds for different indicators are available, such data can be used to manually define more accurate and predictive features (see Section C.1.3 of Supplement 1 for examples).
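As an illustration of such features, the sketch below computes, for one indicator and one crop calendar period, the period average and maximum plus counts of dekads beyond one or two standard deviations from the long-term average. The function and feature names are assumed for illustration and do not come from the authors' implementation.

import pandas as pd

def extreme_condition_features(values: pd.Series, mean: float, std: float, name: str) -> dict:
    """Period aggregates for one indicator: the average and maximum over
    the period, plus counts of dekads more than 1 or 2 standard deviations
    away from the indicator's long-term average."""
    return {
        f"avg_{name}": values.mean(),
        f"max_{name}": values.max(),
        f"{name}_gt_1std": int((values > mean + std).sum()),
        f"{name}_lt_1std": int((values < mean - std).sum()),
        f"{name}_gt_2std": int((values > mean + 2 * std).sum()),
        f"{name}_lt_2std": int((values < mean - 2 * std).sum()),
    }

# Example: features for maximum temperature during the flowering phase,
# mirroring Table 3 entries such as "TMAX >1 STD".
# feats = extreme_condition_features(tmax_flowering, tmax_mean, tmax_std, "TMAX")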
We defined configuration options to control data flow when running various experiments (Fig. 4). For example, geographical information about region centroids was not included by default, but could be used if desired. Different experiments could be run by updating the configuration options and running the workflow; the workflow itself did not change. In addition, the generated features could be saved in a file and loaded later for machine learning, making the machine learning part of the workflow independent of preprocessing and feature design. Similarly, predictions of machine learning algorithms could be saved to a file and loaded later for comparison with MCYFS (Section 3.5).

We defined feature selection and prediction algorithms in a modular and extensible manner to enable experimentation with different algorithms (Fig. 4). Feature selection algorithms could be added by specifying the number of features to select. Similarly, prediction algorithms could be added by setting certain hyperparameters to default values and specifying the values of other hyperparameters to be optimized. We defined the range of values of hyperparameters as lists that could be extended or shortened.
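A minimal sketch of such a configuration, with the four prediction algorithms of Section 3.4; the hyperparameter names exist in scikit-learn, but the particular value lists and the feature-selector settings are illustrative assumptions:

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Each algorithm is registered with fixed defaults plus lists of
# hyperparameter values to optimize; the lists can be extended or
# shortened without changing the rest of the workflow.
PREDICTION_ALGORITHMS = {
    "Ridge": (Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}),
    "KNN": (KNeighborsRegressor(), {"n_neighbors": [3, 5, 10]}),
    "SVR": (SVR(kernel="rbf"), {"C": [0.1, 1.0, 10.0], "gamma": ["scale", "auto"]}),
    "GBDT": (GradientBoostingRegressor(), {"n_estimators": [100, 500],
                                           "max_depth": [2, 3, 5]}),
}

# Feature selection is configured the same way: the number of features
# to select is itself a list of candidate values.
FEATURE_SELECTION = {"k_best": {"k": [5, 10, 20, 30]}}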
3.3. Workflow design: Reusability

We designed the workflow to be reusable for different crops and countries. We applied data homogenization to standardize the filenames, file formats and data columns, thereby minimizing the amount of input required to run the workflow. We reused the same feature design principles for different case studies (see Section 3.1.1). Data homogenization and configuration options for crop name, country (two-letter code, e.g. NL) and NUTS level made it possible to run the workflow
for different crops, countries and NUTS levels (Fig. 4). We set most configuration options to reasonable defaults to avoid specifying all of them for every experiment.

3.4. Data, case studies and experiments

We used WOFOST crop growth model outputs, weather observations, remote sensing data, soil data, region centroids, modeled crop area fractions and yield statistics for the Netherlands (NL), Germany (DE) and France (FR) to evaluate the workflow. We had NL data for 12 NUTS2 regions from 1994 to 2018, FR data for 101 NUTS3 regions from 1989 to 2018 and DE data for 401 NUTS3 regions from 1999 to 2018. As described in Section 3.1.2, we used 70% of the data for training and 30% for testing. Section E of Supplement 1 provides more details about the data and the NUTS regions. We did not use region centroids by default because it was unclear whether they provided additional information not included in WOFOST outputs and weather observations.

We used thirteen case studies and ran four experiments for each case
study to verify correctness, modularity and reusability of the machine learning workflow. First, to verify the explainability of features, we counted the frequencies of selected features for each crop across different countries and algorithms. We deferred a detailed analysis of feature importance for future research. Second, to verify modularity of the workflow, we ran four experiments for each crop and country with options for using yield trend (Yes or No) and early season prediction (Yes or No). For early season prediction, we used current season information up to 30 days after planting. For end of season prediction, we used current season information up to the end of the harvest window. Third, to verify reusability, we ran the four experiments for thirteen case studies: soft wheat (NL, DE, FR), spring barley (NL, DE, FR), sunflower (FR), sugar beet (NL, DE, FR) and potatoes (NL, DE, FR). We tested the optional components of the workflow (e.g. using centroids, saving and loading features) on soft wheat (NL). For NL, predictions were made at NUTS2; for DE and FR, predictions were made at NUTS3. Overall, we tested the workflow with two NUTS levels, five crops and three countries.

We evaluated the performance of four machine learning algorithms in predicting crop yield: (i) Ridge Regression (Hoerl and Kennard 1970), (ii) K-nearest Neighbors Regression (Cover and Hart 1967; Aha et al. 1991), (iii) Support Vector Machines Regression (Boser et al. 1992; Cortes and Vapnik 1995), and (iv) Gradient Boosted Decision Trees Regression (see Friedman 2001; Hastie et al. 2009). These methods represent different classes of algorithms based on how they learn the relationships between features and labels. Section C.2.3 of Supplement 1 provides a brief description of these algorithms. The predictions of machine learning algorithms were compared with those of a simple method with no skill (the "null" method). When yield trend was not used, the null method was equivalent to the ZeroR algorithm (see Baskin et al. 2017), which predicts the average of the training set. When yield trend was used, the null method predicted the linear yield trend estimated from a 5-year window. All algorithms were evaluated using mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared error (RMSE) and the coefficient of determination or R2. MAE and RMSE were compared using their normalized counterparts. The normalized errors were calculated by dividing the mean error by the mean yield of the test set. Section C.2.3 of Supplement 1 provides the details about the evaluation metrics used.
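The null method and the normalized errors can be written down in a few lines. The sketch below follows the description above; the choice of the last five training years as the trend window and the percentage scaling are our interpretation, not code from the paper.

import numpy as np

def null_prediction(train_years, train_yields, test_years, use_trend):
    """Prediction of the no-skill "null" method for one region.
    Without the yield trend it predicts the training-set average (ZeroR);
    with the yield trend it extrapolates a linear trend fitted to a 5-year
    window (here taken as the last five training years)."""
    train_years = np.asarray(train_years, dtype=float)
    train_yields = np.asarray(train_yields, dtype=float)
    test_years = np.asarray(test_years, dtype=float)
    if not use_trend:
        return np.full(test_years.shape, train_yields.mean())
    recent = np.argsort(train_years)[-5:]
    slope, intercept = np.polyfit(train_years[recent], train_yields[recent], deg=1)
    return slope * test_years + intercept

def normalized_rmse(y_true, y_pred):
    """RMSE divided by the mean yield of the test set, expressed as a
    percentage (consistent with the NRMSE values reported in Section 4)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / y_true.mean()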
3.5. Comparison with MCYFS forecasts

We aggregated the predictions of the machine learning baseline from NUTS2 (NL) or NUTS3 (DE, FR) to national (NUTS0) level to compare with past MCYFS forecasts. NUTS2 or NUTS3 predictions were aggregated to NUTS0 by weighting them on the modeled crop area. Cerrani and Lopez Lozano (2017) have described in detail the algorithm used to model crop areas for different NUTS levels. Predictions at NUTS3 were aggregated to NUTS2 based on crop area weights for NUTS3 regions, and predictions at NUTS2 were further aggregated to NUTS1 using crop area weights for NUTS2 regions, and so on. We compared the aggregated NUTS0 predictions and the actual MCYFS forecasts (see Van der Velde and Nisini (2019)) using the official Eurostat national yield statistics (Eurostat, 2020a) as the reference. We evaluated the two sets of predictions using mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared error (RMSE) and the coefficient of determination or R2.
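A sketch of one step of this crop-area-weighted aggregation (for example NUTS3 to NUTS2), assuming a pandas DataFrame with hypothetical column names; repeating the step up the NUTS hierarchy gives the NUTS0 predictions:

import pandas as pd

def aggregate_up(preds: pd.DataFrame, parent_col: str) -> pd.DataFrame:
    """Aggregate regional yield predictions to the parent NUTS level,
    weighting each region by its modeled crop area. Expected columns:
    parent_col, 'YEAR', 'YPRED' (predicted yield) and 'CROP_AREA'."""
    tmp = preds.assign(WEIGHTED=preds["YPRED"] * preds["CROP_AREA"])
    grouped = tmp.groupby([parent_col, "YEAR"], as_index=False).agg(
        WEIGHTED=("WEIGHTED", "sum"), CROP_AREA=("CROP_AREA", "sum"))
    grouped["YPRED"] = grouped["WEIGHTED"] / grouped["CROP_AREA"]
    return grouped[[parent_col, "YEAR", "YPRED", "CROP_AREA"]]

# nuts2 = aggregate_up(nuts3_predictions, "NUTS2_ID")
# The same function is then applied with the NUTS1 and NUTS0 parent
# identifiers to obtain the national predictions compared with MCYFS.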
We had to make an adjustment to training and test splits to aggregate the crop yield predictions from NUTS3 or NUTS2 to NUTS0: the test set had to include the same set of years for all regions. (Note this restriction is necessary only when aggregating the predictions to NUTS0 level.) When we made the test years the same, some regions and test years were missing predictions. We filled the missing predictions in two ways. First, if the region had predictions for other test years, we filled the missing value with the average of the remaining years. Second, if the region had no predictions at all, we ignored the region and adjusted the area fractions of other sibling regions (with the same parent NUTS region).

3.6. Implementation

We used Apache Spark dataframes (Zaharia et al. 2016) for data preprocessing and feature design, and applied machine learning using the scikit-learn python package (Pedregosa et al. 2011). We developed and tested the workflow in Google Colaboratory (https://colab.research.google.com/) and ran the different experiments in a Google Dataproc cluster (https://cloud.google.com/dataproc) and Microsoft Azure Databricks (https://azure.microsoft.com/en-us/services/databricks/).

4. Results

To verify explainability of features, we looked at feature selection frequencies for each crop across different countries and algorithms. To demonstrate modularity and reusability, we ran four experiments with options to use yield trend (Yes or No) and to predict early in the season (Yes or No) for all thirteen crop and country combinations: soft wheat (NL, DE, FR), spring barley (NL, DE, FR), sunflower (FR), sugar beet (NL, DE, FR) and potatoes (NL, DE, FR). Predictions for NL were made at NUTS2 and predictions for DE and FR were made at NUTS3. All results were aggregated to national level and compared with past MCYFS forecasts. In this section, we present the normalized RMSE for different case studies. MAPE results are included in Section D of Supplement 1, and all results including normalized MAE, normalized RMSE, MAPE and R2 for all case studies, experiments and algorithms are provided in Supplement 2. Results of optional experiments (e.g. using region centroids data, saved features and saved predictions) are also included in Supplement 2.

4.1. Feature selection frequencies

Feature selection counts for potatoes show that soil water holding capacity was always selected (Table 3). Similarly, all the features for the pre-planting window were frequently selected. For the planting window, averages and extremes of temperature and precipitation were important. Similarly, the most frequently selected features for the vegetative phase were the fraction of absorbed photosynthetically active radiation (FAPAR), water-limited yield biomass, leaf area index and average temperature. Precipitation and maximum temperature extremes were important for the flowering phase. For the yield formation phase, FAPAR and WOFOST indicators such as total water consumption, water-limited yield biomass and yield storage were important. Finally, average and extremes of precipitation were important during the harvest window. Feature selection frequencies are generally consistent with the factors affecting crop growth and development during these periods. For example, temperature extremes during the flowering phase and precipitation extremes during planting and harvest windows (see Van der Velde et al. 2018) are known to influence crop yield. Feature selection frequencies for other crops are included in Supplement 2.

4.2. Yield trend vs. no yield trend

We compared the end of season predictions of the Gradient Boosted Decision Trees (GBDT) algorithm with the option of using yield trend (Yes or No) to those of the null method (Fig. 5; Fig. 13). We chose GBDT because its performance was better than other algorithms in most cases. Except for a few instances (e.g. normalized RMSE for sugar beet (NL) and sugar beet (DE) "No Yield Trend" (Fig. 5); MAPE for potatoes (FR) "Yield Trend" (Fig. 13)), machine learning performed better than the null method. Because of the differences in training and test sets (see Section 3.1.2), we cannot directly compare "Yield Trend" and "No Yield Trend". Nevertheless, the two sets of error values were quite similar, indicating that machine learning could be applied with or without yield trend. When using the yield trend, the test set included the tail end of available
years. Therefore, using the yield trend would be useful to make predictions for the future. The "No Yield Trend" approach could be useful to make predictions for missing years.

Table 3
Feature selection frequencies for potatoes (No Yield Trend).

Static features (frequency): Soil water holding capacity (12)

Period: Features (frequency)
Pre-planting window: avg TAVG (9), avg CWB (8), avg PREC (8)
Planting window: avg TAVG (4), avg PREC (6), TMIN >1 STD (5), PREC >1 STD (4), TMIN <2 STD (3), TMIN <1 STD (3), RSM <2 STD (1), TMIN >2 STD (1)
Vegetative phase: max WLIM_YB (11), max TWC (7), max WLAI (7), avg RSM (4), avg FAPAR (12), avg TAVG (11), avg CWB (9), RSM >2 STD (3)
Flowering phase: avg PREC (8), TMAX >1 STD (4), TMAX <1 STD (4), RSM <1 STD (3), PREC >1 STD (3), PREC >2 STD (3), TMAX >2 STD (1), TMAX <2 STD (1)
Yield formation phase: avg FAPAR (12), max WLIM_YB (11), max WLIM_YS (8), max TWC (8), max WLAI (6), avg RSM (8), avg CWB (7), RSM >2 STD (4), RSM <1 STD (4)
Harvest window: PREC >2 STD (4), avg PREC (3)

Note: Selection frequencies were aggregated for three countries (NL, DE, FR) and four algorithms. Weather indicators included average temperature (TAVG), precipitation (PREC), climate water balance (CWB = precipitation - evapotranspiration), minimum temperature (TMIN) and maximum temperature (TMAX). WOFOST outputs included water-limited yield biomass (WLIM_YB), water-limited yield storage (WLIM_YS), water-limited leaf area index (WLAI), relative soil moisture (RSM) and total water consumption (TWC). Remote sensing indicators included the fraction of absorbed photosynthetically active radiation (FAPAR). Other abbreviations: avg = average, max = maximum, min = minimum, STD = standard deviation.

4.3. Early season vs. end of season predictions

Early season predictions using yield trend (Fig. 6; Fig. 14) indicated that the baseline could make early season predictions better than the null method. We selected GBDT for comparison because its performance was better than other algorithms in most cases. The normalized RMSE and MAPE values for machine learning were lower than those for the null method in all instances except MAPE for potatoes (FR) (Fig. 14). The null method predicted the yield using a linear 5-year trend. Early season predictions were made 30 days (or 3 dekads) after planting. End of season predictions were made at the end of the harvest window. Both early season and end of season predictions used the yield values of 5 previous years, soil data and the current season information up to the prediction dekad. Except for spring barley (NL), error values for the machine learning baseline improved slightly over the course of the season.

4.4. Comparison with MCYFS forecasts

We aggregated the predictions of the machine learning baseline to NUTS0 and compared them with past MCYFS forecasts using Eurostat national yield statistics as the reference. Because the MCYFS method performs trend analysis, we compared the predictions of machine learning algorithms using the yield trend. For comparison, we used predictions from the best machine learning algorithm; the selected algorithm varied by case study. The details are included in Supplement 2. For early season, we compared the predictions of machine learning for 30 days after planting with MCYFS forecasts from the closest dekad (Fig. 7a; Fig. 15a). We also compared machine learning predictions at the end of the harvest window with the final MCYFS prediction of the year (Fig. 7b; Fig. 15b). The machine learning baseline performed similar to MCYFS early in the season. Predictions were comparable for NL (all four crops), DE (spring barley, sugar beet, potatoes) and FR (soft wheat, spring barley, sunflower). For example, the normalized RMSE was 7.87 for soft wheat (NL) (6.32 for MCYFS), 8.21 for sugar beet (DE) (8.79 for MCYFS) and 10.63 for sunflower (FR) (10.91 for MCYFS). On the other hand, predictions for DE (soft wheat) and FR (sugar beet and potatoes) were much worse; the normalized RMSE was 16.38 for soft wheat (DE) (6.21 for MCYFS), and 14.34 for sugar beet (FR) (7.42 for MCYFS). As the season progressed, MCYFS forecasts improved significantly while machine learning predictions did not improve as much (Fig. 7a,b; Fig. 15a,b). Predictions for NL were still comparable to MCYFS (e.g. the normalized RMSE was 3.05 for soft wheat (NL) (5.48 for MCYFS)), but worse for DE and FR. The baseline used the same data sources throughout the season: WOFOST outputs, weather observations, remote sensing indicators and soil data. On the other hand, MCYFS uses other sources of information, such as media reports and farming magazines, to update their predictions. Moreover, the role of MCYFS analysts is key as they investigate the underlying feature data, identifying the ones that better explain crop growth and yields, and select the appropriate statistical models to produce reliable yield forecasts (Lopez-Lozano and Baruth, 2019).

5. Discussion

Previous studies (e.g. Shahhosseini et al. 2019; Cai et al. 2019; You et al. 2017; Jeong et al. 2016) have demonstrated that machine learning can play an important role in crop yield prediction and the same was confirmed by our results. Likewise, machine learning has the potential to build on other methods of yield prediction, such as field surveys, crop growth models and remote sensing. Prior applications of machine learning to crop yield prediction focused on optimizing performance for specific case studies. We focused on a generic workflow that could be used to investigate the potential of machine learning across different crops and locations. The machine learning baseline covers the methodological aspects of applying machine learning and acts as a baseline in terms of performance. Future applications of machine learning could investigate in more detail the advantages of combining machine learning with other methods, such as crop growth models and remote sensing, and compare their results with the baseline.

We designed the machine learning baseline emphasizing three principles: correctness, modularity and reusability. First, we focused on correctness to design explainable features and to apply machine learning without information leakage. When working with time series data, such as crop yield, features designed using values from previous years, such as the yield trend, are used. Whenever information from previous years is included in features, particular attention is required to avoid information leakage. The baseline presents a time-based training and test split and a k-fold sliding validation to ensure that information from the test set is not used during training. Second, we emphasized modularity to let the workflow evolve and to run experiments with alternative configurations. The workflow supports incremental changes to extend and optimize the baseline for specific case studies. Third, we focused on reusability to enable the same workflow to run for different crops and locations. The emphasis on modularity and reusability will encourage model and software reuse and prevent a proliferation of monolithic and duplicate software implementations (Janssen et al. 2017; Holzworth et al., 2014).

A key innovation of the baseline is the feature design method followed by feature selection later in the workflow. We designed features based on agronomic principles from crop modeling. We identified indicators that affect crops during different crop calendar periods. We also included features to account for extreme conditions. Features for extreme conditions were based on averages and standard deviations of indicators, making the workflow generic and reusable. By creating a large number of features, we explored the space of thresholds for extreme conditions and leveraged feature selection to identify the appropriate thresholds. Similarly, instead of having experts hand pick features, we generated a large number of features and applied feature
selection to identify the most predictive ones. In this respect, we take a data-driven approach to learn the features that explain yield variability for each crop and country.

We ran the baseline to predict crop yield by applying supervised machine learning, which relies heavily on the size and quality of the data. In particular, a supervised learning algorithm is a good predictor when training labels are reliable and the training set is representative of the full dataset. We decided to predict crop yield at the sub-national level and combined data from different regions to ensure a sizable dataset. MCYFS forecasts are made at the national level and rely on crop yield statistics reported by European Union countries to Eurostat following the guidelines set out in the Annual Crop Statistics Handbook (Eurostat, 2019). Yield statistics at sub-national levels are not curated as often and vary across countries and crops (Lopez-Lozano et al., 2015).
Some regions have missing data and others have data copied from previous years. Thus, regional crop yield prediction illustrates the data size vs. data quality trade-off (e.g. see MAPE for potatoes (FR), Fig. 14). Nevertheless, the aggregated NUTS0 predictions of machine learning were promising, especially early in the season. In the case of NL (all four crops), DE (spring barley, sugar beet, potatoes) and FR (soft wheat, spring barley, sunflower), the baseline's performance was comparable to MCYFS (see Fig. 7a; Fig. 15a). In terms of methodology, MCYFS uses data from all previous years to train models for the upcoming year (see Van der Velde and Nisini (2019)). In contrast, the machine learning baseline was trained with data up to 2011 or 2012, with predictions extrapolating up to 2018. Such differences in data and methods should be
considered when comparing the performance between the baseline and the MCYFS forecasts. Future research could investigate methods to address data quality and analyze the impact of different features, algorithms, hyperparameters and regularization methods to shed light on the potential of machine learning to improve crop yield predictions. Crop yield prediction at sub-national level may be a better approach for certain crops and countries where regional data is reliable. On one hand, the aggregated national yield forecasts could be more accurate and, on the other, the sub-national yield forecasts could also be useful for regional analysis. The machine learning baseline would serve as a starting point for such research.

As the present implementation of the baseline is based on MCYFS data, it can be directly used for crops and countries covered by MCYFS. Similarly, the baseline can be extended to scenarios where equivalent crop development and crop yield indicators (e.g. dry-weight yield biomass, leaf area, development stage) are available from other crop simulation models. Furthermore, Lopez-Lozano and Baruth (2019) have proposed a framework to extend MCYFS-style data and infrastructure to
the rest of the world. The machine learning baseline would be useful when data for the rest of the world is available in a similar format to MCYFS.

The baseline has ample room for improvement both in terms of the general design principles as well as fit-for-purpose optimizations. From our experience, the baseline could be improved in at least five ways. First, detection of outliers and duplicate data (particularly for yield statistics) could help improve the quality of training data. Second, the impact of different features, algorithms, hyperparameters and regularization methods could be analyzed to build a better optimized machine learning model. Third, new data sources could be added by applying appropriate data homogenization and preprocessing. Another consideration is feature design: some data sources can be directly used as features; others require careful feature design. Fourth, certain additional data could make feature design more accurate. In the baseline, we infer the crop calendar for the whole country using WOFOST outputs. The crop calendar could be made per region, especially when the country covers multiple agro-ecological zones. More accurate sowing and harvest dates, phenological databases or remote sensing (see Alemu and Henebry 2016) could be used to define the crop calendar. Similarly, crop-specific thresholds could be used to define extreme conditions. Fifth, more advanced features could be designed to include weather or soil information from the previous years and to capture changes in cropping patterns.

The machine learning baseline has some technical limitations as well. First, the baseline does not have a generic method for data preprocessing. Data for certain crops and countries may need extensive preprocessing to fit the requirements of the baseline. Second, the baseline is not implemented for very big data analyses. Although we used Spark data frames for distributed preprocessing and feature design, we employed scikit-learn for feature selection and machine learning. Scikit-learn does not distribute data and computations when running multiple algorithms or when optimizing hyperparameters. The main reason for using scikit-learn instead of the Spark machine learning library (Spark MLlib, https://spark.apache.org/mllib/) was feature selection. In the future, Spark MLlib may evolve to support the required functionality. In any case, future research could focus on running the machine learning part of the workflow in a distributed environment.

6. Conclusions

We designed a modular and reusable machine learning workflow for crop yield prediction and tested the workflow on thirteen case studies. Overall, we found that explainable features designed using principles of crop modeling can be used to predict crop yield at sub-national level. For early season predictions, the machine learning baseline performed similar to MCYFS in most cases. There was room for improvement as the season progressed. For crops and countries where regional data is reliable, sub-national yield prediction using machine learning is a promising approach going forward. Apart from addressing data quality issues, the baseline could be improved in three main ways: adding new data sources, designing more predictive features and evaluating different algorithms. The machine learning baseline serves as a starting point to explore the potential of machine learning for large-scale crop yield forecasting.

Data and software availability

Sample data for the Netherlands are available at DOI: https://doi.org/10.5281/zenodo.4312941 courtesy of the European Commission's Joint Research Centre (JRC). The software implementation is available at: https://github.com/BigDataWUR/MLforCropYieldForecasting.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was partially supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 825355 (CYBELE). We would like to thank S. Niemeyer from the European Commission's Joint Research Centre (JRC) for the permission to use MCYFS data and to provide open access to MCYFS data for the Netherlands. Similarly, we would like to thank M. van der Velde, L. Nisini and I. Cerrani from JRC for sharing with us past MCYFS forecasts and Eurostat national yield statistics. We acknowledge D. Tuia from the Geo-Information and Remote Sensing Group of Wageningen University and Research for insights on the application of machine learning to crop yield prediction. We would like to thank M. van der Velde from JRC, P. Griffiths from Wageningen Into Languages and R. Fletcher from Wageningen School of Social Sciences for feedback on the manuscript text. We are thankful to Yiqing Cai from Gro Intelligence for the clarification on their method of crop yield prediction.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.agsy.2020.103016.

References

EC-JRC, 2020. JRC Agri4Cast Data Portal. https://agri4cast.jrc.ec.europa.eu/DataPortal/Index.aspx (Last accessed: May 11, 2020).
Eurostat, 2020a. Eurostat - agricultural production - crops. https://ec.europa.eu/eurostat/statistics-explained/index.php/Agricultural_production_-_crops (Last accessed: May 11, 2020).
Eurostat, 2020b. Eurostat - geographical information and maps. https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts (Last accessed: May 11, 2020).
Lopez-Lozano, R., Baruth, B., 2019. An evaluation framework to build a cost-efficient crop monitoring system. Experiences from the extension of the European crop monitoring system. Agricultural Systems 168, 231-246. https://doi.org/10.1016/j.agsy.2018.04.002.
Aha, D.W., Kibler, D., Albert, M.K., 1991. Instance-based learning algorithms. Mach. Learn. 6, 37-66. https://doi.org/10.1007/BF00153759.
Alemu, W.G., Henebry, G.M., 2016. Characterizing cropland phenology in major grain production areas of Russia, Ukraine, and Kazakhstan by the synergistic use of passive microwave and visible to near infrared data. Remote Sens. 8, 1016. https://doi.org/10.3390/rs8121016.
Baskin, I.I., Marcou, G., Horvath, D., Varnek, A., 2017. Benchmarking machine-learning methods. In: Tutorials in Chemoinformatics, pp. 209-222. https://doi.org/10.1002/9781119161110.ch13.
Basso, B., Liu, L., 2019. Seasonal crop yield forecast: methods, applications, and accuracies. In: Advances in Agronomy, 154. Elsevier, pp. 201-255. https://doi.org/10.1016/bs.agron.2018.11.002.
Basso, B., Cammarano, D., Carfagna, E., 2013. Review of crop yield forecasting methods and early warning systems. In: Report Presented to the First Meeting of the Scientific Advisory Committee of the Global Strategy to Improve Agricultural and Rural Statistics, FAO Headquarters, Rome, Italy, 18-19 July.
Boser, B.E., Guyon, I.M., Vapnik, V.N., 1992. A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM, New York, NY, USA, pp. 144-152.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5-32.
Bussay, A., van der Velde, M., Fumagalli, D., Seguini, L., 2015. Improving operational maize yield forecasting in Hungary. Agric. Syst. 141, 94-106. https://doi.org/10.1016/j.agsy.2015.10.001.
Cai, Y., Moore, K., Pellegrini, A., Elhaddad, A., Lessel, J., Townsend, C., Solak, H., Semret, N., 2017. Crop yield predictions - high resolution statistical model for intra-season forecasts applied to corn in the US. In: 2017 Fall Meeting. Gro Intelligence Inc.
Cai, Y., Guan, K., Lobell, D., Potgieter, A.B., Wang, S., Peng, J., Xu, T., Asseng, S., Zhang, Y., You, L., et al., 2019. Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches. Agric. For. Meteorol. 274, 144-159.
Cerrani, I., Lopez Lozano, R., 2017. Algorithm for the disaggregation of crop area statistics in the MARS crop yield forecasting system. https://agri4cast.jrc.ec.europa.eu/DataPortal/Resource_Files/PDF_Documents/31_rationale.pdf (Last accessed: Oct 8, 2020).
Chipanshi, A., Zhang, Y., Kouadio, L., Newlands, N., Davidson, A., Hill, H., Warren, R., Qian, B., Daneshfar, B., Bedard, F., et al., 2015. Evaluation of the integrated Canadian crop yield forecaster (ICCYF) model for in-season prediction of crop yield across the Canadian agricultural landscape. Agric. For. Meteorol. 206, 137–150. https://doi.org/10.1016/j.agrformet.2015.03.007.
Chlingaryan, A., Sukkarieh, S., Whelan, B., 2018. Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: a review. Comput. Electron. Agric. 151, 61–69. https://doi.org/10.1016/j.compag.2018.05.012.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20, 273–297. https://doi.org/10.1007/BF00994018.
Cover, T., Hart, P., 1967. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13, 21–27. https://doi.org/10.1109/TIT.1967.1053964.
Crane-Droesch, A., 2018. Machine learning methods for crop yield prediction and climate change impact assessment in agriculture. Environ. Res. Lett. 13, 114003. https://doi.org/10.1088/1748-9326/aae159.
De Wit, A., Boogaard, H., Fumagalli, D., Janssen, S., Knapen, R., van Kraalingen, D., Supit, I., van der Wijngaart, R., van Diepen, K., 2019. 25 years of the WOFOST cropping systems model. Agric. Syst. 168, 154–167. https://doi.org/10.1016/j.agsy.2018.06.018.
Dorigo, W.A., Zurita-Milla, R., de Wit, A.J., Brazile, J., Singh, R., Schaepman, M.E., 2007. A review on reflective remote sensing and data assimilation techniques for enhanced agroecosystem modeling. Int. J. Appl. Earth Obs. Geoinf. 9, 165–193. https://doi.org/10.1016/j.jag.2006.05.003.
Fischer, R., 2015. Definitions and determination of crop yield, yield gaps, and of rates of change. Field Crop Res. 182, 9–18. https://doi.org/10.1016/j.fcr.2014.12.006.
Friedman, J.H., 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics 1189–1232. https://www.jstor.org/stable/2699986 (Last accessed: May 11, 2020).
GODAN, 2020. Global open data for agriculture and nutrition. www.godan.info (Last accessed: June 2, 2020).
Gonzalez Sanchez, A., Frausto Solís, J., Ojeda Bustamante, W., et al., 2014. Predictive ability of machine learning methods for massive crop yield prediction. Span. J. Agric. Res. 12.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org (Last accessed: May 11, 2020).
Han, J., Zhang, Z., Cao, J., Luo, Y., Zhang, L., Li, Z., Zhang, J., 2020. Prediction of winter wheat yield based on multi-source data and machine learning in China. Remote Sens. 12, 236. https://doi.org/10.3390/rs12020236.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media.
Hoerl, A.E., Kennard, R.W., 1970. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67. https://doi.org/10.1080/00401706.1970.10488634.
Holzworth, D.P., Huth, N.I., deVoil, P.G., Zurcher, E.J., Herrmann, N.I., McLean, G., Chenu, K., van Oosterom, E.J., Snow, V., Murphy, C., et al., 2014. APSIM–evolution towards a new generation of agricultural systems simulation. Environ. Model Softw. 62, 327–350. https://doi.org/10.1016/j.envsoft.2014.07.009.
Holzworth, D.P., Snow, V., Janssen, S., Athanasiadis, I.N., Donatelli, M., Hoogenboom, G., White, J.W., Thorburn, P., 2015. Agricultural production systems modelling and software: current status and future prospects. Environ. Model Softw. 72, 276–286. https://doi.org/10.1016/j.envsoft.2014.12.013.
Eurostat, 2019. Annual Crop Statistics Handbook. https://ec.europa.eu/eurostat/cache/metadata/Annexes/apro_cp_esms_an1.pdf (Last accessed: May 11, 2020).
Eurostat, 2016. Nomenclature of territorial units for statistics. https://ec.europa.eu/eurostat/web/nuts/background (Last accessed: May 11, 2020).
MARSWiki, 2020. MARS Crop Yield Forecasting System. https://marswiki.jrc.ec.europa.eu/agri4castwiki/index.php/Welcome_to_WikiMCYFS (Last accessed: May 11, 2020).
USDA-NASS, 2012. The Yield Forecasting Program of NASS. Technical Report. United States Department of Agriculture (USDA). https://www.nass.usda.gov/Education_and_Outreach/Understanding_Statistics/Yield_Forecasting_Program.pdf (Last accessed: May 11, 2020).
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning, 112. Springer.
Janssen, S.J., Porter, C.H., Moore, A.D., Athanasiadis, I.N., Foster, I., Jones, J.W., Antle, J.M., 2017. Towards a new generation of agricultural system data, models and knowledge products: information and communication technology. Agric. Syst. 155, 200–212. https://doi.org/10.1016/j.agsy.2016.09.017.
Jeong, J.H., Resop, J.P., Mueller, N.D., Fleisher, D.H., Yun, K., Butler, E.E., Timlin, D.J., Shim, K.M., Gerber, J.S., Reddy, V.R., et al., 2016. Random forests for global and regional crop yield predictions. PLoS One 11, e0156571. https://doi.org/10.1371/journal.pone.0156571.
Jones, H.G., Vaughan, R.A., 2010. Remote Sensing of Vegetation: Principles, Techniques, and Applications. Oxford University Press.
Kamilaris, A., Prenafeta-Boldú, F.X., 2018. Deep learning in agriculture: a survey. Comput. Electron. Agric. 147, 70–90. https://doi.org/10.1016/j.compag.2018.02.016.
Kang, J., Schwartz, R., Flickinger, J., Beriwal, S., 2015. Machine learning approaches for predicting radiation therapy outcomes: a clinician’s perspective. International Journal of Radiation Oncology*Biology*Physics 93, 1127–1135. https://doi.org/10.1016/j.ijrobp.2015.07.2286.
Kohonen, T., 2001. Self-Organizing Maps. Springer.
Lecerf, R., Ceglar, A., Lopez-Lozano, R., Van Der Velde, M., Baruth, B., 2019. Assessing the information in crop model and meteorological indicators to forecast crop yield over Europe. Agric. Syst. 168, 191–202.
Liakos, K., Busato, P., Moshou, D., Pearson, S., Bochtis, D., 2018. Machine learning in agriculture: a review. Sensors 18, 2674. https://doi.org/10.3390/s18082674.
Lobell, D.B., Thau, D., Seifert, C., Engle, E., Little, B., 2015. A scalable satellite-based crop yield mapper. Remote Sens. Environ. 164, 324–333. https://doi.org/10.1016/j.rse.2015.04.021.
Lokers, R., Knapen, R., Janssen, S., van Randen, Y., Jansen, J., 2016. Analysis of big data technologies for use in agro-environmental science. Environ. Model Softw. 84, 494–504. https://doi.org/10.1016/j.envsoft.2016.07.017.
Lopez-Lozano, R., Duveiller, G., Seguini, L., Meroni, M., García-Condado, S., Hooker, J., Leo, O., Baruth, B., 2015. Towards regional grain yield forecasting with 1 km-resolution EO biophysical products: strengths and limitations at pan-European level. Agric. For. Meteorol. 206, 12–32.
Mackowiak, S.D., Zauber, H., Bielow, C., Thiel, D., Kutz, K., Calviello, L., Mastrobuoni, G., Rajewsky, N., Kempa, S., Selbach, M., et al., 2015. Extensive identification and analysis of conserved small ORFs in animals. Genome Biol. 16, 179. https://doi.org/10.1186/s13059-015-0742-x.
Muller, A.C., Guido, S., 2016. Introduction to Machine Learning with Python: A Guide for Data Scientists. O’Reilly Media, Inc.
Newlands, N.K., Zamar, D.S., Kouadio, L.A., Zhang, Y., Chipanshi, A., Toure, S., Hill, H.S., 2014. An integrated, probabilistic model for improved seasonal forecasting of agricultural crop yield under environmental uncertainty. Frontiers in Environmental Science 2, 17. https://doi.org/10.3389/fenvs.2014.00017.
Pantazi, X.E., Moshou, D., Alexandridis, T., Whetton, R.L., Mouazen, A.M., 2016. Wheat yield prediction using machine learning and advanced sensing techniques. Comput. Electron. Agric. 121, 57–65. https://doi.org/10.1016/j.compag.2015.11.018.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Phalan, B., Green, R., Balmford, A., 2014. Closing yield gaps: perils and possibilities for biodiversity conservation. Philosophical Transactions of the Royal Society B: Biological Sciences 369, 20120285. https://doi.org/10.1098/rstb.2012.0285.
Schnepf, R., 2017. NASS and US Crop Production Forecasts: Methods and Issues. Technical Report. Congressional Research Service (Last accessed: May 11, 2020).
Shahhosseini, M., Martinez-Feria, R.A., Hu, G., Archontoulis, S.V., 2019. Maize yield and nitrate loss prediction with machine learning algorithms. Environ. Res. Lett. 14, 124026. https://doi.org/10.1088/1748-9326/ab5268.
Socher, R., Huval, B., Manning, C.D., Ng, A.Y., 2012. Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Association for Computational Linguistics, pp. 1201–1211. https://doi.org/10.5555/2390948.2391084.
Statistics Canada, 2019. An Integrated Crop Yield Model Using Remote Sensing, Agroclimatic Data and Crop Insurance Data. https://www.statcan.gc.ca/eng/statistical-programs/document/3401_D2_V1 (Last accessed: Oct 8, 2020).
Supit, I., Hooijer, A., Van Diepen, C., 1994. System Description of the WOFOST 6.0 Crop Simulation Model Implemented in CGMS. Vol. 1. Theory and Algorithms. In: EUR Publication No. 15959 EN. Office for Official Publications of the European Communities, Luxembourg, p. 146.
Tilman, D., Balzer, C., Hill, J., Befort, B.L., 2011. Global food demand and the sustainable intensification of agriculture. In: Proceedings of the National Academy of Sciences, National Academy of Sciences of the US, pp. 20260–20264. https://doi.org/10.1073/pnas.1116437108.
Van der Velde, M., Nisini, L., 2019. Performance of the MARS-crop yield forecasting system for the European Union: assessing accuracy, in-season, and year-to-year improvements from 1993 to 2015. Agric. Syst. 168, 203–212. https://doi.org/10.1016/j.agsy.2018.06.009.
Van der Velde, M., Baruth, B., Bussay, A., Ceglar, A., Condado, S.G., Lecerf, R., Lopez, R., Maiorano, A., Nisini, L., et al., 2018. In-season performance of European Union wheat forecasts during extreme impacts. Scientific Reports 8, 1–10. https://doi.org/10.1038/s41598-018-33688-1.
Van Diepen, C., Wolf, J., Van Keulen, H., Rappoldt, C., 1989. WOFOST: a simulation model of crop production. Soil Use Manag. 5, 16–24. https://doi.org/10.1111/j.1475-2743.1989.tb00755.x.
Von der Malsburg, C., 1973. Self-organization of orientation sensitive cells in the striate cortex. Kybernetik 14, 85–100. https://doi.org/10.1007/BF00288907.
Willcock, S., Hooftman, D.A., Bagstad, K.J., Balbi, S., Marzo, A., Prato, C., Sciandrello, S., Signorello, G., Voigt, B., et al., 2018. Machine learning for ecosystem services. Ecosystem Services 33, 165–174. https://doi.org/10.1016/j.ecoser.2018.04.004.
Wold, S., Esbensen, K., Geladi, P., 1987. Principal component analysis. Chemom. Intell. Lab. Syst. 2, 37–52. https://doi.org/10.1016/0169-7439(87)80084-9.
You, J., Li, X., Low, M., Lobell, D., Ermon, S., 2017. Deep Gaussian process for crop yield prediction based on remote sensing data. In: Thirty-First AAAI Conference on Artificial Intelligence (Last accessed: May 11, 2020).
Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M.J., et al., 2016. Apache Spark: a unified engine for big data processing. Commun. ACM 59, 56–65. https://doi.org/10.1145/2934664.
Zhao, Y., Potgieter, A.B., Zhang, M., Wu, B., Hammer, G.L., 2020. Predicting wheat yield at the field scale by combining high-resolution Sentinel-2 satellite imagery and crop modelling. Remote Sens. 12, 1024. https://doi.org/10.3390/rs12061024.