Article
Crop Yield Estimation Using Deep Learning Based on Climate
Big Data and Irrigation Scheduling
Khadijeh Alibabaei 1,2, * , Pedro D. Gaspar 1,2 and Tânia M. Lima 1,2
1 C-MAST Center for Mechanical and Aerospace Science and Technologies, University of Beira Interior,
6201-001 Covilhã, Portugal; [email protected] (P.D.G.); [email protected] (T.M.L.)
2 Department of Electromechanical Engineering, University of Beira Interior, Rua Marquês d’Ávila e Bolama,
6201-001 Covilhã, Portugal
* Correspondence: [email protected]
Abstract: Deep learning has already been successfully used in the development of decision support systems in various domains. Therefore, there is an incentive to apply it in other important domains such as agriculture. Fertilizers, electricity, chemicals, human labor, and water are the components of total energy consumption in agriculture. Yield estimates are critical for food security, crop management, irrigation scheduling, and estimating labor requirements for harvesting and storage. Therefore, estimating product yield can reduce energy consumption. Two deep learning models, Long Short-Term Memory and Gated Recurrent Units, have been developed for the analysis of time-series data such as agricultural datasets. In this paper, the capabilities of these models and their extensions, called Bidirectional Long Short-Term Memory and Bidirectional Gated Recurrent Units, to predict end-of-season yields are investigated. The models use historical data, including climate data, irrigation scheduling, and soil water content, to estimate end-of-season yield. The application of this technique was tested for tomato and potato yields at a site in Portugal. The Bidirectional Long Short-Term Memory outperformed the Gated Recurrent Units network, the Long Short-Term Memory, and the Bidirectional Gated Recurrent Units network on the validation dataset. The model was able to capture the nonlinear relationship between irrigation amount, climate data, and soil water content and predict yield with an MSE of 0.017 to 0.039. The performance of the Bidirectional Long Short-Term Memory in the test was compared with the most commonly used deep learning method, the Convolutional Neural Network, and machine learning methods including a Multi-Layer Perceptrons model and Random Forest Regression. The Bidirectional Long Short-Term Memory outperformed the other models with an R2 score between 0.97 and 0.99. The results show that analyzing agricultural data with the Long Short-Term Memory model improves the performance of the model in terms of accuracy. The Convolutional Neural Network model achieved the second-best performance. Therefore, the deep learning model has a remarkable ability to predict the yield at the end of the season.

Keywords: agriculture; deep learning; LSTM; support decision-making algorithms; yield estimation; irrigation management
1. Introduction
Agriculture is in a state of flux, and obstacles are emerging, such as climate change, environmental impacts, and lack of labor, resources, and land. Annual population growth and increasing demands on agricultural society to produce more from the same amount of agricultural land while protecting the environment are the significant challenges of this century [1]. This scenario reinforces the constant need to seek alternatives in the face of challenges to ensure higher productivity and better quality. Sustainable production of sufficient, safe, and high-quality agricultural products will be achievable if new technologies and innovations are adopted. Smart farms rely on data and information generated by agricultural technology, bringing the producer closer to digital technology [1]. This includes the use of sensors and drones and the collection of accurate data such as weather data, soil mapping, and others. Extracting knowledge from these data and creating decision support systems is becoming increasingly important to optimize farms and add value to meet the food needs of the population and ensure the sustainable use of natural resources [1].
Deep learning (DL) is a subfield of machine learning. DL algorithms can be used
throughout the cultivation and harvesting cycle in agriculture and are receiving consid-
erable attention in developing such decision-making systems. The idea is to feed large
artificial neural networks with increasingly large amounts of data, extract features from
them automatically, and make decisions based on these data [2]. Deep here refers to
the number of hidden layers of the neural network. The performance of the model im-
proves as the network becomes deeper [2].
Crop yields and crop yield forecasts directly affect the annual national and inter-
national economy and play a major role in the food economy. Crop yields are highly
dependent on irrigation and climate data. More irrigation does not necessarily increase
yield [3], and therefore, optimization of irrigation and more efficient irrigation systems
are critical. Predicting yield based on different types of irrigation is one way to optimize
the process.
Machine learning algorithms are already used to estimate yield from images. Bar-
goti and Underwood [4] used Multi-Layer Perceptrons (MLP) and Convolutional Neural
Network (CNN) models to extract features from input images, and then two image pro-
cessing algorithms, Watershed Segmentation (WS) and Circular Hough Transform (CHT),
were used to detect and count individual fruits in these features. They added metadata
such as pixel position, row number, and sun azimuth to their algorithms and improved
the detection performance. The best performance was obtained for fruit detection by CNN
and WS with R2 = 0.826. Habaragamuwa et al. [5] developed a Region-based CNN
(R-CNN) model with AlexNet as the backbone and detected ripe and unripe strawber-
ries in greenhouse images. The model achieved an average precision of 82.61%. Kang
and Chen [6] implemented a clustering CNN model (C-RCNN) and a deep learning model,
LedNet, to detect apples on trees. The C-RCNN module was used to generate a label
for the training dataset, and LedNet was trained to detect apples on trees. A lightweight
network (LW-Net), ResNet110, ResNet50, and Darknet-53 were used as the backbone.
LedNet with the ResNet110 backbone, with 86% accuracy, and LedNet with LW-Net, with
weight size and computation time of 7.4 M and 28 ms, respectively, outperformed the other
models in terms of detection performance and computational efficiency. Koirala et al. [7]
developed a DL model, named Mango-YOLO, based on YOLO-v3 and YOLO-v2 (tiny)
for counting mangoes on trees. Mango-YOLO achieved the best performance in terms
of memory consumption, speed, and accuracy compared to the Faster R-CNN, Single
Shot multi-box Detector (SSD), and You Only Look Once (YOLO). Liang et al. [8] applied
the SSD network to detect mango and almond on tree fruits. The SSD model with the data
augmentation techniques and the smaller standard box was more accurate than the original
SSD network in detecting mango on trees. Stein et al. [9] developed a Faster R-CNN using VGG16 as a backbone for fruit detection and localization in a mango orchard. They used three datasets for training. The first contained images of the trees from one side, the second contained images from both sides of the trees, and the third contained images from multiple views of the trees. Training the model with images from two and multiple views showed excellent performance (R2 ≥ 0.90).
YOLO-V3 with DenseNet as the backbone to detect apples on trees. They used two datasets
for training. The first contained images of apples at one growth stage, and the second
contained images taken at different growth stages. Their results showed that the F1 score
of the model trained with the first dataset was higher than that of the model trained with
the second dataset. Apolo-Apolo et al. [11] used a Faster R-CNN model and a Long
Short-Term Memory (LSTM) model to estimate fruit number and fruit size. An average
standard error (SE) of 6.59% between visual fruit count and fruit detection by the model
was determined. An LSTM model was trained for per-tree yield estimation and total yield
estimation. Actual and estimated yields per tree were compared, yielding an approximate
error of SE = 4.53% and a standard deviation of SD = 0.97 kg. Maimaitijiang et al. [12]
used Partial Least Squares Regression (PLSR), Random Forest Regression (RFR), Support
Vector Regression (SVR), DNN (DNN-F1) based on input-level feature fusion, and DNN
(DNN-F2) based on mid-level feature fusion to estimate soybean yield. The results showed
that multimodal data fusion improved the accuracy of yield prediction. DNN-F2 achieved
the highest accuracy with an R2 score of 0.720 and a relative root mean square error (RMSE)
of 15.9%. Yang et al. [13] proposed a CNN architecture for predicting rice grain yield from
low-altitude remote sensing images at the maturity stage. The proposed model consisted
of two separate branches for processing RGB and multispectral images. In a large rice-
growing region of Southern China, a 160-hectare area with over 800 cultivation units was
selected to investigate the ability of the model to estimate rice grain yield. The network
was trained with different datasets and compared with the traditional vegetation index (VI)-
based method. The results showed that the CNNs trained with RGB and multispectral
datasets performed much better than the VI-based regression model in estimating rice
grain yield at the maturity stage. Chen et al. [14] proposed a faster Region-based Con-
volutional Neural Network (R-CNN) for detecting and counting the number of flowers,
mature strawberries, and immature strawberries. The model achieved a mean average
precision of 0.83 for all detected objects at 2 m height and 0.72 for all detected objects at
3 m height. Zhou et al. [15] implemented an SSD model with two lightweight backbones,
MobileNetV2 and InceptionV3, to develop an Android app called KiwiDetector to detect
kiwis in the field. The results showed that MobileNetV2, quantized MobileNetV2, Incep-
tionV3, and quantized InceptionV3 achieved true detection rates of 90.8%, 89.7%, 87.6%,
and 72.8%, respectively.
The disadvantages of estimating the yield from images are:
• Pictures of the entire field must be collected each year to identify the crop in the pic-
tures and then estimate the yield.
• To train the model, a large number of labeled images is needed, which is very time-
consuming.
• Illumination variance, foliage cover, overlapping fruits, shaded fruits, and scale
variations affect the images [16].
Ma et al. [17] used climate, remote sensing data, and rice information to estimate rice
yield. A Stacked Sparse Auto-Encoder (SSAE) was trained and achieved a root mean square error of 33.09 kg (10 a)−1. Han et al. [18] applied machine learning methods
including Support Vector Machine (SVM), Gaussian Process Regression (GPR), Neural
Network (NN), K-Nearest Neighbor Regression, Decision Tree (DT), and Random For-
est (RF) to integrate climate data, remote sensing data, and soil data to predict winter
wheat yield based on the Google Earth Engine platform (GEE). SVM, RF, and GPR with
an R2 > 0.75 were the three best yield prediction methods, among others. They also found
that different agricultural zones and temporal training settings affected the prediction
accuracy. Kim et al. [19] developed an optimized deep neural network for crop yield pre-
diction using optimized input variables from satellite products and meteorological datasets.
The input data were extracted from satellite-based vegetation indices and meteorological
and hydrological data, and a matchup database was created on the Cropland Data Layer
(CDL), a high-resolution map for classifying plant types. Using the optimized input dataset,
they implemented six major machine learning models, including multivariate adaptive
regression splines (MARS), SVM, RF, extremely randomized trees (ERT), ANN, and DNN.
The DNN model outperformed the other models in predicting corn and soybean yields,
with a mean absolute error of 21–33% and 17–22%, respectively. Abbas et al. [20] used four
machine learning algorithms, namely Linear Regression (LR), Elastic Net (EN), K-Nearest
Neighbor (k-NN), and Support Vector Regression (SVR), to predict tuber yield of potato
(Solanum tuberosum) from soil and plant trait data acquired by proximal sensing. Four
datasets were used to train the models. The SVR models outperformed all other models
in each dataset with RMSE ranging from 4.62 to 6.60 t/ha. The performance of k-NN
remained poor in three out of four datasets.
In these papers, however, the amount of irrigation was not considered as an in-
put to the model. Yield is highly dependent on the amount of irrigation, and a change
in the amount of water can make a big difference in the yield. Considering irrigation
scheduling as an input to the model can help to create an intelligent system that selects
the best irrigation schedule to save water consumption without affecting production. To
optimize irrigation based on productivity, the irrigation amount must be considered as
an input to the model.
In this work, Recurrent Neural Networks (RNN), including the LSTM model and Gated
Recurrent Units (GRU) model and their extensions, Bidirectional LSTM (BLSTM) and Bidi-
rectional GRU (BGRU), were implemented to estimate tomato yield based on climate
data, irrigation amount, and water content in the soil profile. Agricultural datasets are
time-series, and agricultural forecasting relies heavily on historical datasets. The advantage
of RNN is its ability to process time-series data and make decisions for the future based
on historical data. The proposed models predict the yield at the end of the season given
historical data from the field such as temperature, wind speed, solar radiation, ETo, the wa-
ter content in the soil profile, and irrigation scheduling during a season (the codes are
available at the following links: https://github.com/falibabaei/tomato-yieldestimation/
blob/main/main (accessed date 22 May 2021), https://github.com/falibabaei/potato-
yield-estimation/tree/main (accessed date 22 May 2021)). The performance of the model
was evaluated using the mean square error and R2 score. In addition, the performance of
these models was compared with a CNN, an MLP model, and a Random Forest Regression
(RF). The advantages of the yield estimation model are:
• Using RNN models to extract features from past observations in the field and predict
yield at the end of the season.
• Using climatic data collected in the field as input to the model, which is easier than
using collected images from the field.
• Irrigation amount was used as input in the model, and it is shown that the model can
capture the relationship between irrigation amount and yield at the end of the season.
• It is shown that the model can be used as part of an end-to-end irrigation decision-
making system. This system can be trained to decide when and how much water to
irrigate and maximize net return without wasting water.
CNN models have been applied successfully to tasks such as image classification, object detection, and localization and have achieved excellent results [22,23]. However, CNN models make predictions based on current input data and do not use past observations to make future decisions.
Unlike CNN, information in recurrent neural networks goes through a loop that
allows the network to remember the previous outputs [24]. It enables the analysis of
sequences and time series. RNN is commonly used for natural language processing
and other sequences. A recurrent network can be thought of as multiple copies of the same
network, each passing information to the next (see Figure 1).
The output of an RNN cell at time step $t$ is given by

$$h_t = \tanh(W_x x_t + W_h h_{t-1} + b), \qquad (1)$$

where $h_{t-1}$ is the recurrent output from the previous step, $x_t$ is the input at the current time, and $W_x$, $W_h$, $b$ are the weights and bias of the network learned during training. The problem with RNNs is that if the input sequence is long, the gradient of the loss function can vanish to zero, effectively preventing the weights of the model from being updated [24].
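As an illustration of this recurrence (a minimal sketch, not the code released with this paper), Equation (1) can be unrolled over a toy sequence in a few lines of NumPy; the dimensions and random weights below are placeholders:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # Equation (1): the new hidden state mixes the current input
    # with the hidden state carried over from the previous step.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Toy dimensions: 10 input features, 8 hidden units.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(8, 10))
W_h = rng.normal(size=(8, 8))
b = np.zeros(8)

h = np.zeros(8)                           # initial hidden state
for x_t in rng.normal(size=(5, 10)):      # a sequence of 5 time steps
    h = rnn_step(x_t, h, W_x, W_h, b)
```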
$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f), \qquad (2)$$
$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i), \qquad (3)$$
$$z_t = \tanh(W_{zx} x_t + W_{zh} h_{t-1} + b_z), \qquad (4)$$
$$c_t = f_t * c_{t-1} + i_t * z_t, \qquad (5)$$
$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o). \qquad (6)$$
First, a vector is created by applying the tanh function to the cell state. Then, the information is regularized using the sigmoid function, which filters the values to be passed on based on the inputs $h_{t-1}$ and $x_t$. The vector values and the regulated values are multiplied, as in Equation (7), to be sent as output and as input to the next cell:

$$h_t = o_t * \tanh(c_t). \qquad (7)$$
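The gate equations translate directly into a single-step LSTM cell. The sketch below (an illustration, not the Keras implementation used in this work) folds each gate's input and recurrent weight matrices into one matrix over the concatenated $[x_t, h_{t-1}]$, which is algebraically equivalent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step implementing Equations (2)-(7).

    W[k] maps the concatenated [x_t, h_prev] for gate k, equivalent to
    using separate input/recurrent matrices W_kx and W_kh.
    """
    xh = np.concatenate([x_t, h_prev])
    f = sigmoid(W["f"] @ xh + b["f"])   # forget gate, Equation (2)
    i = sigmoid(W["i"] @ xh + b["i"])   # input gate, Equation (3)
    z = np.tanh(W["z"] @ xh + b["z"])   # candidate cell values, Equation (4)
    c = f * c_prev + i * z              # new cell state, Equation (5)
    o = sigmoid(W["o"] @ xh + b["o"])   # output gate, Equation (6)
    h = o * np.tanh(c)                  # new hidden state, Equation (7)
    return h, c
```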
GRUs discard the cell state and use the hidden state to transmit information. This architecture contains only two gates: the update gate $z_t$ and the reset gate $r_t$. Like LSTM gates, GRU gates are trained to selectively filter out all irrelevant information while preserving the useful information and can be calculated using Equations (8)–(11):

$$z_t = \sigma(W_{zx} x_t + W_{zh} h_{t-1} + b_z), \qquad (8)$$
$$r_t = \sigma(W_{rx} x_t + W_{rh} h_{t-1} + b_r), \qquad (9)$$
$$\tilde{h}_t = \tanh(W_{hx} x_t + W_{hh}(r_t * h_{t-1}) + b_h), \qquad (10)$$
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t. \qquad (11)$$
Figure 2. LSTM and GRU structures. Left side shows LSTM cell, right side shows GRU cell.
The functions tanh and σ add nonlinearity to the network. These functions allow
the model to capture the nonlinear relationships between the inputs and outputs of
the model. At the beginning of training, the weights and biases in Equations (2)–(4),
(6) and (8)–(10) are set randomly. During training, the model tries to set the weights and bi-
ases in such a way that the loss function is minimized. Therefore, training an RNN model is an optimization problem.
The LSTM and GRU models have similar units, but they also differ. For example,
in the LSTM unit, the amount of memory content seen or used by other units in the network
is controlled by the output gate. In contrast, the GRU releases all of its content without
any control [27]. From these similarities and differences alone, it is difficult to conclude
which model performs better on one problem than another. In this paper, both models
were implemented to see which model performs better on the yield estimation problem.
Figure 4. Left side shows the normal neural network and the right side shows the network with dropout.
Table 1. Validation MSE for different dropout sizes.

Dropout Size      0         0.1       0.2       0.3       0.4
Tomato            0.00021   0.00019   0.00033   0.00050   0.00068
Potato            0.00140   0.00109   0.00101   0.00092   0.00185
Another solution is early stopping, which consists of splitting the dataset into three sets: one for training, one for validation, and one for testing [30]. In this method, the validation loss is evaluated at the end of each epoch, and if it does not improve for a certain number of epochs, training is stopped. This technique prevents the network from becoming overly specialized to the training set.

In this work, dropout and early stopping were used to prevent overfitting.
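As a sketch of how both techniques are typically combined in Keras (the framework used in this work), the snippet below adds dropout inside a recurrent layer and registers an early-stopping callback on the validation loss; the layer sizes, input shape, and patience value here are illustrative assumptions:

```python
from tensorflow import keras

# A small recurrent regressor with dropout (sizes are illustrative).
model = keras.Sequential([
    keras.layers.LSTM(64, dropout=0.1, input_shape=(120, 10)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Stop training when the validation loss stops improving.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=30,                  # epochs to wait without improvement
    restore_best_weights=True,
)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=500, callbacks=[early_stop])
```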
The big data included minimum, maximum, and average temperature; minimum, maximum, and average relative humidity; average solar radiation; minimum and average wind speed; and precipitation. These variables were recorded every 15 min and converted to daily values during data preprocessing. Table 3 shows the details of these variables. Figure 6 shows the daily variables from 2010 to 2019 (10 years of data).
Figure 6. Daily climate variables from 2010 to 2019 (panels: Tmin, TAvg, Tmax; HRmin, HRAvg, HRmax; Prec, SRAvg, WSAvg; x-axis: time step).
Different irrigation schedules were simulated for each year, including no irrigation and a fixed irrigation depth of 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 60 mm applied every four days or when the allowable depletion reached the thresholds of 90%, 80%, and 70%, respectively. Other parameters were kept unchanged. Figure 7 shows tomato yield under the fixed irrigation depth of 20 mm and water content throughout the soil profile from 2010 to 2018. As can be seen in the figure, the yield varies under different climatic conditions.
Figure 7. Evolution of tomato yield and total water content profile under fixed irrigation depth of
20 mm. Left side shows tomato yield and right side shows WCTot.
Evapotranspiration is the water lost from the surface through soil evaporation and plant transpiration, and soil water content is the volume of water per unit volume of soil. Reference evapotranspiration (ETo) and water content in the total soil profile (WCTot) were also computed during the simulation and used as inputs to the models. AquaCrop estimated ETo from meteorological data using the FAO Penman–Monteith equation (Equation (12)) [32].
$$ETo = \frac{\Delta(R_n - G) + \rho_a c_p \dfrac{(e_s - e_a)}{r_a}}{\Delta + \gamma\left(1 + \dfrac{r_s}{r_a}\right)}, \qquad (12)$$
where $R_n$ is the net radiation, $G$ is the soil heat flux, $(e_s - e_a)$ represents the vapor pressure deficit of the air, $\rho_a$ is the mean air density at constant pressure, $c_p$ is the specific heat of the air, $\Delta$ represents the slope of the saturation vapor pressure–temperature relationship, $\gamma$ is the psychrometric constant, and $r_s$ and $r_a$ are the (bulk) surface and aerodynamic resistances. Figure 8 shows the ETo from 2010 to 2019.
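As a worked sketch of Equation (12) (not AquaCrop's internal routine), the computation is a direct translation once all terms are available in mutually consistent units:

```python
def penman_monteith_eto(delta, Rn, G, rho_a, c_p, e_s, e_a, r_s, r_a, gamma):
    """Reference evapotranspiration from Equation (12) (FAO Penman-Monteith).

    All inputs must be expressed in mutually consistent units; see FAO-56 [32].
    """
    numerator = delta * (Rn - G) + rho_a * c_p * (e_s - e_a) / r_a
    denominator = delta + gamma * (1.0 + r_s / r_a)
    return numerator / denominator
```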
The advantage of using deep learning models is that they do not require manual
adjustments once the model is trained and can be used automatically. These models can be
used to create an end-to-end decision support system for irrigation scheduling.
Figure 8. ETo calculated by AquaCrop.
$$x_{new} = \frac{x_{old} - x_{min}}{x_{max} - x_{min}}, \qquad (13)$$

where $x_{min}$ and $x_{max}$ are the minimum and maximum of each variable in the dataset.
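Equation (13) is a standard min–max rescaling; a minimal NumPy version applied column-wise (one column per variable) looks like this:

```python
import numpy as np

def min_max_scale(X):
    # Equation (13): rescale each column (variable) of X to [0, 1].
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)
```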
also performed very poorly and were excluded from the experimental results. In the end,
the network architectures were tested with 64, 128, 256, and 512 nodes per layer.
The MSE was calculated for the validation dataset with all combinations of the number
of layers and the number of nodes. The results for the different crops are shown in Table 4.
As can be seen in Table 4, the BLSTM model improves the validation loss, but the GRU
model has worse performance compared to the LSTM model. Therefore, removing the cell
state from the LSTM reduces the performance of the model in the yield prediction problem.
Table 4. Validation MSE for different numbers of LSTM layers and LSTM nodes per layer.
Finally, a two-layer BLSTM with 128 nodes and a two-layer BLSTM with 512 nodes
were selected to predict potato and tomato yield, respectively. Tables 5 and 6 show the ar-
chitecture of the models selected for crop yield prediction. Adding an additional dense
layer after the BLSTM layers improved the results. The tanh function was used as an ac-
tivation function after each BLSTM layer to capture the nonlinear relationship between
input and output [24].
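A Keras sketch of the stacked bidirectional architecture described above is given below, using the 512-unit tomato configuration; the number of input features and the width of the extra dense layer are assumptions for illustration, not the exact values in Tables 5 and 6:

```python
from tensorflow import keras

# Illustrative two-layer BLSTM regressor (512 units per direction, as in
# the tomato model); feature count and dense width are assumed.
model = keras.Sequential([
    keras.layers.Input(shape=(None, 10)),   # (time steps, features)
    keras.layers.Bidirectional(
        keras.layers.LSTM(512, activation="tanh", return_sequences=True)),
    keras.layers.Bidirectional(keras.layers.LSTM(512, activation="tanh")),
    keras.layers.Dense(64, activation="tanh"),   # extra dense layer
    keras.layers.Dense(1),                       # end-of-season yield
])
model.compile(optimizer="adam", loss="mse")
```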
The batch size is also chosen as a power of two for hardware reasons. The number of samples in the training datasets is less than 77, so the batch size is selected from {16, 32, 64}. For simplicity, the learning rate and decay were chosen as negative powers of ten. With a learning rate and decay of 10−5, the model trains very slowly, and even after 500 epochs the validation loss is very high, while a learning rate and decay of 10−2 cause fluctuations in training. Therefore, the learning rate and decay were kept above 10−6 and below 10−2. Table 7 shows the loss on the validation set for different crops when the hyperparameters are chosen differently. As can be seen, the same model with different hyperparameters achieves different results. The models with a learning rate of 10−4 and 10−3 (potato and tomato, respectively), batch size of 64, and decay of 10−5 had the best performance.
Dropout size is the percentage of nodes that are randomly dropped during training. A dropout size of 0.4 did not improve the validation loss, so the dropout size was restricted to the set {0.1, 0.2, 0.3, 0.4}. In LSTM models, dropout can be added to the input layer, the outputs, and the recurrent outputs [29]. Adding dropout on the recurrent outputs with a dropout size of 0.1 for the potato yield estimation model, and on the outputs of the layers with a dropout size of 0.3 for the tomato yield estimation model, improved the validation loss and was therefore selected for each model (see Table 1).
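In Keras terms, these placements correspond to the dropout argument (layer inputs), a Dropout layer after the recurrent layer (outputs), and the recurrent_dropout argument (recurrent outputs); a sketch matching the sizes selected above, with the input shape assumed:

```python
from tensorflow import keras

# Potato model: dropout of 0.1 on the recurrent outputs of a 128-unit LSTM.
potato_layer = keras.layers.LSTM(128, recurrent_dropout=0.1)

# Tomato model: dropout of 0.3 on the outputs of a 512-unit LSTM layer.
tomato_block = keras.Sequential([
    keras.layers.Input(shape=(None, 10)),   # feature count assumed
    keras.layers.LSTM(512, return_sequences=True),
    keras.layers.Dropout(0.3),
])
```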
Table 7. Validation MSE for different batch sizes, learning rates, and decay values.

Batch Size        16        32        64
Tomato            0.00012   0.00020   0.00008
Potato            0.00164   0.00393   0.00092

Learning Rate     10−3      10−4      10−5
Tomato            0.00005   0.00008   0.00571
Potato            0.00193   0.00092   0.02338

Decay             10−3      10−4      10−5
Tomato            0.00006   0.00007   0.00005
Potato            0.00140   0.00107   0.00092
Each model was trained for a maximum of 500 epochs, and each epoch lasted two seconds. As mentioned earlier, early stopping was used to prevent overfitting. In this method, training is stopped when the validation loss does not improve for a certain number of epochs. Patience is a hyperparameter of the early stopping method that sets the number of epochs to wait without improvement in the validation loss before stopping [30]. The exact amount of patience varies by model and problem. Examining plots of model performance measures can be used to determine patience. In this work, by examining the plots, patience was set to 30 and 50 for the tomato and potato models, respectively. As shown in Figure 9, training of the tomato and potato yield prediction models was completed after 360 and 250 epochs, respectively. The tomato yield prediction model was trained with more samples in the training set due to the larger availability of experimental data, which may make its training more stable than that of the potato yield prediction model.
Figure 9. Model loss during training. The left side shows the loss of tomato yield prediction model
and the right side shows potato yield prediction model.
Since the test dataset was randomly selected, there were different seasons with differ-
ent amounts of irrigation in each test dataset. Table 8 shows the performance of the models
on the test dataset, and Figure 10 shows the actual value of yield compared to the values
predicted by the models. The model predicting tomato yield with an MSE of 0.017 per-
formed better on the test dataset than the model predicting potato yield with an MSE of
0.039. This result could be due to the fact that the standard deviation of tomato yield is
smaller than the standard deviation of potato yield (see Table 3) and also, as mentioned
earlier, the model used to estimate tomato yield was trained with a larger training set.
As shown in Figure 10, the tomato test dataset included the 2010 season under four
different irrigation levels. The irrigation amounts were 0, 10, 20, and 60 mm, and the model
was able to achieve an MSE of 0.02 in this season. The same result was true for the potato
crop estimation model, and the model achieved an MSE of 0.09 to predict the 2012 crop
under four irrigation amounts. These results show that the model not only captures
the relationship between climate data and yield but also can accurately predict yield under
different irrigation amounts in a season.
Figure 10. Predictions versus true yield values on the test datasets, with points labeled by season (2010–2019). Left side shows tomato and right side shows potato.
The ability to hold water depends on the soil type. As mentioned in the Data Collection
section, the soil type of Fadagosa is sandy loam, and these models are designed for a sandy
loam soil type. Therefore, these models work well on this soil type. Moreover, the models
were trained with simulated data. However, the combination of simulated data and real
data may give better results.
The performance of BLSTM models was compared with CNN, MLP, and the traditional
machine learning algorithm Random Forest (RF).
MLP is a computational model inspired by the human nervous system. It is able to detect patterns in large amounts of data in order to classify them or regress a value. An MLP model consists of an input layer, a stack of fully connected layers (hidden layers), and an output layer. The fully connected layers connect every neuron in one layer to every neuron in the next layer. Mathematically, each of these layers is a linear function that takes the output of the previous layer and applies weights and biases to it. To add nonlinearity to the model, an activation function (e.g., ReLU, tanh) is used after each fully connected layer [42]. Again, to avoid overfitting, the number of nodes in each layer of the MLP model was kept at or below 512, and the model with 512 neurons achieved the best performance on each dataset. MLPs with 512 neurons and different numbers of layers were implemented. The performance of the MLP model stopped improving at five layers on both datasets, so the number of layers was kept below five. Table 9 shows the performance of these models. The models with three layers and four layers achieved the best performance in predicting tomato and potato yields, with R2 scores of 0.89 and 0.71, respectively.
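A Keras sketch of the three-layer, 512-neuron MLP that worked best for tomato is shown below; the flattened input width is an assumption, since an MLP has no notion of time steps:

```python
from tensorflow import keras

# Illustrative MLP regressor: three hidden layers of 512 neurons, as in
# the best tomato configuration reported above (input width assumed).
mlp = keras.Sequential([
    keras.layers.Input(shape=(850,)),    # flattened season features
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(1),               # end-of-season yield
])
mlp.compile(optimizer="adam", loss="mse")
```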
CNN is a deep learning algorithm. A CNN model consists of a stack of convolutional layers, nonlinear activation functions, pooling layers (e.g., maximum pooling, average pooling), Flatten layers, and fully connected layers [24]. Similar to the fully connected layers, a convolutional layer is a linear function, and it contains a set of kernels. A kernel is a matrix used for a matrix multiplication operation. This operation is applied repeatedly to different regions of the input and extracts feature maps from it. Pooling is a reduction process: a simple operation that reduces the dimension of the feature maps and hence the number of parameters trained by the network. A Flatten layer is usually placed between the convolutional layers and the fully connected layers. It transforms the output of the convolutional layers into a one-dimensional array. The fully connected layer takes the feature maps from the Flatten layer and applies weights and biases to predict the correct label or regress a value [24]. The number of kernels in each convolutional layer and the number of convolutional layers were chosen manually. For the same hardware reason, the number of kernels in each convolutional layer is kept as a power of two. Models with four or more convolutional layers, or with more than 512 kernels per layer, start to overfit; therefore, the number of kernels and layers was kept at or below 512 and three, respectively. All combinations of the number of layers and kernels were implemented. The CNN models with 512 kernels and two or three layers achieved the best validation loss. The batch size, learning rate, and decay from the BLSTM model were used for the CNN models. Padding was used in each convolutional layer to ensure that the output had the same shape as the input. The kernel size is usually chosen as an odd number. Models with a kernel size greater than 11 start to overfit, and below three the performance was poor, so the kernel size was chosen from {5, 7, 11}. Table 9 shows the architecture of the CNN model.
Table 8. Performance of the models on the test set. L shows the number of layers, and K shows
the kernel size.
The CNN with two layers and a kernel size of 5 for tomato, and with three layers and a kernel size of 11 for potato, achieved R2 scores of 0.96 and 0.933, respectively, and performed better in predicting yield than models with other combinations of the number of layers and kernel sizes.
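A Keras sketch of the best tomato configuration (two convolutional layers, 512 kernels of size 5, "same" padding) is shown below; one-dimensional convolutions are assumed here because the inputs are daily time series, and the input shape is a placeholder:

```python
from tensorflow import keras

# Illustrative 1D CNN regressor: two convolutional layers with 512 kernels
# of size 5 (the best tomato configuration above); "same" padding keeps
# the output length equal to the input length.
cnn = keras.Sequential([
    keras.layers.Input(shape=(120, 10)),   # (time steps, features) assumed
    keras.layers.Conv1D(512, kernel_size=5, padding="same", activation="relu"),
    keras.layers.Conv1D(512, kernel_size=5, padding="same", activation="relu"),
    keras.layers.Flatten(),
    keras.layers.Dense(1),                 # end-of-season yield
])
cnn.compile(optimizer="adam", loss="mse")
```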
RF is one of the most powerful machine learning methods. A Random Forest consists of several Decision Trees. Each individual tree is a very simple model with branches and nodes; at each node, a condition is checked, and the flow goes through one branch if it is satisfied and through the other otherwise, continuing from node to node until the tree terminates.
As Table 8 shows, the RF model outperformed the MLP model in predicting yield, with
an R2 score of 0.87 to 0.90.
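As a sketch of such an RF baseline with scikit-learn (the tree count and other hyperparameters below are library defaults, not the authors' tuned values), each season's time series is flattened into a single feature vector:

```python
from sklearn.ensemble import RandomForestRegressor

# Illustrative RF baseline for yield regression; X_* are assumed to be
# arrays of shape (seasons, time steps, features), flattened per season.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
# rf.fit(X_train.reshape(len(X_train), -1), y_train)
# y_pred = rf.predict(X_test.reshape(len(X_test), -1))
```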
During training, each epoch lasted two, one, and less than one second for the BLSTM, CNN, and MLP models, respectively. Although the computation time
in training BLSTM was higher than the other models, BLSTM achieved the best accuracy
in the test dataset.
As mentioned earlier, one of the applications of this model is to create a decision-
making system that decides when and how much to irrigate to avoid wasting water
without affecting productivity. The yield prediction model is used to calculate the net return
at the end of the season. The net return in agriculture is calculated using Equation (16).
$$R = Y \cdot P_y - W \cdot P_w, \qquad (16)$$

where $Y$ is the yield at the end of the season, $P_y$ is the price of the yield, $W$ is the total amount of water used for irrigation, and $P_w$ is the price of the water.
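Equation (16) amounts to one line of code; the sketch below plugs in the prices assumed later in this section (about 728.20 USD/ton for tomato and about 0.5 USD per ha-mm of water), with the yield and water amounts chosen purely for illustration:

```python
def net_return(yield_tons, price_yield, water_mm, price_water):
    # Equation (16): revenue from the yield minus the cost of irrigation water.
    return yield_tons * price_yield - water_mm * price_water

# Example: 7.5 t/ha of tomato and 300 ha-mm/ha of irrigation (assumed values).
r = net_return(7.5, 728.20, 300, 0.5)   # 5461.5 - 150.0 = 5311.5 USD/ha
```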
To show an application of the model, the net return for tomato yield in 2018 and 2019 was calculated under random irrigation every five days. An RF model was employed to predict WcTot after each irrigation. The model receives five days of climate data and the irrigation amount and predicts WcTot for the next day. The RF model predicts WcTot with an R2 score of 0.80. Algorithm 1 was used to calculate the net return at the end of the season under random irrigation. To calculate the net return, the cost of irrigation per 1 ha-mm/ha was assumed to be nearly 0.5 USD [43], and tomato prices were assumed to be nearly 728.20 USD/ton (www.tridge.com (accessed date 22 May 2021)).
Algorithm 1: Net return calculation under random irrigation

Env.step(current_state, action):
    next_WcTot = RF_WcTot.predict(current_state, action);
    season_state.append(current_state, action);
    if time_passed = end_of_season then
        done = True;
        Y = BLSTM_yield.predict(season_state);
        calculate reward from Equation (16);
    else
        done = False;
        reward = 0;
    time_passed = time_passed + 1;
    return next_WcTot, reward, done

steps = 5;
n_steps = 2;
env = Env();
season_state = List();
for i = 1 to n_steps do
    state = state[0];
    action = 0;
    done = False;
    while done = False do
        for k = steps to steps + 5 do
            if k = steps then
                action = random_action(0, 60);
            else
                action = 0;
            new_WcTot, reward, done = env.step(state, action, k);
            state = new_WcTot + climate_data
Tables 10 and 11 show the net returns and the parameters used in Algorithm 1.

Param          Explanation
action         volume of irrigation water
done           Boolean value representing whether a season is complete
n_steps        number of seasons
steps          time step between irrigations
state          climate big data and WcTot
new_WcTot      WcTot after irrigation
time_passed    time elapsed since the start of the season
state[0]       the data from the first 5 days of the season
In this example, the irrigation amount was randomly selected, but in future work, a re-
inforcement learning agent is trained to select the best irrigation amount, and the random
action (water amount) is replaced by the model. In Deep Reinforcement Learning algo-
rithms, there is an environment that interacts with an agent. During the training, the agent
chooses an action based on the current state of the environment, and the environment
returns the reward and the next state to the agent. The agent tries to choose the action that
maximizes the reward [44]. In the agricultural domain, the state of the environment can
be defined as the climate data and the water content of the soil; the action is the amount
of irrigation, and the reward is the net return. The function Env() in Algorithm 1 is used
as the environment. An agent can be trained to select the amount of irrigation based
on the condition of the field. Therefore, the yield estimation model can be used as part of
the environment of Deep Reinforcement Learning to calculate the reward of the agent. This
system can be trained end-to-end.
4. Conclusions
Irrigation and storage are closely related to the use of energy. Efficient irrigation
minimizes unnecessary water use, which contributes to energy conservation. Yield esti-
mation models help to reduce energy consumption, increase productivity, and estimate
labor requirements for harvesting and storage. RNN models offer several
advantages over other deep learning models and traditional machine learning approaches.
The most important aspect is their ability to process time-series data such as agricultural
datasets. In this work, the ability of RNN models to predict tomato and potato yields based
on climate data and irrigation amount was investigated. The LSTM, GRU, and their extensions, BLSTM and BGRU, were trained on sandy loam soil data for crop yield prediction.
The results show that the use of BLSTM models outperformed the simple LSTM, GRU,
and BGRU models on the validation set. In addition, the LSTM model performed better
than the GRU model in the validation set. Therefore, removing the cell state from the LSTM
nodes could be problematic in our context.
The BLSTMs achieved an R2 score of 0.97 to 0.99 on the test set. The results show
that BLSTM models can automatically extract features from raw agricultural data, capture
the relationship between climate data and irrigation amount, and convert it into a valuable
model for predicting future field yields. One drawback of these models is that a sufficient
amount of clean data is needed to train the model. With more data, the model can make better
predictions. In this work, the simulated yield dataset was used to overcome this disadvantage.
The performance of the BLSTM was compared with the CNN model, MLP, and RF,
and it was found that the BLSTM outperformed the MLP networks and CNN and RF
in yield prediction. The CNN model achieved the second-best performance and MLP
the worst performance. The results show that past observations of a season are important
in yield prediction. The BLSTM model can capture the relationship between past observa-
tions and the new observations and predict the yield more accurately. One disadvantage of
the BLSTM model was that its training time was higher than that of the other implemented models.
One of the applications of the yield prediction model is to develop an end-to-end
decision support system that automatically decides when and how much to irrigate. Deep
Reinforcement Learning models are used to build such a system. An agent can be trained
to select the amount of irrigation based on the condition of the field. To train such a model,
a reward function must be developed. In the agricultural domain, the net return is used as
the reward for the agent. Since it is difficult and time-consuming to work with a simulation
system such as Aquacrop to train a deep learning model, the yield estimation model is
used to determine the net return of the agent at the end of each season, and the agent can
decide based on this reward. This system can help farmers to decide when and how much
to irrigate and reduces water consumption without affecting productivity.
References
1. Sundmaeker, H.; Verdouw, C.N.; Wolfert, J.; Freire, L.P. Internet of Food and Farm 2020. In Digitising the Industry; Vermesan, O.,
Friess, P., Eds.; River Publishers: Gistrup, Denmark, 2016; pp. 129–150.
2. Nguyen, G.; Dlugolinsky, S.; Bobak, M.; Tran, V.; Garcia, A.L.; Heredia, I.; Malik, P.; Hluchy, L. Machine Learning and Deep Learning
frameworks and libraries for large-scale data mining: A survey. Artif. Intell. Rev. 2019, 52, 77–124. [CrossRef]
3. Hayes, M.J.; Decker, W.L. Using NOAA AVHRR data to estimate maize production in the United States Corn Belt. Int. J.
Remote Sens. 1996, 17, 3189–3200. [CrossRef]
4. Bargoti, S.; Underwood, J.P. Image Segmentation for Fruit Detection and Yield Estimation in Apple Orchards. J. Field Robot. 2017,
34, 1039–1060. [CrossRef]
5. Habaragamuwa, H.; Ogawa, Y.; Suzuki, T.; Shiigi, T.; Ono, M.; Kondo, N. Detecting greenhouse strawberries (mature and imma-
ture), using deep convolutional neural network. Eng. Agric. Environ. Food 2018, 11, 127–138. [CrossRef]
6. Kang, H.; Chen, C. Fast implementation of real-time fruit detection in apple orchards using deep learning. Comput. Electron. Agric.
2020, 168, 105108. [CrossRef]
7. Koirala, A.; Walsh, K.B.; Wang, Z.; McCarthy, C. Deep learning for real-time fruit detection and orchard fruit load estimation:
Benchmarking of ‘MangoYOLO’. Precis. Agric. 2019, 20, 1107–1135. [CrossRef]
8. Liang, Q.; Zhu, W.; Long, J.; Wang, Y.; Sun, W.; Wu, W. A Real-Time Detection Framework for On-Tree Mango Based on SSD Net-
work. In Intelligent Robotics and Applications, Proceedings of the 11th International Conference, ICIRA 2018, Newcastle, NSW, Australia,
9–11 August 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 423–436.
9. Stein, M.; Bargoti, S.; Underwood, J. Image Based Mango Fruit Detection, Localisation and Yield Estimation Using Multiple View
Geometry. Sensors 2016, 16, 1915. [CrossRef]
10. Tian, Y.; Yang, G.; Wang, Z.; Wang, H.; Li, E.; Liang, Z. Apple detection during different growth stages in orchards using
the improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426. [CrossRef]
11. Apolo-Apolo, O.; Martínez-Guanter, J.; Egea, G.; Raja, P.; Pérez-Ruiz, M. Deep learning techniques for estimation of the yield
and size of citrus fruits using a UAV. Eur. J. Agron. 2020, 115, 126030. [CrossRef]
12. Maimaitijiang, M.; Sagan, V.; Sidike, P.; Hartling, S.; Esposito, F.; Fritschi, F.B. Soybean yield prediction from UAV using
multimodal data fusion and deep learning. Remote Sens. Environ. 2020, 237, 111599. [CrossRef]
13. Yang, Q.; Shi, L.; Han, J.; Zha, Y.; Zhu, P. Deep convolutional neural networks for rice grain yield estimation at the ripening stage
using UAV-based remotely sensed images. Field Crop. Res. 2019, 235, 142–153. [CrossRef]
14. Chen, Y.; Lee, W.S.; Gan, H.; Peres, N.; Fraisse, C.; Zhang, Y.; He, Y. Strawberry Yield Prediction Based on a Deep Neural Network
Using High-Resolution Aerial Orthoimages. Remote Sens. 2019, 11, 1584. [CrossRef]
15. Zhou, Z.; Song, Z.; Fu, L.; Gao, F.; Li, R.; Cui, Y. Real-time kiwifruit detection in orchard using deep learning on Android™
smartphones for yield estimation. Comput. Electron. Agric. 2020, 179, 105856. [CrossRef]
16. Rahnemoonfar, M.; Sheppard, C. Deep Count: Fruit Counting Based on Deep Simulated Learning. Sensors 2017, 17, 905.
[CrossRef] [PubMed]
17. Ma, J.W.; Nguyen, C.H.; Lee, K.; Heo, J. Regional-scale rice-yield estimation using stacked auto-encoder with climatic and MODIS
data: A case study of South Korea. Int. J. Remote Sens. 2019, 40, 51–71. [CrossRef]
18. Han, J.; Zhang, Z.; Cao, J.; Luo, Y.; Zhang, L.; Li, Z.; Zhang, J. Prediction of Winter Wheat Yield Based on Multi-Source Data
and Machine Learning in China. Remote Sens. 2020, 12, 236. [CrossRef]
19. Kim, N.; Ha, K.J.; Park, N.W.; Cho, J.; Hong, S.; Lee, Y.W. A Comparison Between Major Artificial Intelligence Models for Crop
Yield Prediction: Case Study of the Midwestern United States, 2006–2015. ISPRS Int. J. Geo-Inf. 2019, 8, 240. [CrossRef]
20. Abbas, F.; Afzaal, H.; Farooque, A.A.; Tang, S. Crop Yield Prediction through Proximal Sensing and Machine Learning Algorithms.
Agronomy 2020, 10, 1046. [CrossRef]
21. Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90. [CrossRef]
22. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances
in Neural Information Processing Systems 25; Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q., Eds.; Curran Associates, Inc.:
Red Hook, NY, USA, 2012; pp. 1097–1105.
23. Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. OverFeat: Integrated Recognition, Localization and Detection
using Convolutional Networks. arXiv 2013, arXiv:1312.6229.
24. Patterson, J.; Gibson, A. Deep Learning: A Practitioner’s Approach; O’Reilly: Beijing, China, 2017.
25. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
26. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations
using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods
in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; Association for Computational Linguistics: Doha,
Qatar, 2014; pp. 1724–1734.
27. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling.
In Proceedings of the NIPS 2014 Workshop on Deep Learning, Montreal, QC, Canada, 8–13 December 2014.
28. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [CrossRef]
29. Gal, Y.; Ghahramani, Z. A Theoretically Grounded Application of Dropout in Recurrent Neural Networks. In Proceedings of
the 30th International Conference on Neural Information Processing Systems, NIPS’16; Curran Associates Inc.: Red Hook, NY, USA,
2016; pp. 1027–1035.
30. Prechelt, L. Early Stopping—But When? In Neural Networks: Tricks of the Trade: Second Edition; Montavon, G., Orr, G.B., Müller, K.R.,
Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 53–67.
31. Raes, D.; Steduto, P.; Hsiao, T.C.; Fereres, E. AquaCrop—The FAO Crop Model to Simulate Yield Response to Water: II. Main
Algorithms and Software Description. Agron. J. 2009, 101, 438–447. [CrossRef]
32. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspiration—Guidelines for Computing Crop Water Requirements; FAO Irrigation
and Drainage Paper 56; FAO—Food and Agriculture Organization of the United Nations: Rome, Italy, 1998.
33. Montgomery, D.C.; Jennings, C.L.; Kulahci, M. Introduction to Time Series Analysis and Forecasting; Wiley Series in Probability
and Statistics; Wiley: Hoboken, NJ, USA, 2011.
34. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer Publishing
Company, Incorporated: Berlin/Heidelberg, Germany, 2014.
35. Willmott, C.J.; Ackleson, S.G.; Davis, R.E.; Feddema, J.J.; Klink, K.M.; Legates, D.R.; O’Donnell, J.; Rowe, C.M. Statistics for
the evaluation and comparison of models. J. Geophys. Res. Ocean. 1985, 90, 8995–9005. [CrossRef]
36. Zhang, J.; Zhu, Y.; Zhang, X.; Ye, M.; Yang, J. Developing a Long Short-Term Memory (LSTM) based model for predicting water
table depth in agricultural areas. J. Hydrol. 2018, 561, 918–929. [CrossRef]
37. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow:
Large-Scale Machine Learning on Heterogeneous Systems, 2015. Software. Available online: tensorflow.org (accessed on 3 March
2020).
38. Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 3 March 2020).
39. Dreyfus, S. The numerical solution of variational problems. J. Math. Anal. Appl. 1962, 5, 30–45. [CrossRef]
40. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [CrossRef]
41. Vanhoucke, V.; Senior, A.; Mao, M.Z. Improving the speed of neural networks on CPUs. In Proceedings of the Deep Learning
and Unsupervised Feature Learning Workshop, NIPS 2011, Granada, Spain, 12–17 December 2011.
42. Murtagh, F. Multilayer perceptrons for classification and regression. Neurocomputing 1991, 2, 183–197. [CrossRef]
43. Rodrigues, L.C. Water Resources Fee in Portugal, 2016. Led by the Institute for European Environmental Policy. Available
online: https://ieep.eu/ (accessed on 3 March 2021).
44. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018.