Frwa 04 927113
Training machine learning with physics-based simulations to predict 2D soil moisture fields in a changing climate

Elena Leonarduzzi1*, Hoang Tran2,3, Vineet Bansal4, Robert B. Hull5, Luis De la Fuente5, Lindsay A. Bearup6, Peter Melchior4,7, Laura E. Condon5 and Reed M. Maxwell1,2,8

1 High Meadows Environmental Institute, Princeton University, Princeton, NJ, United States, 2 Department of Civil and Environmental Engineering, Princeton University, Princeton, NJ, United States, 3 Pacific Northwest National Laboratory, Atmospheric Sciences & Global Change Division, Richland, WA, United States, 4 Center for Statistics and Machine Learning, Princeton University, Princeton, NJ, United States, 5 Hydrology and Atmospheric Sciences, University of Arizona, Tucson, AZ, United States, 6 Bureau of Reclamation Denver, Denver, CO, United States, 7 Department of Astrophysical Sciences, Princeton University, Princeton, NJ, United States, 8 Integrated GroundWater Modeling Center, Princeton University, Princeton, NJ, United States

REVIEWED BY
Shervan Gharari, University of Saskatchewan, Canada
James Gilbert, University of California, Santa Cruz, United States

*CORRESPONDENCE
Elena Leonarduzzi
[email protected]

SPECIALTY SECTION
This article was submitted to Water and Climate, a section of the journal Frontiers in Water

RECEIVED 23 April 2022
ACCEPTED 15 September 2022
PUBLISHED 11 October 2022

CITATION
Leonarduzzi E, Tran H, Bansal V, Hull RB, De la Fuente L, Bearup LA, Melchior P, Condon LE and Maxwell RM (2022) Training machine learning with physics-based simulations to predict 2D soil moisture fields in a changing climate. Front. Water 4:927113. doi: 10.3389/frwa.2022.927113

COPYRIGHT
© 2022 Leonarduzzi, Tran, Bansal, Hull, De la Fuente, Bearup, Melchior, Condon and Maxwell. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

The water content in the soil regulates exchanges between soil and atmosphere, impacts plant livelihood, and determines the antecedent condition for several natural hazards. Accurate soil moisture estimates are key to applications such as natural hazard prediction, agriculture, and water management. We explore how to best predict soil moisture at a high resolution in the context of a changing climate. Physics-based hydrological models are promising as they provide distributed soil moisture estimates and allow prediction outside the range of prior observations. This is particularly important considering that the climate is changing, and the available historical records are often too short to capture extreme events. Unfortunately, these models are extremely computationally expensive, which makes their use challenging, especially when dealing with strong uncertainties. These characteristics make them complementary to machine learning approaches, which rely on training data quality/quantity but are typically computationally efficient. We first demonstrate the ability of Convolutional Neural Networks (CNNs) to reproduce soil moisture fields simulated by the hydrological model ParFlow-CLM. Then, we show how these two approaches can be successfully combined to predict future droughts not seen in the historical timeseries. We do this by generating additional ParFlow-CLM simulations with altered forcing mimicking future drought scenarios. Comparing the performance of CNN models trained on historical forcing and CNN models trained also on simulations with altered forcing reveals the potential of combining these two approaches. The CNN can not only reproduce the moisture response to a given forcing but also learn and predict the impact of altered forcing. Given the uncertainties in projected climate change, we can create a limited number of representative ParFlow-CLM simulations (ca. 25 min/water year on 9 CPUs for our case study), train our CNNs, and use them to efficiently (seconds/water-year on 1 CPU) predict additional water years/scenarios and improve our understanding of future drought potential. This framework allows users to explore scenarios beyond past observation and tailor the training data to their application of interest (e.g., wet conditions for flooding, dry conditions for drought, etc…). With the trained ML model they can rely on high resolution soil moisture estimates and explore the impact of uncertainties.
KEYWORDS
1. Introduction

Soil moisture, defined as the water content in the unsaturated soil top layer, is an essential dynamic hydrological property (e.g., Ochsner et al., 2013). Being at the interface between land and atmosphere, it plays an important role in the water and energy balance processes (e.g., Pauwels et al., 2001). It impacts the energy partitioning between latent and sensible heat at the surface (e.g., Seneviratne et al., 2010) but also affects the generation of surface runoff (e.g., Merz and Plate, 1997) and several biogeochemical cycles (e.g., Seneviratne et al., 2010). Knowledge of the soil moisture state is essential for diverse applications which require high-resolution estimates over large areas, in the field of natural hazards, but also for management decisions (e.g., Dobriyal et al., 2012). For instance, soil moisture estimates are frequently used for agricultural drought monitoring (e.g., Narasimhan and Srinivasan, 2005; Bolten et al., 2010; Martínez-Fernández et al., 2016) or for the prediction of natural hazards such as floods (e.g., Norbiato et al., 2008; Massari et al., 2014) and landslides (e.g., Bogaard and Greco, 2018; Mirus et al., 2018; Leonarduzzi et al., 2021).

Estimates of soil moisture retrieved from in-situ measurements or remote-sensing (e.g., Sharma et al., 2018) tend to be sparse or at a coarse resolution. Hydrological models are often used to improve spatial coverage and resolution. Furthermore, physics-based models allow us to go beyond past observations, which is becoming more and more important as we face unprecedented climatic conditions (e.g., IPCC, 2021) and can no longer rely only on past observations as a reliable guide for the future. Improvements in computing capabilities, and in particular, parallel computing, over the past decades (e.g., Kollet and Maxwell, 2008; Bierkens et al., 2015; Kurtz et al., 2016; Kuffour et al., 2019) have enabled the use of these hydrological models even over large domains at high resolutions (e.g., Maxwell et al., 2015; O’Neill et al., 2021). However, running them multiple times (e.g., in sensitivity, uncertainty, or future climate scenario analysis) is still computationally challenging. At the same time, Machine-Learning (ML) approaches are becoming more widely used to address hydrological problems. ML models are generally very computationally efficient, at least once they are set up and trained, which makes them very attractive for solving computationally expensive hydrological problems. Nevertheless, their performance depends heavily on the quality of the data used to train them. Historically, hydrological ML models have been trained on point observations (refer to overview in Lange and Sippel, 2020; Shen et al., 2021). The most widespread application of machine learning in hydrology is the prediction of streamflow with Long Short-Term Memory (LSTM) models (e.g., Kratzert et al., 2019; Chen et al., 2020). However, a few recent studies have shown the potential of machine learning approaches also in the prediction of distributed hydrological variables (ElSaadani et al., 2021; Maxwell et al., 2021; Tran et al., 2021), and advanced techniques have also been explored in the field of weather forecasting (e.g., Weyn et al., 2019) and in particular precipitation nowcasting (e.g., Shi et al., 2008; Chen et al., 2020; Su et al., 2020).

Here, we take advantage of the complementary nature of physics-based models, which are informative and suited for experimenting beyond past observations but computationally expensive, and ML models, which are very efficient but dependent on training data quality and quantity. We use a physics-based hydrological model to run simulations using both historical forcing and forcing scenarios created by modifying precipitation or temperature. We then train ML models to address the following questions:

• Can an ML model reproduce the soil moisture fields (2D) and dynamics as simulated by a physics-based hydrological model? If so, how accurately?
• Can such a tool be used to predict soil moisture fields with lead times of up to 1 year?
• Can we successfully combine physics-based modeling and machine learning to predict efficiently the hydrological response to an unprecedented climate (i.e., different from historical forcing)?

2. Methods

In this study, we focus on the headwater catchment Upper Taylor to study whether physics-based hydrological
FIGURE 1
In (A) the mask of the Upper Colorado River Basin and the Taylor river catchment used in this analysis, as well as the USGS gage 09110000
(Taylor River at Almont) used for the definition of the domain. In (B), the elevation map at 1 km resolution (resolution of the analysis), and in (C),
the three static inputs used in the training of the machine learning models which are consistent with the corresponding ParFlow-CLM inputs:
Slopes in x and y directions, surface porosity, and surface permeability.
modeling and machine learning can be successfully combined to predict 2D fields of surface soil moisture. Here, we introduce the study area (Section 2.1) and the different components: the physics-based hydrological model (ParFlow-CLM, Section 2.2) and the 2D and 3D convolutional neural network (Section 2.3), their respective set-ups and workflows, as well as the different experiments carried out.

2.1. Case study: Taylor, CO

The chosen study area is the headwater catchment Taylor, in the Upper Colorado River Basin (Figure 1). This catchment is at an elevation of 2,451–3,958 m and has a surface area of ca. 1,144 km². It was defined by using the Taylor at Almont USGS gage (id gage: 09110000) as the outlet. This catchment is snowmelt dominated. The lowest average monthly discharges are recorded in January/February, with values of ca. 3 m³/s, after which there is a steady increase of discharge and generally wetness in the catchment up until June when an average discharge of ca. 25 m³/s is recorded. The Taylor is an important mountain headwater system for flood control and water supply.

2.2. Physics-based model: ParFlow-CLM

To generate the reference simulations, we use the integrated hydrological model ParFlow. It simultaneously solves 3D Richards’ equations in the subsurface and the 2D shallow water equation (kinematic wave approximation) for surface flow. It is coupled with the Common Land Model (CLM), which is responsible for simulating the land surface processes (i.e., water and energy balance), as described in Maxwell and Miller (2005) and Kollet and Maxwell (2008): CLM obtains the soil moisture distribution over the top 4 soil layers from ParFlow as well as the hydrological forcing and returns to ParFlow the net infiltration into the soil.

All the required input files, i.e., soil properties, landcover, and meteorological forcing, are a subset of those used for Upper Colorado River Basin ParFlow-CLM simulations in Tran et al. (2020). The boundary conditions are set to no flow for all lateral domain edges as well as the bottom of the domain, while the overland flow is computed on the domain’s surface. The spatial resolution of the different inputs and the solver grid is 1 × 1 km, with 5 vertical layers of increasing thickness for a total depth of 102 m. The simulation is run with hourly timesteps for 36 water years, from 1983 to 2018.

In addition to the historical simulations (Historical Forcing, HF), we also generate 12 synthetic drought scenarios. We decrease historical precipitation by a random multiplicative
TABLE 1 The first 3 columns summarize the forcing scenarios run with ParFlow-CLM and used for the training and testing of the machine learning models.

Scenario | Temperature correction | Precipitation correction | HF: Train | HF: Test | All: Train | All: Test
HF       | 0                      | 1                        | x         |          | x          |
D1       | 0.5                    | 0.8                      |           |          | x          |
D2       | 0.44                   | 0.58                     |           |          | x          |
D3       | 0.51                   | 0.63                     |           |          | x          |
D4       | 0.56                   | 0.52                     |           |          | x          |
D5       | 0.62                   | 0.83                     |           |          | x          |
D6       | 0.68                   | 0.93                     |           |          | x          |
D7       | 0.45                   | 0.81                     |           |          | x          |
D8       | 0.51                   | 0.81                     |           |          | x          |
D9       | 0.25                   | 0.9                      |           |          | x          |
D10      | 0.65                   | 0.65                     |           |          | x          |
D11      | 0.8                    | 0.65                     |           |          | x          |
D12      | 0.6                    | 0.58                     |           | x        |            | x

The temperature correction is additive relative to historical forcing (HF) temperature; the precipitation correction is a multiplicative factor relative to historical forcing. The forcing scenarios are named Drought scenario 1–12 (D1-D12). The last two column groups show which of the scenarios are used for training and testing in the case in which only historical forcing is used for training (HF), or also the additional forcing scenarios (All).
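The corrections listed in Table 1 amount to one multiplication and one addition per scenario. The following is a minimal sketch of that step, not the authors' code: the function name `apply_scenario`, the dictionary `SCENARIOS`, and the array layout are illustrative assumptions, while the factors themselves are copied from Table 1.

```python
import numpy as np

# (additive temperature correction, multiplicative precipitation factor),
# copied from Table 1. The dictionary name is an illustrative choice.
SCENARIOS = {
    "HF": (0.0, 1.0),
    "D1": (0.5, 0.8), "D2": (0.44, 0.58), "D3": (0.51, 0.63),
    "D4": (0.56, 0.52), "D5": (0.62, 0.83), "D6": (0.68, 0.93),
    "D7": (0.45, 0.81), "D8": (0.51, 0.81), "D9": (0.25, 0.9),
    "D10": (0.65, 0.65), "D11": (0.8, 0.65), "D12": (0.6, 0.58),
}

def apply_scenario(precip, temp, name):
    """Return (precip, temp) altered homogeneously in space and time.

    precip and temp are arrays of any shape holding the historical
    forcing for one water year (hypothetical layout).
    """
    c_t, c_p = SCENARIOS[name]
    return precip * c_p, temp + c_t

# Usage with placeholder arrays standing in for one day of hourly forcing.
precip_hf = np.full((24, 3, 3), 2.0)
temp_hf = np.full((24, 3, 3), 280.0)
precip_d12, temp_d12 = apply_scenario(precip_hf, temp_hf, "D12")
```

Because the same two scalars are applied to every grid cell and timestep, the intermittency and spatial pattern of the historical forcing are preserved, which is the property the scenarios are designed to keep.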
correction factor between 0.5 and 1 ($P_{\mathrm{scenario}} = c_P \cdot P_{\mathrm{historical\,forcing}}$) and increase the historical temperature by an additive factor between 0 and 1° ($T_{\mathrm{scenario}} = T_{\mathrm{historical\,forcing}} + c_T$) for each of the historical water years. These corrections are homogeneous in space and time and are, therefore, the simplest way to have forcing which is still realistic (as intermittency and spatial variability changes are kept consistent with historical observations), but different from what was previously observed. The corrections are designed to mimic a drought climate scenario (scenarios and respective correction factors are indicated in Table 1). All the other meteorological inputs required by ParFlow-CLM (radiation, specific humidity, wind speed, and atmospheric pressure) are kept as in the corresponding historical water year.

ParFlow-CLM is run with all of the forcing scenarios for 10 water years, selected as the driest between 1983 and 2018 (lowest annual precipitation and highest mean temperature, refer to Figure 9). Additionally, the 12 additional forcing scenarios are generated for each of those water years (i.e., 10 water years × 12 scenarios = 120 additional sets of forcing). In addition to the input slopes, permeability, and porosity (Figure 1C), three ParFlow-CLM outputs are utilized here for the training and testing of the ML models: soil infiltration (qflx_infl in ParFlow-CLM), vegetation transpiration (qflx_tran_veg in ParFlow-CLM), and surface soil moisture. The latter is obtained by multiplying the 2D fields of surface saturation simulated by ParFlow-CLM by the surface porosity (top right in Figure 1C). We choose to consider surface soil moisture as it controls exchanges to the atmosphere, it is very dynamic, and it is comparable to the available product such as remote sensing measurements. For most of the results, unless otherwise specified, only the 10 driest years are considered.

2.3. Convolutional neural networks

We use the simulations introduced in Section 2.2 to train and test two machine learning models designed to predict 2D soil moisture fields. We choose to use Convolutional Neural Networks to take advantage of the strong spatial structures in the soil moisture fields and the different variables affecting its distribution. In fact, we choose the inputs to capture the main drivers of soil moisture temporal changes (net infiltration and transpiration) and the spatially variable properties controlling water redistribution both vertically and laterally (surface permeability and porosity, and slopes).

Building upon the model exploration carried out in Maxwell et al. (2021), we select two CNNs: a 2D CNN consisting of 2 Convolution+ReLU layers, each followed by a Max Pooling layer, and two fully connected linear layers (refer to Table A3 in Maxwell et al., 2021), and a 3D CNN which has the same architecture as the 2D model, but an additional Convolution+ReLU first layer (refer to Table A1 in Maxwell et al., 2021).

The inputs for both CNNs are porosity, permeability, slopes and net infiltration, transpiration, and soil moisture. For 3D ParFlow-CLM variables (static variables and soil moisture), the surface layer is considered. All dynamic variables are resampled
FIGURE 2
Workflow for the training and testing of the 2D (left) and 3D (right) Convolutional Neural Network. For the 2D architecture at every timestep, the
static and dynamic inputs as well as the soil moisture simulated by ParFlow-CLM are fed to the model, while the soil moisture of the next day is
provided as the label (or predicted in testing mode). This operation is repeated on all days of the year. For the 3D architecture, the 3rd dimension
is time, and all static inputs (including soil moisture at day 0) and dynamic ones are fed to the model, while the label is the 3D moisture field. 2D
(static) inputs are repeated for the 3rd (time) dimension.
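The input assembly for the 3D architecture described in this caption (static 2D fields and the day-0 soil moisture repeated along the time axis, then combined with the dynamic fields) can be sketched with NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the function name `build_3d_input`, the channel-first layout, and the channel ordering are all hypothetical.

```python
import numpy as np

def build_3d_input(static_fields, initial_sm, dynamic_fields):
    """Stack inputs into one array of shape (channels, ny, nx, nt).

    static_fields  : list of (ny, nx) arrays (e.g., slopes, porosity, permeability)
    initial_sm     : (ny, nx) surface soil moisture at day 0
    dynamic_fields : list of (ny, nx, nt) arrays (e.g., infiltration, transpiration)
    """
    nt = dynamic_fields[0].shape[-1]
    # Repeat every 2D field nt times along a new trailing (time) axis.
    repeated = [
        np.repeat(field[..., np.newaxis], nt, axis=-1)
        for field in [*static_fields, initial_sm]
    ]
    return np.stack(repeated + list(dynamic_fields), axis=0)

# Usage with placeholder arrays: 2 static fields, day-0 moisture, 2 dynamic fields.
ny, nx, nt = 3, 4, 5
static = [np.ones((ny, nx)), 2.0 * np.ones((ny, nx))]
sm0 = np.full((ny, nx), 0.3)
dynamic = [np.zeros((ny, nx, nt)), np.zeros((ny, nx, nt))]
x = build_3d_input(static, sm0, dynamic)
```

Repeating the static fields costs memory but lets a single 3D convolution see static properties, initial conditions, and forcing at every timestep.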
to daily resolution, by averaging the values in the hours within each day. Furthermore, all inputs are scaled into 0:1 or –1:1 ranges to facilitate training of the CNNs.

The 2D CNN treats all timesteps (i.e., days in the year) as independent and uses the soil moisture field on the following day as the label (left panel in Figure 2). The 3D CNN is built by considering time as the 3rd dimension. The 2D arrays of static inputs, as well as the day-one ParFlow-CLM surface soil moisture (initial conditions), are repeated in the 3rd dimension to match the size of the dynamic inputs, of shape ny, nx, nt (with y being South-North direction, x being West-East direction, and t being time). The label for 3D CNN is the (ny, nx, nt) matrix of soil moisture for the water year. Both these models are trained on 5 water years (1988, 1990, 2015, 2016, and 2018), and 3 different water years (2000, 2012, and 2013) are used for validation and early stopping. When the mean loss (smooth L1 loss) over the last 100 epochs on the validation set is lower than the mean loss over the antecedent 100 epochs, the training is stopped. This is done to avoid over-fitting the machine learning model. Every set-up and model configuration (i.e., model architecture or training data set) is repeated 10–15 times to verify the impact that initialization (initial weights) of the CNNs has on the final results and performances. Unless otherwise stated, individual lines plotting one specific experiment in the results represent the median among the ensemble of initialisations for a given architecture.

First, we train both 2D and 3D CNNs with the ParFlow-CLM simulations obtained with historical forcing. This allows us to explore whether these models are capable of reproducing the soil moisture fields as simulated by ParFlow-CLM and compare the 2D and 3D CNNs. Then, we explore the potential of combining physics-based modeling and machine learning in the context of a changing climate by training the CNN either only on historical forcing simulations or both historical forcing and 11 of the forcing scenarios (D1-D11 in Table 1) simulations, and testing on the twelfth scenario (D12).

To compare the performances over the testing year (2002), we use the Root Mean Square Difference (RMSD), the Nash-Sutcliffe Efficiency (NSE), and the Kling-Gupta Efficiency (KGE, Gupta et al., 2009), computed as follows:

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(PF_i - ML_i\right)^2} \tag{1}$$
FIGURE 3
In (A), the timeseries of ParFlow-CLM (PF) domain average in time for the water year 2002 (mean SMPF ) and the simulated net infiltration
(qflx_infl-qflx_tran_veg ParFlow-CLM variables). Below, the timeseries of Root Mean Square Difference, Nash-Sutcliffe Efficiency, and
Kling-Gupta Efficiency comparing each day the PF soil moisture field to that of the 2D Convolutional Neural Network (CNN) when predicting 1
day to the next (1 Day) or the entire water year starting from day 0 (Recursive). The semi-transparent bands represent the 0.2–0.8 interquantile
ranges among repetitions of the same ML configuration (i.e., different initialization). In (B), the temporal statistics comparing the timeseries at
each pixel of PF and the 2D CNN in the two configurations, as well as the persistent case in which PF soil moisture is assumed to remain
constant either over the entire water year (Persistent, t = 0, meaning that the soil moisture is assumed to remain as of 1st October of the chosen
water year) or 1 day to the next (Persistent, t-1).
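The four prediction modes compared in this caption (1 Day, Recursive, Persistent t = 0, Persistent t-1) differ only in which soil moisture field is used as the starting point for each day. The sketch below illustrates that difference under stated assumptions: `step` is a hypothetical callable standing in for the trained 2D CNN, taking the current soil moisture field and that day's forcing and returning the next day's field. It is not the authors' interface.

```python
import numpy as np

def one_day_predictions(step, sm_pf, forcing):
    """'1 Day': each prediction starts from the ParFlow-CLM field at day t."""
    return np.stack([step(sm_pf[t], forcing[t]) for t in range(len(sm_pf) - 1)])

def recursive_predictions(step, sm0, forcing):
    """'Recursive': start from day 0 and feed every output back as input."""
    preds, sm = [], sm0
    for t in range(len(forcing) - 1):
        sm = step(sm, forcing[t])
        preds.append(sm)
    return np.stack(preds)

def persistence_baselines(sm_pf):
    """Persistent (t = 0) and Persistent (t-1) estimates for days 1..nt-1."""
    persistent_t0 = np.broadcast_to(sm_pf[0], sm_pf[1:].shape)
    persistent_tm1 = sm_pf[:-1]
    return persistent_t0, persistent_tm1

# Toy usage: a stand-in 'model' that simply adds the forcing to the state.
toy_step = lambda sm, f: sm + f
forcing = np.full((4, 2, 2), 0.1)
sm_pf = np.stack([np.full((2, 2), 0.1 * t) for t in range(4)])
daily = one_day_predictions(toy_step, sm_pf, forcing)
rollout = recursive_predictions(toy_step, sm_pf[0], forcing)
p_t0, p_tm1 = persistence_baselines(sm_pf)
```

In recursive mode any error made on one day is carried into the next input, which is why it is the harder test of the two CNN configurations.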
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N}\left(PF_i - ML_i\right)^2}{\sum_{i=1}^{N}\left(PF_i - \mu_{PF}\right)^2} \tag{2}$$

$$\mathrm{KGE} = 1 - \sqrt{\left(\frac{\mathrm{Cov}(PF, ML)}{\sigma_{PF}\,\sigma_{ML}} - 1\right)^2 + \left(\frac{\sigma_{ML}}{\sigma_{PF}} - 1\right)^2 + \left(\frac{\mu_{ML}}{\mu_{PF}} - 1\right)^2} \tag{3}$$

where PF and ML are respectively the soil moisture as simulated by ParFlow-CLM and the machine learning model, µ is the mean, σ the standard deviation, and Cov the covariance.

All these statistics are computed both in space: i.e., the 2D fields of PF and ML are compared at each timestep (i is the spatial index, in x and y); and in time: i.e., the timeseries of PF and ML at every 1 km × 1 km grid-cell within the catchment are compared (i is the temporal index). The first operation results in a timeseries of these statistical metrics while the latter is summarized in a map (one value for each domain grid-cell).

3. Results

3.1. 2D convolutional neural network

We train the 2D CNN with historical forcing simulations and test it on the water year 2002 (1 Day: blue line in Figure 3A and left column in Figure 3B). Overall the CNN model performs well, with the RMSD typically lower than 0.025, and NSE and KGE higher than 0.9. Inspecting the temporal statistics (Figure 3B, 2D 1 Day), we can see that the trained machine learning model is performing well in most locations. NSE and KGE seem to worsen at the river network, especially in upstream locations (in the North-Western part of the domain, see the mean soil moisture for the water year 2002 in Figure 7 to identify the river network). Surprisingly, the performance of this CNN model is worse than that obtained assuming persistence 1 day to the next, i.e., assuming that the soil moisture does not change (Persistent (t-1) in Figures 3A,B). The explanation for this result is that the soil moisture changes are so small within most timesteps (changes over 1 day) that the CNN model overestimates them, leading to larger RMSD. Evidence of this is in the performances during the rainfall events in the later part of the water year, when sharp peaks in net infiltration, mirrored by sharp peaks in soil moisture, lead to better performances of the CNN model than in the persistent case.

Models for soil moisture predictions are typically used to simulate over time periods of weeks or months, rather than just the following day. We test the applicability of the CNN model to longer-running simulations. We use the same trained 2D CNN model, but carry out the testing in a different way: instead of using the soil moisture field at t simulated by ParFlow-CLM to make the t + 1 prediction, we use the CNN output
FIGURE 5
In (A) the timeseries of ParFlow-CLM (PF) domain average in time for water year 2002 (mean SMPF ) and the simulated net infiltration
(qflx_infl-qflx_tran_veg ParFlow-CLM variables). Below, the timeseries of Root Mean Square Difference, Nash-Sutcliffe Efficiency, and Kling-Gupta
Efficiency comparing the PF mean to that of the 2D Convolutional Neural Network (CNN) (2D) or the 3D CNN (3D). The semi-transparent bands
represent the 0.2–0.8 interquantile ranges among repetition of the same ML configuration (i.e., different initialization). In (B) the temporal
statistics comparing the timeseries at each pixel of PF and the 2D and 3D CNNs.
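The spatial and temporal statistics shown in these figures follow Equations (1)–(3). A minimal NumPy sketch of the three metrics is given below, assuming arrays laid out as (nt, ny, nx) with no missing values; the function names and the `axis` convention are illustrative choices, not taken from the authors' code, and the random arrays are placeholders rather than model output.

```python
import numpy as np

def rmsd(pf, ml, axis=None):
    """Root Mean Square Difference, Equation (1)."""
    return np.sqrt(np.mean((pf - ml) ** 2, axis=axis))

def nse(pf, ml, axis=None):
    """Nash-Sutcliffe Efficiency, Equation (2)."""
    mu_pf = np.mean(pf, axis=axis, keepdims=True)
    return 1.0 - np.sum((pf - ml) ** 2, axis=axis) / np.sum((pf - mu_pf) ** 2, axis=axis)

def kge(pf, ml, axis=None):
    """Kling-Gupta Efficiency, Equation (3)."""
    mu_pf = np.mean(pf, axis=axis, keepdims=True)
    mu_ml = np.mean(ml, axis=axis, keepdims=True)
    cov = np.mean((pf - mu_pf) * (ml - mu_ml), axis=axis)
    sd_pf = np.std(pf, axis=axis)
    sd_ml = np.std(ml, axis=axis)
    r_term = cov / (sd_pf * sd_ml) - 1.0
    sd_term = sd_ml / sd_pf - 1.0
    mu_term = np.mean(ml, axis=axis) / np.mean(pf, axis=axis) - 1.0
    return 1.0 - np.sqrt(r_term ** 2 + sd_term ** 2 + mu_term ** 2)

# Placeholder fields standing in for PF and ML soil moisture, shape (nt, ny, nx).
rng = np.random.default_rng(0)
pf = rng.uniform(0.1, 0.4, size=(365, 4, 5))
ml = pf + rng.normal(0.0, 0.01, size=pf.shape)

spatial_nse = nse(pf, ml, axis=(1, 2))  # one value per day (timeseries)
temporal_kge = kge(pf, ml, axis=0)      # one value per grid cell (map)
```

Reducing over the two spatial axes gives the timeseries of panel (A), while reducing over the time axis gives the maps of panel (B).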
FIGURE 7
Soil moisture timeseries at the 4 locations (A, B, C, and D) identified in the map of average ParFlow-CLM soil moisture for the water year 2002.
These are arbitrary locations selected to show the model behavior at different locations (e.g., B is on the river network, D is upstream, A is further
from the river network, and C is next to an upstream contributor). The timeseries of ParFlow-CLM (continuous black line) are compared to the
estimates with a 2D convolutional neural network (2D) and a 3D convolutional neural network. The semi-transparent lines represent the
different ML models trained (different weight initialization), while the darker lines show the median of those lines for the 2D and 3D models. The
corresponding performances are reported in the Table. The dashed line represents the long term mean timeseries of soil moisture simulated by
ParFlow-CLM at each location, computed over the 36 years available (i.e., for each day, the mean over the 36 years for that day).
events. This explains the superior performances of the 3D CNN in the snowmelt season. It also explains the earlier peak for some of the locations, where soil moisture responds to positive infiltration with a delay not present in training years, and the more smoothed behavior toward the end of the water year, where the 2D CNN responds better to strong rainfall events.

3.3. Forcing scenarios

Finally, we test the potential of combining physics-based modeling and machine learning in the context of a changing climate by looking at the performances when predicting a climate outside the range of that observed in training. We focus on the 3D CNN due to its superior performances but the same results could also be observed with the 2D CNN model. We compare two sets of 3D CNN models, with the same architecture but trained either only on historical forcing ParFlow-CLM simulations, or also on those with altered forcing (11 scenarios, mimicking drought scenarios by randomly decreasing precipitation and increasing temperature).

Model performances with the additional training scenarios are superior both in time and space (Figure 8). In fact, the RMSD is lower, and NSE and KGE are higher over the entire water year when training not only on historical forcing, and the performances are also improved over a large portion of the domain. The 3D CNN model trained only on historical forcing performs worse on the testing scenario used here (D12 in Table 1) than on the historical forcing, both for the water year 2002 (comparing the yellow line in Figures 5, 8). This means that indeed when experiencing a climate outside the range of the training data, the performances worsen. The same is not true if the CNN model is also exposed to drought scenario simulations in the training. In fact, the performances
FIGURE 8
In (A), the timeseries of ParFlow-CLM (PF) domain average in time for the water year 2002 and the testing forcing scenario (mean SMPF ) and the
simulated net infiltration (qflx_infl-qflx_tran_veg ParFlow-CLM variables). Below, the timeseries of Root Mean Square Difference, Nash-Sutcliffe
Efficiency, and Kling-Gupta Efficiency comparing the PF mean to that of the 3D Convolutional Neural Network (CNN) trained only on the
historical forcing simulations (Historical Forcing, HF) or also on the additional forcing scenarios (all). The semi-transparent bands represent the
0.2–0.8 interquantile ranges among repetitions of the same ML configuration (i.e., different initialization). In (B), the temporal statistics comparing
the timeseries at each pixel of PF and the 3D CNN trained just on historical forcing simulations or also on those with additional forcing scenarios.
are better in the testing even if the specific testing scenario is not used in training (training is done with scenarios D1-D11 and historical forcing).

4. Discussion

In this work, we explore how machine learning and physics-based hydrological modeling can be successfully combined to predict efficiently 2D moisture fields. Machine learning is strongly dependent on the quality and quantity of the training/input data but is extremely computationally efficient, which makes it highly complementary to a physics-based model, which is informative but computationally expensive. For the case study presented here, ParFlow-CLM runs one water year simulation in ca. 25 min when using 9 CPUs. We demonstrate how a CNN can be trained by using the simulations generated with ParFlow-CLM and can reproduce its soil moisture estimates. Making a 1 water year prediction with the trained ML models takes a few seconds on one CPU. As a reference, the performances of a very simple water balance model based on the same inputs as the CNN are much worse than the ones obtained here with the machine learning models (RMSD>0.2 and negative NSE and KGE).

Having an informative physics-based model allows us to go beyond past observations and create simulations that are generated using forcing altered to resemble expectations of the future climate. With this training set, the CNN model not only learns how to emulate ParFlow-CLM response to forcing but also how changes in forcing manifest in the soil moisture response. In fact, the model has learned what effect an increase in temperature and a decrease in precipitation have on soil moisture. The CNN model can then be used to better predict the soil moisture response to a climate which has similar properties (in this case, reduced precipitation and increased temperature), but is different from what was seen in training. This tool can be used to explore many different possible scenarios, in the context of an uncertain future, which would be computationally unfeasible with ParFlow-CLM directly. It is interesting to notice that the two models (trained on just historical forcing or also on the additional forcing scenarios) perform very similarly if tested on a validation year (i.e., water year that has not been used in training) but with historical forcing (refer to Supplementary Figure S2). This confirms that exposing the model to a “different climate” does not improve it simply because it is trained on more data (5 water years × 11 forcing scenarios = 55 extra water years to train on); the model is also learning the impact of increasing temperature and decreasing precipitation. In the testing scenario (D12), which neither model has seen in training, the superiority of the one that has seen altered forcing is evident.
FIGURE 9
The mean Root Mean Square Difference (RMSD, first row of plots), Nash-Sutcliffe Efficiency (NSE, middle row of plots), and Kling-Gupta Efficiency
(KGE, lower row of plots) as a function of the total precipitation (left column of plots) or the mean hourly temperature (right column of plots)
computed for each water year. The color identifies water years used in training, validation, or testing. The water year 2002, used for all the
results reported here is marked blue. Black lines represent the trends of a linear fit.
One of the biggest limitations of this study is the choice of forcing scenarios. These are chosen to reproduce a plausible (in terms of intra-annual variability) field of forcing variables with altered statistics (changing the precipitation and temperature homogeneously in space and time). This choice allows for a controlled experiment, where only a small perturbation is applied, but the resulting scenarios are not realistic representations of future climates. More complex changes are expected in the future, with temporally and spatially heterogeneous impacts on precipitation and temperature, but also on the other meteorological inputs considered (e.g., Trenberth, 2005). This could be addressed in future work by using weather generators (e.g., Semenov and Barrow, 1997; Kilsby et al., 2007; Peleg et al., 2017), which produce realistic fields of the different meteorological variables given (some) statistical properties. Using more realistic future climate scenarios, which are not just homogeneously modified, would probably make it harder to train the CNNs to capture the impact of a changing climate on soil moisture, and would require more training simulations with altered forcing.

The conclusions of the scenario experiment also hold for wetter or drier water years. The focus here is on droughts, and therefore we select drier years (higher mean temperature and lower mean precipitation) between 1983 and 2018. If the trained model is used to predict wetter years, the performance worsens. To show this, although it was not the purpose of the study presented here, we look at the performance of the trained 3D CNN when predicting all 36 water years available (Figure 9; refer to Supplementary Figure S1). While considerable scatter can be observed, the trends are clear across all statistical metrics: the performance worsens with increasing total precipitation and decreasing mean hourly temperature. This confirms once again the importance of the quality of the training data, and specifically whether they are representative (ergodicity). Similar conclusions can be drawn by comparing the performance of the 3D CNN with that of the long-term mean (LTM) timeseries (Figure 10). The 3D CNN outperforms the LTM for drier years (lower yearly precipitation) but performs worse on wetter ones. Where the CNN outperforms the LTM most clearly is on the scenarios, which are much drier than the "average year". These conclusions are consistent with the out-of-sample testing results in Maxwell et al. (2021). When developing a framework such as the one presented here, it is important not only to ensure that enough training data are generated with the physics-based model, but also that those data are tailored to the specific application of interest and intended use.

The benefits of combining physics-based modeling and machine learning have been shown here in the context of a changing climate, but they extend to other potential applications that require a large number of simulations, such as improving the parametrisation of physics-based models (refer to, e.g., Maxwell et al., 2021).
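The per-year evaluation discussed above (RMSD, NSE, and KGE for each water year, a comparison against the long-term mean, and a linear fit of the scores against a yearly statistic) can be sketched as follows. This is a minimal illustration, not the code used in the study: it works on 1D time series rather than 2D soil moisture fields, all data are synthetic, and the names `rmsd`, `nse`, `kge`, `long_term_mean`, and `wetness` are hypothetical. The KGE follows the decomposition of Gupta et al. (2009).

```python
import numpy as np

def rmsd(sim, ref):
    """Root mean square deviation between simulated and reference values."""
    sim, ref = np.asarray(sim, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((sim - ref) ** 2)))

def nse(sim, ref):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the reference mean."""
    sim, ref = np.asarray(sim, float), np.asarray(ref, float)
    return float(1.0 - np.sum((sim - ref) ** 2) / np.sum((ref - ref.mean()) ** 2))

def kge(sim, ref):
    """Kling-Gupta Efficiency (Gupta et al., 2009): 1 is perfect."""
    sim, ref = np.asarray(sim, float), np.asarray(ref, float)
    r = np.corrcoef(sim, ref)[0, 1]      # linear correlation
    alpha = sim.std() / ref.std()        # variability ratio
    beta = sim.mean() / ref.mean()       # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

def long_term_mean(series_by_year):
    """Mean over years at each time step; input shape (n_years, n_steps)."""
    return np.mean(np.asarray(series_by_year, float), axis=0)

# Synthetic stand-ins (assumptions): one soil moisture series per water year,
# with a yearly 'wetness' level playing the role of total precipitation.
rng = np.random.default_rng(0)
n_years, n_steps = 6, 365
wetness = np.linspace(0.2, 0.4, n_years)
t = np.arange(n_steps)
reference = np.stack([w + 0.05 * np.sin(2 * np.pi * t / n_steps) for w in wetness])
emulated = reference + rng.normal(0.0, 0.01, reference.shape)  # stand-in for the CNN
ltm = long_term_mean(reference)                                # LTM baseline

emulator_rmsd, ltm_rmsd = [], []
for year in range(n_years):
    emulator_rmsd.append(rmsd(emulated[year], reference[year]))
    ltm_rmsd.append(rmsd(ltm, reference[year]))
    print(year, emulator_rmsd[-1], ltm_rmsd[-1],
          nse(emulated[year], reference[year]),
          kge(emulated[year], reference[year]))

# Linear trend of a metric against the yearly statistic (the black lines in the figure).
slope, intercept = np.polyfit(wetness, emulator_rmsd, deg=1)
print("RMSD trend slope:", slope)
```

In this toy setup the long-term mean is, by construction, closest to the reference in years near the average and furthest in the driest and wettest years, which is why a baseline of this kind is a useful yardstick for out-of-sample years.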
References
Bierkens, M. F. P., Bell, V. A., Burek, P., Chaney, N., Condon, L. E., David, C. H., et al. (2015). Hyper-resolution global hydrological modelling: what is next? Hydrol. Process 29, 310–320. doi: 10.1002/hyp.10391
Bogaard, T. A., and Greco, R. (2018). Invited perspectives: Hydrological perspectives on precipitation intensity-duration thresholds for landslide initiation: proposing hydro-meteorological thresholds. Natural Hazards Earth Syst. Sci. 18, 31–39. doi: 10.5194/nhess-18-31-2018
Bolten, J. D., Crow, W. T., Jackson, T. J., Zhan, X., and Reynolds, C. A. (2010). Evaluating the utility of remotely sensed soil moisture retrievals for operational agricultural drought monitoring. IEEE J. Select. Top. Appl. Earth Observat. Remote Sens. 3, 57–66. doi: 10.1109/JSTARS.2009.2037163
Chen, L., Cao, Y., Ma, L., and Zhang, J. (2020). A deep learning-based methodology for precipitation nowcasting with radar. Earth Space Sci. 7, e2019EA000812. doi: 10.1029/2019EA000812
Dobriyal, P., Qureshi, A., Badola, R., and Hussain, S. A. (2012). A review of the methods available for estimating soil moisture and its implications for water resource management. J. Hydrol. 458–459, 110–117. doi: 10.1016/j.jhydrol.2012.06.021
ElSaadani, M., Habib, E., Abdelhameed, A. M., and Bayoumi, M. (2021). Assessment of a spatiotemporal deep learning approach for soil moisture prediction and filling the gaps in between soil moisture observations. Front. Artif. Intell. 4, 636234. doi: 10.3389/frai.2021.636234
Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F. (2009). Decomposition of the mean squared error and nse performance criteria: Implications for improving hydrological modelling. J. Hydrol. 377, 80–91. doi: 10.1016/j.jhydrol.2009.08.003
IPCC (2021). IPCC Press Release, 9 Aug. 2021. IPCC Press Conference. Available online at: https://www.ipcc.ch/site/assets/uploads/2021/08/IPCC_WGI-AR6-Press-Release_en.pdf (accessed 22 March, 2022).
Kilsby, C., Jones, P., Burton, A., Ford, A., Fowler, H., Harpham, C., et al. (2007). A daily weather generator for use in climate change studies. Environ. Model. Software 22, 1705–1719. doi: 10.1016/j.envsoft.2007.02.005
Kollet, S. J., and Maxwell, R. M. (2008). Capturing the influence of groundwater dynamics on land surface processes using an integrated, distributed watershed model. Water Resour. Res. 44, W02402. doi: 10.1029/2007WR006004
Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G. (2019). Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets. Hydrol. Earth Syst. Sci. 23, 5089–5110. doi: 10.5194/hess-23-5089-2019
Kuffour, B. N. O., Engdahl, N. B., Woodward, C. S., Condon, L. E., Kollet, S., and Maxwell, R. M. (2019). Simulating coupled surface-subsurface flows with ParFlow v3.5.0: capabilities, applications, and ongoing development of an open-source, massively parallel, integrated hydrologic model. Geoscientific Model Dev. Discuss. 2019, 1–66. doi: 10.5194/gmd-2019-190
Kurtz, W., He, G., Kollet, S. J., Maxwell, R. M., Vereecken, H., and Hendricks Franssen, H.-J. (2016). TerrSysMP-PDAF (version 1.0): a modular high-performance data assimilation framework for an integrated land surface-subsurface model. Geoscientific Model Dev. 9, 1341–1360. doi: 10.5194/gmd-9-1341-2016
Lange, H., and Sippel, S. (2020). “Machine learning applications in hydrology,” in Forest-Water Interactions (Cham: Springer International Publishing), 233–257.
Leonarduzzi, E., McArdell, B. W., and Molnar, P. (2021). Rainfall-induced shallow landslides and soil wetness: comparison of physically based and probabilistic predictions. Hydrol. Earth Syst. Sci. 25, 5937–5950. doi: 10.5194/hess-25-5937-2021
Martínez-Fernández, J., González-Zamora, A., Sánchez, N., Gumuzzio, A., and Herrero-Jiménez, C. M. (2016). Satellite soil moisture for agricultural drought monitoring: assessment of the smos derived soil water deficit index. Remote Sens. Environ. 177, 277–286. doi: 10.1016/j.rse.2016.02.064
Massari, C., Brocca, L., Barbetta, S., Papathanasiou, C., Mimikou, M., and Moramarco, T. (2014). Using globally available soil moisture indicators for flood modelling in mediterranean catchments. Hydrol. Earth Syst. Sci. 18, 839–853. doi: 10.5194/hess-18-839-2014
Maxwell, R., Condon, L., and Melchior, P. (2021). A physics-informed, machine learning emulator of a 2d surface water model: What temporal networks and simulation-based inference can help us learn about hydrologic processes. Water 13, 3633. doi: 10.3390/w13243633
Maxwell, R. M., Condon, L. E., and Kollet, S. J. (2015). A high-resolution simulation of groundwater and surface water over most of the continental us with the integrated hydrologic model parflow v3. Geoscientific Model Dev. 8, 923–937. doi: 10.5194/gmd-8-923-2015
Maxwell, R. M., and Miller, N. L. (2005). Development of a coupled land surface and groundwater model. J. Hydrometeorol. 6, 233–247. doi: 10.1175/JHM422.1
Merz, B., and Plate, E. J. (1997). An analysis of the effects of spatial variability of soil and soil moisture on runoff. Water Resour. Res. 33, 2909–2922. doi: 10.1029/97WR02204
Mirus, B. B., Morphew, M. D., and Smith, J. B. (2018). Developing hydro-meteorological thresholds for shallow landslide initiation and early warning. Water 10, 1274. doi: 10.3390/w10091274
Narasimhan, B., and Srinivasan, R. (2005). Development and evaluation of soil moisture deficit index (smdi) and evapotranspiration deficit index (etdi) for agricultural drought monitoring. Agric. Forest Meteorol. 133, 69–88. doi: 10.1016/j.agrformet.2005.07.012
Norbiato, D., Borga, M., Degli Esposti, S., Gaume, E., and Anquetin, S. (2008). Flash flood warning based on rainfall thresholds and soil moisture conditions: an assessment for gauged and ungauged basins. J. Hydrol. 362, 274–290. doi: 10.1016/j.jhydrol.2008.08.023
Ochsner, T. E., Cosh, M. H., Cuenca, R. H., Dorigo, W. A., Draper, C. S., Hagimoto, Y., et al. (2013). State of the art in large-scale soil moisture monitoring. Soil Sci. Soc. Am. J. 77, 1888–1919. doi: 10.2136/sssaj2013.03.0093
O’Neill, M. M. F., Tijerina, D. T., Condon, L. E., and Maxwell, R. M. (2021). Assessment of the parflow-clm conus 1.0 integrated hydrologic model: evaluation of hyper-resolution water balance components across the contiguous united states. Geoscientific Model Dev. 14, 7223–7254. doi: 10.5194/gmd-14-7223-2021
Pauwels, V. R. N., Hoeben, R., Verhoest, N. E. C., and De Troch, F. P. (2001). The importance of the spatial patterns of remotely sensed soil moisture in the improvement of discharge predictions for small-scale basins through data assimilation. J. Hydrol. 251, 88–102. doi: 10.1016/S0022-1694(01)00440-1
Peleg, N., Fatichi, S., Paschalis, A., Molnar, P., and Burlando, P. (2017). An advanced stochastic weather generator for simulating 2-d high-resolution climate variables. J. Adv. Model. Earth Syst. 9, 1595–1627. doi: 10.1002/2016MS000854
Semenov, M. A., and Barrow, E. M. (1997). Use of a stochastic weather generator in the development of climate change scenarios. Clim. Change 35, 397–414. doi: 10.1023/A:1005342632279
Seneviratne, S. I., Corti, T., Davin, E. L., Hirschi, M., Jaeger, E. B., Lehner, I., et al. (2010). Investigating soil moisture-climate interactions in a changing climate: a review. Earth Sci. Rev. 99, 125–161. doi: 10.1016/j.earscirev.2010.02.004
Sharma, P. K., Kumar, D., Srivastava, H. S., and Patel, P. (2018). Assessment of different methods for soil moisture estimation: a review. J. Remote Sens. GIS 9, 57–73. doi: 10.37591/.v9i1.105
Shen, C., Chen, X., and Laloy, E. (2021). Editorial: Broadening the use of machine learning in hydrology. Front. Water 3, 681023. doi: 10.3389/frwa.2021.681023
Shi, X., Chen, Z., Wang, H., Yeung, D. Y., Wong, W. K., and Woo, W. C. (2008). “Convolutional LSTM network: A machine learning approach for precipitation nowcasting,” in Proceedings of the 28th International Conference on Neural Information Processing Systems (Montreal, QC), 802–810.
Su, A., Li, H., Cui, L., and Chen, Y. (2020). A convection nowcasting method based on machine learning. Adv. Meteorol. 2020, 5124274. doi: 10.1155/2020/5124274
Tran, H., Leonarduzzi, E., De la Fuente, L., Hull, R. B., Bansal, V., Chennault, C., et al. (2021). Development of a deep learning emulator for a distributed groundwater-surface water model: Parflow-ml. Water 13, 3393. doi: 10.3390/w13233393
Tran, H., Zhang, J., Cohard, J.-M., Condon, L. E., and Maxwell, R. M. (2020). Simulating groundwater-streamflow connections in the upper colorado river basin. Groundwater 58, 392–405. doi: 10.1111/gwat.13000
Trenberth, K. E. (2005). “The impact of climate change and variability on heavy precipitation, floods, and droughts,” in Encyclopedia of Hydrological Sciences, ed M. G. Anderson (Hoboken, NJ: Wiley). doi: 10.1002/0470848944.hsa211
Weyn, J. A., Durran, D. R., and Caruana, R. (2019). Can machines learn to predict weather? using deep learning to predict gridded 500-hpa geopotential height from historical weather data. J. Adv. Model. Earth Syst. 11, 2680–2693. doi: 10.1029/2019MS001705