Science of the Total Environment 897 (2023) 165494

Contents lists available at ScienceDirect

Science of the Total Environment


journal homepage: www.elsevier.com/locate/scitotenv

Large-scale seasonal forecasts of river discharge by coupling local and global datasets with a stacked neural network: Case for the Loire River system

M.T. Vu a,⁎, A. Jardani a, M. Krimissa b, F. Zaoui b, N. Massei a
a Université de Rouen, M2C, UMR 6143, CNRS, Morphodynamique Continentale et Côtière, Mont Saint Aignan, France
b Electricité de France EDF, Le Département Laboratoire National d'Hydraulique et Environnement (LNHE), 6 Quai Watier, Chatou, France

H I G H L I G H T S

• Multi-step-ahead forecasting with stacked-LSTM networks for a deeper interpretation
• Couple local and global datasets to better consider water cycle behavior
• Perform frequency and lag analysis of time series to extract meaningful support
• Long-term forecasting with time horizon up to six months
• Mutual effects between the multiple datasets and multiple time-step responses

G R A P H I C A L A B S T R A C T

[Graphical abstract not reproduced. Recoverable labels: "-scale physics" (truncated); "Processing"; "Piezometers"; "Local" and "Global" data; "Data pre-processing"; "Data analysis: most relevant series"; "Stacked neural networks"; forecasts "1 month ahead", "3 months ahead", "6 months ahead".]

A R T I C L E I N F O

Editor: Jay Gan

Keywords: Forecast; River discharge; Stacked LSTM; Deep learning; Big data; Loire Bretagne Basin

A B S T R A C T

Accurate prediction of river discharge is critical for a wide range of sectors, from human activities to environmental hazard management, especially in the face of increasing demand for water resources and climate change. To address this need, a multivariate model is proposed that incorporates both local and global data sources, including river and piezometer gauges, sea level, and climate parameters. By employing phase shift analysis, the model optimizes correlations between the target discharge and 12 parameters related to hydrologic and climatic systems, all sampled daily. In addition, a stacked LSTM, a more complex neural network architecture, is used to improve information extraction ability. Exploring river dynamics in the Loire-Bretagne basin and its surroundings, the investigation delves into predictions in daily time steps for one, three, and six months ahead. The resulting forecast features high accuracy and efficiency in predicting river discharge fluctuations, showcasing superior performance in forecasting drought periods over flood peaks. A detailed examination of the data used highlights the significance of both local and global datasets in predicting river discharge, where the former dictates short-term predictions, while the latter drives long-range forecasts. Seasonally extended forecasting confirms a strong connection between the forecast lead time and the shift in data correlation, with lower correlation at a lag of 3 months due to seasonal changes affecting forecast quality, compensated by a higher correlation at a longer lag of 6 months. Such a mutual effect in this multi-time-step forecasting improves the predictive quality of a six-month horizon, thus encouraging progress in long-term prediction to a seasonal scale. The research establishes a practical foundation for effectively utilizing big data to leverage long-term forecasting of environmental dynamics.

⁎ Corresponding author.
E-mail addresses: [email protected] (M.T. Vu), [email protected] (A. Jardani), [email protected] (M. Krimissa), [email protected] (F. Zaoui),
[email protected] (N. Massei).

http://dx.doi.org/10.1016/j.scitotenv.2023.165494
Received 26 May 2023; Received in revised form 7 July 2023; Accepted 10 July 2023
Available online 13 July 2023
0048-9697/© 2023 Elsevier B.V. All rights reserved.
M.T. Vu et al. Science of the Total Environment 897 (2023) 165494

1. Introduction

Accurately predicting streamflow is an important task that impacts many areas, from agriculture and recreation management to industrial purposes and environmental disaster management. Very often, predictive models are designed to anticipate extreme events such as floods and droughts, which can have dramatic consequences for the environment and human life (Mosavi et al., 2018; Sit et al., 2020). In this field, many forecasting models have been developed over the last decade using various physics-based or black-box approaches to help public agencies better anticipate the occurrence of such events at different short- and long-term time horizons. However, in the context of climate change, the agencies increasingly need to base water management and environmental protection plans on long-term forecasts of the basin state that can span months, seasons, and even years (Karpatne et al., 2019). As the forecasting horizon lengthens, more data are needed to capture generalized phenomena and processes of different types at different scales that control basin functionality, especially with respect to drought events. This context naturally leads to the larger data sets required to build a powerful predictive model that can handle increasingly complex data (Shen, 2018).

Over the past decade, data-driven models have revolutionized the field of water dynamic forecasting with their ability to efficiently approximate nonlinear relationships between the river discharge and the meteorological, geological, and anthropogenic factors that control it. Among these models, artificial neural networks (ANNs) have emerged since the 1960s and now represent an attractive alternative for both short- and long-term forecasting (Dramsch, 2020; Shen, 2018). In recent years, new generations of neural networks have emerged whose mechanisms are adapted to process time series by efficiently capturing temporal dependencies while preserving the memory effect, such as the long short-term memory (LSTM) network (Hochreiter and Schmidhuber, 1997). LSTM has been applied in various ways to improve streamflow forecasting, such as using past streamflow data at the same location as an autoregression to predict future fluctuations (Zhu et al., 2020; Sahoo et al., 2019; Mehedi et al., 2022). It has also been used with the incorporation of multiple data, such as flow data from nearby streams and meteorological data (Nguyen et al., 2022; Fang and Shao, 2022; Vu et al., 2023). Currently, the latter approach, in which hydrologic prediction is based on the extraction of different features from numerous data sources (e.g., air temperature, precipitation, and potential evapotranspiration), promises the most reliable long-term forecasts (Li and Yuan, 2023; Girihagama et al., 2022; Xiang et al., 2020; Natel de Moura et al., 2022). Despite the abundance of research in the field, the precise functional contribution of individual datasets, at both local and large scales, remains poorly understood. This lack of understanding presents a major obstacle for practitioners, as it impairs their ability to accurately predict short- and long-term outcomes using these models.

In most cases, the hydraulic and climatic data processed are collected only on the territory of the basin studied, which limits the possibility of providing a large database to build an effective predictive model that grasps the seasonal hydraulic behavior of the basin. This territorial compartmentalization in the construction of predictive models has been imposed in hydrological studies as a real physical fact, but in practice it is a misconception arising from model simplification, since hydraulic and climatic continuity does not obey this compartmentalization. This means that regional observations, such as precipitation, temperature, or atmospheric pressure, obtained on a large continental or global scale can provide relevant information on the key climatic drivers of seasonal basin dynamics (Murphy et al., 2010; Karpatne et al., 2019). The use of local data alone thus entails more of a projection from climate variables than a prediction from the true driving forces. In addition, identifying atmospheric circulation patterns or teleconnections that cause extreme events has been an important step in predicting catchment flow behavior (Hagen et al., 2021; Murphy et al., 2010; Massei et al., 2010). For example, in southeastern Australia, climate forcings influencing droughts originate from atmospheric circulation patterns emanating from the Pacific, Indian, and Southern Oceans (Dikshit et al., 2021), while in western France they originate from the Atlantic Ocean (Massei et al., 2010), in both cases related to annual cycles. These results have encouraged researchers to further exploit large-scale climate variables as predictors of long-term water system conditions (Ouma et al., 2022; Dikshit et al., 2021; Martin Santos et al., 2021). However, the concept remains little explored in the literature, mainly because of the difficulty of collecting such a large dataset from all over the world, whose analysis requires considerable expertise and computational effort.

The deluge of data in the Earth system has reached unprecedented levels, with storage volumes exceeding tens of petabytes (Shen, 2018), and our ability to make sense of this data has not kept pace with our ability to generate and gather it. The first major challenge, therefore, concerns the data processing: how to extract interpretable information and knowledge from this massive amount of data and integrate it across disciplines to build interpretable models. A common solution for dealing with non-linearity and non-stationarity in time series is decomposition, e.g., wavelet transform (WT), empirical mode decomposition (EMD), and ensemble empirical mode decomposition (EEMD) (Gürsoy and Engin, 2019; Niu et al., 2019; Zhang et al., 2018). The combination of data preprocessing techniques and deep learning tools can surpass the performance of traditional models. Nevertheless, a challenge associated with these models is the computationally intensive process of identifying the optimal frequency components for each signal. Implementing these techniques on a large data set containing millions of time series can be daunting. In this study, we propose a simplified variant in which a single frequency is determined by resampling the time series of the input data, where the step size in resampling is chosen based on its ability to optimally reproduce the flow signal in the network output.

The second challenge concerns the neural network employed: more data requires a more complex model to translate the processed data, which often leads to higher-level neural architectures. Several studies have highlighted the exceptional ability of LSTMs to replicate streamflow fluctuations in catchments worldwide and to estimate the associated uncertainty (e.g., Hunt et al., 2022; Mehedi et al., 2022; Natel de Moura et al., 2022). Benchmarking on 516 basins across the continental United States, Gauch et al. (2021) further discovered that LSTMs trained on reanalysis data can predict streamflow at any given temporal resolution, even at multiple resolutions simultaneously. Training a single LSTM across multiple basins and incorporating data on basin geography outperformed individual LSTMs trained on a per-basin basis, resulting in superior performance even for ungauged basins. Surprisingly, simple LSTM architectures without stacking performed as well as stacked ones in simulating discharge fluctuations (Le et al., 2021; Chidepudi et al., 2023). Although a single LSTM can predict a few time steps ahead, it becomes less effective for long-term forecasting with multiple time horizons, making stacked neural networks more prevalent (Chandra et al., 2021; Wang and Zhang, 2020). By incorporating multiple LSTM layers, a more advanced neural architecture can extract intricate feature embeddings from diverse databases, resulting in longer and more precise outputs, particularly for multiple time steps. Existing research has predominantly concentrated on predicting individual lead times, limiting our comprehension of the interrelationships between different time step horizons. The primary objective of the present study is to overcome this limitation and explore the mutual influence of forecasts across multiple time steps.

This study presents a number of novelties including: i) Combining data sources from different scales and sources to achieve more complete coverage of input data, which improves the interpretability of forecasts. ii) Proposing an efficient technique for preprocessing millions of time series by choosing the optimal sampling frequency for seasonal forecasts. iii) Using a stacked LSTM architecture for seasonal horizon forecasts that evaluates the mutual effects between the multiple datasets and also between multiple time-step responses. The investigation delves deeply into the unique importance of individual datasets in driving predictive performance, as well as the interdependent impact on multi-time-step forecasting, both of which have been relatively unexplored in practical research. Outcomes of this study aim to establish a foundation for exploiting big data to leverage the seasonal forecasting of hydrodynamics, which can also be extended to other


environmental dynamics exhibiting similar fluctuation patterns. The proposed method is thus applied to forecast the river discharge in the Loire-Bretagne basin, the largest catchment in France, which is strongly affected by climate change and has experienced severe droughts in recent years. The research result should facilitate the development of an informed water and environmental management plan in the basin, with potential applications in other regions of the world.

2. Study area and data acquisition

2.1. Study area

The Loire-Bretagne basin is the largest basin in France, with a total area of 155,000 km², representing 28 % of the French territory (https://agence.eau-loire-bretagne.fr/). The hydrogeographic basin includes the watershed of the Loire River and its tributaries, the coastal basins of the Vilaine, Bretagne and Vendee, and the marshes of the Poitevin. The main channel, the Loire River, is more than 1000 km long, and its entire hydrographic network covers more than 135,000 km. The basin, with its extensive water system, is connected to a complex water cycle on both regional and global scales.

The basin, with a coastline of 6654 km, is directly under the influence of Atlantic Ocean forcing; the Loire flows into the Atlantic with a mean annual discharge of 850 m³/s at its mouth. The maximum mean monthly flow is about 1630 m³/s in February and the minimum flow is 257 m³/s in August (Petelet-Giraud et al., 2018), with a strong contrast between the dry and wet seasons. Similar conditions prevail in other rivers in the study area, with seasonal differences of tens to hundreds of times; some flows are even interrupted for days or weeks during dry summers. The cause for this is closely tied to the climatic contrasts between the dry summer and the heavy winter rains.

The basin is characterized by heterogeneous river drainage between its upper and lower reaches and between the main river and its tributaries. Heterogeneity also exists in the climatic regime, which differs between the upstream and downstream areas, with precipitation gradually increasing toward the coast (downstream). The Loire discharge then increases along the river to the mouth, with an important contribution of the tributaries, especially in the upper and lower parts of the Loire. The discharge in the central part is rather related to the connection with the groundwater of the Beauce aquifer, one of the main aquifers, which extends into semi-permeable sedimentary rocks with numerous underground inflows into the northern basin. These inflows, however, represent only a few percent of the total river flow (Baratelli et al., 2016). This highlights an important characteristic of this basin: a weak connection between surface water and groundwater at the watershed scale.

The basin is characterized by an average altitude of 300 m and mostly low valleys. Slightly less than 10 % of the basin's area is above 800 m, with a maximum altitude of 1500 m (Beaufort et al., 2020), which explains why snow is rare and most of the natural supply comes from precipitation. Furthermore, precipitation itself varies from year to year between 550 and 2100 mm/year, and this climatic alternation of wet and dry years determines the long-term hydrologic functioning of the basin. The interannual variation of river discharge is highly related to the North Atlantic Oscillation (NAO), with interannual modes of 17 and 5–7 years (Massei et al., 2010). This implies that the hydrologic functioning of the basin may be under the influence of a larger and longer climate regime (Murphy et al., 2010).

In terms of water use, the main river, the Loire, has often been referred to as “the last wild river” in Western Europe, due to the relative absence of large dams and the effective protection of natural conditions along the river (http://www.uicn.org). Although some 35,000 ha of farmland are irrigated, consuming 500 million m³ (Rinaudo et al., 2020), this is equivalent to only 2 % of the Loire's total annual discharge. The basin has a distinctly rural character highlighted by a low population density, with a total water exploitation of 1000 million m³ for drinking (~4 % of the Loire discharge). However, rivers are not the primary source of water use. Water use in this basin comes mainly from intense groundwater exploitation, where the problem of over-extraction is more serious than elsewhere in France (Rinaudo et al., 2020). The rivers themselves also serve other local services; however, the total water use of the catchment is relatively small compared to the discharge of all rivers. Because a detailed water use scheme is not available, we consider only the natural variables in this study.

2.2. Data acquisition

The objective of the data collection strategy in this study is to relate regional characteristics contained in local hydrologic observations to the climatic cycle at a larger scale, including river discharge, piezometer, sea level, air temperature, atmospheric pressure, precipitation, soil moisture, relative humidity, and evaporation rate. Such a strategy aims to provide as much data as possible for the predictive model at various scales to arrive at a better understanding of the functioning of the water cycle as a whole system. Time series variables are collected from relevant sources that fall into the following categories: a) gauging observations; b) hydrogeo-meteorological datasets (see Table S1.1 in Supplementary 1). The former are punctual ground-based measurements, while the latter are interpolated grid analyses.

The punctual ground gauges measure discharges in rivers, water levels in aquifers throughout the watershed, and tides on ocean coasts. In the rivers, the monitoring is recovered from the HYDRO database, including the discharge at 1614 stations over a long span from 1900 to 2022 at daily frequency. Groundwater levels are extracted from the ADES database, with records at about 1700 piezometers from the year 1900 to 2022 (see Fig. S1.1). Sea levels are collected from the REFMAR database, which provides a total of 368 active stations in the oceans, focusing on the coastlines of France and the United States, with up-to-date records from the year 2000 to 2022. Indeed, the motivation to consider tides in the dataset lies not in the connection between rivers and seas in the tidal zone at river mouths, but in the importance of seawater in the global water cycle. Storing 23 times the water on land and millions of times the water in the atmosphere, the oceans buffer fluxes many times greater than the terrestrial equivalent. Ocean dynamics must therefore be central to the water cycle picture of the interactive relationship between ocean, atmosphere, and land, which has received little attention in forecasting models recently. The gridded datasets in this study are gathered from the NOAA reanalysis with daily calibration from 1948 to 2022, including air pressure, air temperature, precipitation rate, volumetric soil moisture, relative humidity, and potential evaporation rate. Spatial coverages are 2 × 2 and 2.5 × 2.5 degree global grids (Kalnay et al., 1996), for which the parameter properties are given in Table S1.1 and the corresponding maps are exemplified in Fig. S1.2, with other details in Supplementary 1. Finally, the overall datasets consist of around 1.5 million time series with different physical backgrounds and features.

Note that not all data refer to water; some refer to temperature and pressure, which are the key parameters controlling the behavior of the water cycle both on the continents and in the oceans. With such a heterogeneous database, we are in fact working with a multiphysical phenomenon as complex as the global water cycle. This big data can provide better insight into global climate dynamics, but also poses a challenge for the data processing to extract meaningful information to feed the forecasting model, which is the subject of the next section. The database can indeed be expanded to include other variables, such as solar radiation, cloud cover, or heat flux. As part of the information from these variables may already be embedded in the data used, upscaling the problem leads only to a minimal improvement in accuracy and no further advance in methodology.

3. Data analysis

Data analysis is a critical phase in river dynamic forecasting with a data-driven model, in which the quality and quantity of data used for modelling is key to prediction reliability. Analysis is about uncovering the trend and identifying patterns in the available time series to produce an accurate forecast for the future. Working steps involve pre-processing the data, feature


engineering and selecting the most relevant and representative information, all of which have a direct impact on forecast efficiency.

3.1. Data pre-processing

In pre-processing, the time series are cleaned, normalized, and transformed to make them suitable for analysis in the next step, and then fed into the neural network. The goal of this step is to improve the quality and utility of the measured data. The raw data is first processed to remove errors and fill measurement gaps. To limit the errors, the validated dataset is prioritized in this study, with some manual intervention to eliminate visual anomalies, while the remaining errors are considered as noise in the modelling, which is later handled by the neural network functionality. The gaps are filled with an autoregressive algorithm (Akaike, 1969), which is effective for short gaps; records with longer gaps are then left aside in the selection step because they have a lower correlation to the target.

Since processing time series maps requires a huge computational memory to extract relevant features (Zhu et al., 2017), we simply consider time series point by point at each map pixel to select the most relevant series. This scheme may minimize large-scale interactions, but it simplifies processing. However, as discussed in Section 3.3, this simplification has only a marginal impact on the resulting data, as the behavior of climate parameters is consistent at large scales. The forecast is then established through the joint processing of highly related time series, which enhances the accuracy and reliability of the prediction. In this model, the projection mechanism works by using leading signals (positive lag, detailed in Supplementary 2), which provide early indications of potential changes through a combination of historical and current monitoring, to identify these changes in river dynamics.

3.2. Data feature engineering: correlation between time series

Hydrological responses are the result of the concomitance of several climatic, geological, anthropogenic, and oceanic factors, involving complex processes and interactions that occur at different time scales and frequencies. The presence of these phenomena leads to a strongly nonlinear behavior of hydrological responses, which makes their prediction a challenge. It is therefore essential to understand these phenomena by analyzing their multi-frequency behavior in order to ensure their usefulness as data in a forecasting procedure. Common methods for analyzing multifrequency signals include wavelet analysis, empirical mode decomposition, or the Fourier transform, which identify and separate different frequency components and model each component individually before combining them for the final prediction. These approaches can lead to accurate modelling, but processing millions of input series in this case requires excessive computational effort. In this approach, we propose a simplified estimation in dealing with time series, where the input data are resampled at a frequency that best fits the target signal, as shown in Fig. 2. The figure illustrates how the time series in the database correlate with the target signal at different frequencies, where the target is the daily river discharge at station No.1, which is located downstream of the Loire River and exhibits minimal influence from human activities.

As shown in Fig. 2a, the daily discharge at station No.1 shares the most similarity to the weekly discharges at the other river stations in the basin,

Fig. 1. The Loire-Bretagne basin, the largest basin in France, of crucial environmental importance (modified from Agence de l'eau Loire-Bretagne). The red dots mark representative stations to be studied later.
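The gap-filling and normalization steps of Section 3.1 can be illustrated with a minimal sketch. It assumes a first-order autoregressive model fitted by least squares, which is a simplification: the order-selection procedure of Akaike (1969) is not reproduced, and the function names and the `max_gap` threshold are hypothetical choices made for illustration only.

```python
from math import sqrt

def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] on consecutive observed pairs."""
    pairs = [(a, b) for a, b in zip(series, series[1:])
             if a is not None and b is not None]
    n = len(pairs)
    mean_prev = sum(a for a, _ in pairs) / n
    mean_next = sum(b for _, b in pairs) / n
    var_prev = sum((a - mean_prev) ** 2 for a, _ in pairs)
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in pairs)
    phi = cov / var_prev if var_prev > 0 else 0.0
    c = mean_next - phi * mean_prev
    return c, phi

def fill_short_gaps(series, max_gap=5):
    """Fill runs of None up to max_gap long by iterating the AR(1) model.

    Longer gaps (and a gap at the very start) are left as None so that the
    record can be set aside later. max_gap is an assumed threshold.
    """
    c, phi = fit_ar1(series)
    filled = list(series)
    i = 0
    while i < len(filled):
        if filled[i] is None:
            j = i
            while j < len(filled) and filled[j] is None:
                j += 1
            if i > 0 and (j - i) <= max_gap:
                for k in range(i, j):
                    filled[k] = c + phi * filled[k - 1]
            i = j
        else:
            i += 1
    return filled

def zscore(series):
    """Normalize observed values to zero mean and unit variance; None is kept."""
    vals = [v for v in series if v is not None]
    mean = sum(vals) / len(vals)
    std = sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    if std == 0:
        return [None if v is None else 0.0 for v in series]
    return [None if v is None else (v - mean) / std for v in series]
```

Iterating a low-order autoregression drifts toward the series mean as the gap grows, which is consistent with the remark above that the method is effective only for short gaps.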


Fig. 2. The discharge fluctuation at station No.1 exhibits varying correlations with other datasets sampled at different frequencies ranging from daily to monthly. This multi-frequency behavior highlights the need for thorough data analysis to derive meaningful frequencies for accurate forecasting.
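The frequency comparison summarized in Fig. 2 can be sketched as follows. This is an illustrative reading, not the exact procedure of the study: the resampling is approximated here by a trailing rolling mean so that the candidate series stays aligned with the daily target, and the function names and the default windows (1, 7, and 30 days) are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rolling_mean(series, window):
    """Trailing mean over `window` daily steps; output starts at index window - 1."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def correlation_by_window(target, candidate, windows=(1, 7, 30)):
    """Correlate the daily target with the candidate smoothed at each window."""
    scores = {}
    for w in windows:
        smoothed = rolling_mean(candidate, w)
        # Drop the first w - 1 target values so both sequences are aligned.
        scores[w] = pearson(target[w - 1:], smoothed)
    return scores

def best_window(target, candidate, windows=(1, 7, 30)):
    """Return the window whose smoothed candidate correlates best with the target."""
    scores = correlation_by_window(target, candidate, windows)
    return max(scores, key=scores.get)
```

For example, `best_window(discharge, piezometer)` would return the averaging window, in days, at which a piezometer record best matches the target discharge; repeating this over every series in the database gives one frequency per input rather than a full decomposition.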

reflecting the downstream flow regime, which contains mainly low frequencies. The analysis also confirms that the discharges at the stations in the basin are determined by low-frequency features at the basin scale, while the high-frequency signals are related to local conditions, as discussed in (Vu et al., 2021) for the Seine catchment. The correlation between station measurements remains significantly high, indicating that flow conditions in the basin are consistent to some extent. Meanwhile, the discharge signal agrees well with the daily observations in the aquifer. The piezometer signal in fact also consists mainly of the low-frequency component and is closely related to recharge from the river flows through the filtration process, resulting in a delay in the piezometer signal compared to the discharge observations, which will be discussed in the following discussion on phase shifting. However, sea level shows little correlation with the river discharge signal in this case because station No.1 is not located in the tidal zone of the Atlantic Ocean. Although there is no direct relationship between the two, both are influenced by large-scale features of the global water system, such as climatic conditions, which causes some similarity between them at a low frequency. A loose relationship in this case complicates the use of sea level signals in the discharge forecasts but may be helpful in other studies where river and sea interact.

Climatic parameters such as temperature, atmospheric pressure, evaporation, and humidity show the strongest correlation with a monthly frequency, reaching a maximum correlation coefficient of about 0.7. This low correlation demonstrates a weak connection between these climatic variables and the river system, which has been backed up by previous research (Wang et al., 2022), since the hydrosystem involves multiple interacting variables, and each parameter can have a cascading, direct or indirect impact on the entire system. River discharge, on the other hand, is more closely related to accumulated precipitation (see Fig. 2f). It is well known that runoff directly replenishes streamflow, resulting in a clear influence of precipitation on river discharge. A long-term accumulation of precipitation over two to three months exhibits a clear influence on river dynamics, as already mentioned in the drought studies (Sutanto et al., 2020; Solaraju-Murali et al., 2019). Soil moisture similarly shows an important impact on river dynamics, as illustrated in Fig. 5g, with the highest correlation among the climatic parameters considered. This is also the conclusion of recent research by Chatterjee et al. (2022), in which soil moisture is considered an essential component for describing and predicting large-scale droughts. Indeed, subsurface soil moisture directly determines the effectiveness of runoff and other surface and subsurface discharges into rivers.

3.3. Data feature engineering: time lag between time series

Time lag estimation is important for forecasting dynamics to identify and account for the delay between cause and effect in a multivariate system. Understanding the time lag helps to better foresee how changes in one variable will affect another variable in the coming time steps with


some certainty. To continue the investigation from the previous section, Fig. 3 demonstrates the time lag between the discharge signal at station No.1 and other parameters in the database, as well as their correlation.

As the station being studied is located downstream, the majority of stations show a positive time lag to station No.1 with a strong correlation (see Fig. 3a). This can be attributed to the clear connection in river flow through the stations, as the flow path follows the rivers within the basin, taking several days to a week to reach downstream, which corresponds to a short time lag between flow signals. While such a short delay can be beneficial for short-term forecasting, as is often addressed in the literature (Zhu et al., 2020; Mehedi et al., 2022; Sahoo et al., 2019), it is not sufficient for long-term forecasting, which often deals with monthly to seasonal lead times. In contrast, groundwater measurements in Fig. 3b exhibit a long lag compared to surface water due to the recharge process that connects groundwater to the surface stream. In this case, recharge can take days to months to feed the subsurface aquifer depending on site conditions throughout the basin. As discussed in Section 2, groundwater in this basin does not supply much water to the river stream, with a marginal loading of less than 5 % of seasonal river flows, primarily around the Beauce Aquifer in the central part of the basin. This particular context in the basin hydrosystem explains the delay of most piezometer fluctuations compared to the river dynamics. Sea level observations, by contrast, show a scattered time lag, as depicted in Fig. 3c, with little correlation to river discharge due to the unclear relationship between rivers in the basin and the oceans.

The climate parameters overall exhibit a wide range of time lag and correlation, as indicated in Fig. 3d–i, as the climatic signals are recorded worldwide with different temporal and spatial distributions. Nevertheless, a distinct seasonal pattern is evident in the parameter dynamics. The seasonal behavior of climate appears to significantly affect the response of the hydrologic system during certain periods when there is a high correlation between river discharge and climatic observations. Otherwise, the correlation is unstable, with weak periods around the 90-day and −90-day lags, which apparently correspond to season changes. Fluctuations in the correlation over the time lags thus pose a challenge to seasonal forecasting models, since the predictive accuracy over the lead time deteriorates in response to such fluctuations. In this approach, this issue is overcome by designing a model with multiple lead times, which mitigates the reduction in predictive accuracy over the lead time range.

3.4. Data feature at global scale

In this analysis, the discharge signal at station No.1 is related to climate data at the global scale, with only the positive time lag considered as part of the results shown in Fig. 3. The analysis reveals how river discharge in a given location connects to the global climate system by comparing their fluctuation signals over time. Overall, the relationship between river discharge and global variables is complex and varies spatially in both correlation and lag, depending on the background of the parameters. The river

Fig. 3. Time lag and correlation between the discharge measurement at station No.1 and other parameters in the database. The analysis highlights the differences between the
local and global datasets in predictive performance. While the local data (in figures a and b) mainly target a short-term lag, the global data (in the remaining figures) provide a
much longer lag with a clear decline in correlation due to the season shifts at the lag of about 90 days.
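The lag and correlation estimates summarized in Fig. 3 can be illustrated with a minimal sketch. It assumes a plain Pearson correlation scanned over integer day lags and picks the lag with the largest absolute correlation; the function names and the synthetic series are hypothetical and are not taken from the study, which calibrates the analysis by frequency.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def lagged_correlation(target, driver, max_lag):
    """Correlate target[t] with driver[t - lag] for every lag in [-max_lag, max_lag].

    A positive lag means the driver leads the target by `lag` time steps.
    """
    n = len(target)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            t, d = target[lag:], driver[:n - lag]
        else:
            t, d = target[:n + lag], driver[-lag:]
        result[lag] = pearson(t, d)
    return result

def best_lag(target, driver, max_lag):
    """Return (lag, correlation) with the largest absolute correlation (an assumed criterion)."""
    corr = lagged_correlation(target, driver, max_lag)
    lag = max(corr, key=lambda k: abs(corr[k]))
    return lag, corr[lag]

# Synthetic daily signal (hypothetical): the target repeats the driver 5 days later.
base = [math.sin(2 * math.pi * t / 90) for t in range(405)]
driver = base[5:]      # driver[t] = base[t + 5]
target = base[:400]    # target[t] = base[t] = driver[t - 5]
lag, r = best_lag(target, driver, max_lag=30)
```

Restricting the scan to positive lags, as done for the global data in Section 3.4, would only require changing the range of `lag` in `lagged_correlation`.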


discharge is linked not only to local conditions on the French territory, but also to other regions of the world, as expressed by a favorable correlation that shows a strong connection between the climate system and the water system at the global scale.
The mapping in Fig. 4a indicates that the discharge signal in the Loire River shows a strong correlation with the temperature fluctuations in the temperate zones where it is located, but shows no correlation with the tropical areas. The correlation seems to express an interdependency of local

Fig. 4. Correlation and time lag of the discharge signal at station No.1 (the red cross) to climate parameters. Each parameter demonstrates a distinct pattern of correlation and
time lag with the discharge signal, but the extensive areas with high correlation provide evidence that global-scale datasets are useful for local-scale prediction.


river dynamics and larger climatic oscillations. However, given the substantial time lag between river activity and local temperature fluctuations, of more than half a year as in Fig. 4a, Loire River activity is in fact out of phase with local temperature fluctuations. In France, the highest discharge rate of the Loire River responds to the rainy season, coinciding with the winter and spring periods when local temperatures are at their lowest. This implies that river discharge is more influenced by the accumulation of precipitation during the winter and early spring months than by direct temperature effects. Surprisingly, the Loire discharges are in phase with, and strongly correlated to, the temperature in the opposite hemisphere. The observation underscores the complex interplay between local and global climatic factors in shaping river dynamics. Moreover, the out-of-phase phenomenon exemplifies a typical issue with data-driven correlation relationships, where a high correlation does not necessarily imply causality.
Atmospheric pressure in Fig. 4c and d behaves differently on a local scale than other climate parameters and does not appear to be closely related to river dynamics. However, flux variations show a more obvious correlation with climate on the other side of the Asian-European continent. While there is a loose correlation between river activity and local atmospheric pressure, studies have shown that water-cycle events such as droughts are associated with persistent or recurrent atmospheric circulation patterns (Namias, 1985). For example, high-pressure circulations frequently occur during summer droughts in the southeastern United States, where the diurnal circulation patterns associated with drought are not significantly different from the patterns that occur during periods without drought. Instead, drought is associated with persistent or recurrent circulations that produce little or no precipitation (Hanson, 1991). River flow is therefore associated with water-related parameters rather than other climatic factors, as discussed below.
The relationship between precipitation and river discharge is illustrated in Fig. 4e for the correlation and in Fig. 4f for the time lag at the global scale. The figures demonstrate that while there is a strong positive correlation and a short lag time between precipitation and river discharge at the regional scale, this relationship becomes more heterogeneous at larger scales. Precipitation distribution is highly variable both spatially and temporally and is strongly influenced by local conditions such as topography, large-scale climatic circulations, or other parameters such as evaporation. Fluctuations in precipitation patterns can lead to changes in the timing, magnitude, and duration of river flow. For instance, prolonged dry periods can lead to a decrease in river flow, whereas intense precipitation events can cause a rapid increase in river flow. In general, an increase in precipitation is expected to result in higher river flow, and a strong positive correlation between the two is usually observed.
Loire River discharge similarly shows a clear relationship to the soil moisture at a continental scale, with a high correlation and short lag, which confirms a direct impact of soil conditions on the river flow (see Fig. 4g & h). In fact, the moisture in shallow ground plays a crucial role in determining the water runoff on the surface that feeds the river stream, both in the short term and the long term. During periods of high soil moisture, such as after heavy rains or during spring thaws, river discharge may increase rapidly as excess water from the soil enters the river. In contrast, during periods of low soil moisture, such as during a drought, runoff can decrease significantly as less water is available to the river. A similar relationship is observed for relative humidity in Fig. 4i and j on the continents, while humidity over the oceans follows a different mechanism. This is also true for potential evaporation in Fig. 4k and l. In fact, soil moisture, relative humidity, and evaporation in the air are closely related and, together with precipitation, define the water cycle that controls the supply to rivers and thus determines river dynamics.
In short, this data-driven approach uses correlation-based data selection by identifying relationships between different variables that need not always have a physical basis. This means that the relationships identified using this approach do not always reflect the underlying physical processes that control the climate system. In reality, however, the relationships among different climate parameters are complex and can vary at different scales. Many climate phenomena occur in large cycles that can span continents or oceans and exhibit a long-term cyclical pattern that ranges from monthly to annually to decades. These cycles can be influenced by a number of factors, including changes in temperature, precipitation, and atmospheric circulation patterns. A holistic understanding of the physical processes that control the climate system would include the complex interactions among various climate parameters at multiple scales and the underlying physical mechanisms that drive these interactions, which the present approach cannot account for. Consequently, before switching to a better strategy that considers the underlying physics, it is preferable to work with a simplified idea in this technique.

4. Methodology and model design

Phase shift forecast forms the basis of the model that allows detection and prediction of changes from underlying patterns in hydrological and climatic dynamics. Processing the phase shift prediction involves: 1) estimating the phase shift and the correlation between the target and the time series in the database; 2) selecting the most relevant series; 3) designing the neural network, training the model and performing the prediction. It should be noted that only the most relevant time series from the database are used to ensure computational efficiency and to avoid noise in forecasting. The first two steps were described in the previous section; this section briefly discusses the methodology for designing the neural network. The result is presented in the next section with discussions.
In this study, our main objective is to design a forecasting model that can accurately predict river discharge over an extended period of time. To achieve this, we intend to use a Long Short-Term Memory (LSTM) neural network that can perform multiple time steps ahead, with the longest prediction horizon being 180 time steps (more details in Supplementary 3). To ensure the accuracy of the model, we intend to incorporate multivariate inputs from both local and global data sources that come from different physical backgrounds, frequencies, and observational sources. However, the complexity of the input data requires an effective neural network that can interpret complex features at the input and produce accurate long-term predictions. As a result, we propose the use of a stacked LSTM that can effectively capture the complex relationships between input and output variables and produce reliable and accurate predictions.
Stacked LSTMs were introduced by Graves et al. (2013) in their application of LSTMs to speech recognition, which outperformed a benchmark on a challenging standard problem. They found that the depth of the network is more significant than the number of memory cells in a given layer. Indeed, deep learning is built around the hypothesis that a deep, hierarchical model can be exponentially more efficient than a shallow model at representing some features. To make a neural network deeper, additional hidden layers can be added. Each layer processes a portion of the task and passes it to the next, turning the deep neural network into a processing pipeline that solves a portion of the task at each stage until the last layer provides the output. These additional layers recombine the learned representation from prior layers to create new representations at higher levels of abstraction, for instance from lines to shapes, then to objects. Although a neural network with a single hidden layer can approximate most functions, increasing the depth of the network provides an alternative solution that requires fewer neurons and trains faster.
The proposed neural network optimized in this case consists of two LSTM layers, of which the neurons are optimized for each corresponding problem (see Fig. 5). However, in all cases, the neural networks provide a sequence output instead of the single-value output of the classical LSTM model. As long-term forecasts are associated with low frequencies, the operation of the neural network is optimized with an output sequence composed of various frequencies, with lower frequencies at longer lead times. For a six-month forecast horizon, the predictions are daily for the first week, every four days for the next three weeks, weekly for the following two months, and two-weekly for the later periods. Forecasting is performed daily after updating the measurement.
The composition of input data selection aligns with the previously mentioned time step ahead design, where the selection is divided into groups


Fig. 5. Stacked neural network composed of multiple LSTM layers stacked in the vertical direction.
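The multi-resolution output sequence of the network (daily for the first week, every four days for the next three weeks, weekly for the following two months, two-weekly afterwards) can be sketched as a list of lead times in days. The exact segment boundaries used here (days 7, 28 and 84) and the rule of continuing from the last emitted lead time are assumptions made for illustration; the text does not specify them.

```python
# (last day of the segment, step in days); None means "until the forecast horizon".
# Boundaries at days 7, 28 and 84 are assumed: 1 week, + 3 weeks, + 8 weeks.
SEGMENTS = [(7, 1), (28, 4), (84, 7), (None, 14)]

def lead_times(horizon_days, segments=SEGMENTS):
    """Return the forecast lead times (in days) of the output sequence."""
    leads = []
    day = 0
    for end, step in segments:
        end = horizon_days if end is None else min(end, horizon_days)
        # Keep stepping from the last emitted lead time while staying in the segment.
        while day + step <= end:
            day += step
            leads.append(day)
    return leads

six_month = lead_times(180)   # output positions for the six-month model
one_month = lead_times(31)    # shorter horizons use only the first groups
```

Under these assumptions the six-month output sequence has far fewer positions than 180 daily steps, which is the point of lowering the output frequency at longer lead times.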

based on specific thresholds. In detail, there are at most four groups (for the six-month horizon): the first week with a daily time step, the next three weeks with a four-day time step, the subsequent two months with a weekly time step, and the remaining period with a two-weekly time step. As the correlation is computed for each time lag, the selection of time series relies on thresholding the correlation over the time-lag group that corresponds to the time-step-ahead group at the output. The group-based assessment ensures the stability of the approach performance at the output, which enhances the interconnection in multiple time-step forecasting. The selection for each group is then combined to construct the input dataset that feeds into the neural network. To optimize the accuracy of the forecasting results for each station, the thresholds in this process are manually determined through trial tests. However, it is important to note that this threshold optimization process can be computationally intensive, as it involves parallel optimization of the neural network structure detailed in the following paragraph. This challenge is commonly encountered when dealing with multiple time step forecasts using a large database.
In the neural network, each LSTM layer is followed by a dropout layer to reduce the problem of overfitting. The number of neurons in each LSTM layer is optimized by trial tests in which the number of neurons varies from 10 to 150. For training, the ADAM algorithm is used with a constant learning rate of 0.01 over 250 epochs. To select the best performing model, the three most frequently used evaluation metrics in time series analysis were selected. The model that shows the best performance on most of the evaluation metrics is selected as the most efficient and accurate model for predicting water dynamics. For these evaluation metrics, x represents the actual value, y represents the predicted value, and n represents the total number of test samples.

5. Result and discussion

This section presents the forecast results for 18 representative stations across the Loire-Bretagne basin, including one station from a neighboring basin, the Charente, to demonstrate the effectiveness of the proposed model (see Fig. 1). Of the 18 stations, five are located in the mainstream of the Loire River, with one downstream, two in the middle, and two upstream. Additionally, six stations are regularly located in the Loire tributaries, providing insight into the flow regime in the Loire connectors. Five other stations are located in small rivers in the Bretagne sub-basin, which are not directly connected to the Loire. Finally, one additional station (No.5) is included from a neighboring basin to further evaluate the performance of the methodology in an extended context. The neural networks are trained with a data set of 18 years and tested on the following 4 years, up to the current year 2022. The forecast at each station is examined with different time horizons, ranging from 1 month to 6 months ahead. The details of the input data and neural networks used in the forecasting are listed in Table S4.1 and the results are summarized in Table S4.2, both of which are in Supplementary 4.

5.1. Forecast of 1 month ahead

The forecast of river discharge is first performed for a one-month horizon of 31 days ahead using a Stacked-LSTM model, which provides highly accurate results as shown in Fig. 6. To provide a comprehensive view of the study, five distinct cases are selected for detailed presentation. Case a, which involves the downstream station No.1 with a large discharge and mainly low frequency, is forecasted with remarkable accuracy. In Case b, the station located at the basin center, with a large stream and certain abrupt changes in river flow, presents some difficulties for prediction. Case c, which involves station No.11 on a Loire tributary carrying a small proportion of the main river stream, experiences highly fluctuating flow with long droughts and short floods, posing significant challenges to forecasting. In Case d, the station is located in the Bretagne sub-basin, with short rivers and small flow through a natural park less influenced by human activities, which allows better forecasting. Case e concerns the Charente River at station No.5 in the neighboring basin, which exhibits different flow features, especially at high frequency, with lower dynamics compared to the Loire, but some correlation at low frequency. The transfer of information between the two basins included in the data worked favorably, indicating the strength of the data-driven method in building forecasting models at large scales.
In this short time horizon, the forecasting model relies mainly on the local data (see Table S4.1) with the gauge records in the rivers and in the basin itself. These measurements are associated with the transit time of water through the regional system and contribute significantly to the predictive capabilities of the model. The model effectiveness is thus closely linked to the specific features of the local region, as broader scale data prove to be less insightful. By focusing on local data, the model can capture


Fig. 6. Forecasts of river discharge for 31 days ahead at five representative stations. The resulting forecasts are of good accuracy, with some losses of extreme events.
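The group-based input selection of Section 4, which determines the inputs behind forecasts such as those in Fig. 6, can be sketched as follows. The sketch assumes that a series is kept for a lag group when its largest absolute correlation inside that group reaches the group threshold, and that the final input set is the union over groups; the series names, correlation values and thresholds are hypothetical placeholders, since the actual thresholds are tuned manually per station.

```python
def select_inputs(lag_corr, groups, thresholds):
    """Select input series by thresholding correlation within each lag group.

    lag_corr:   {series_name: {lag_in_days: correlation_with_target}}
    groups:     list of (first_lag, last_lag) pairs, inclusive, one per output group
    thresholds: one correlation threshold per group
    Returns the sorted union of the series selected in any group.
    """
    selected = set()
    for (first, last), threshold in zip(groups, thresholds):
        for name, corr in lag_corr.items():
            in_group = [abs(c) for lag, c in corr.items() if first <= lag <= last]
            if in_group and max(in_group) >= threshold:
                selected.add(name)
    return sorted(selected)

# Hypothetical correlations at a few lags for three candidate series.
lag_corr = {
    "upstream_gauge": {1: 0.90, 10: 0.50, 60: 0.20, 150: 0.10},
    "soil_moisture":  {1: 0.30, 10: 0.40, 60: 0.70, 150: 0.75},
    "sea_level":      {1: 0.10, 10: 0.10, 60: 0.15, 150: 0.10},
}
# Lag groups mirroring the four output groups (boundaries are assumptions).
groups = [(1, 7), (8, 28), (29, 84), (85, 180)]
thresholds = [0.8, 0.6, 0.6, 0.6]

inputs = select_inputs(lag_corr, groups, thresholds)
```

In this toy case the local gauge is selected through the short-lag group and the soil moisture series through the long-lag groups, which mirrors the division of roles between local and global data described in the text.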

the typical dynamics and behavior of the river system in the region, which have been reported extensively in the literature (Sit et al., 2020). In more detail, the prediction of discharge is highly dependent on the frequency and dynamics of the discharge, which is confirmed in Section 2.3. Low-frequency discharges with similar rhythm and seasonal behavior are easier to predict because measurements throughout the basin have a low-frequency basis. High-frequency discharges, on the other hand, are challenging because they depend on local conditions, both human and natural. For example, the Loire River is heavily stressed by dams, reservoirs, and irrigation activities, making it difficult to accurately predict high-frequency discharges. In addition, the method used in the study has some limitations in dealing with extreme events and accidents. This is because the Long Short-Term Memory (LSTM) model is well suited for trend data and frequently occurring events, but not for infrequent incidents. Other methods such as inference techniques or physical models may be better suited for these types of incidents. Finally, it should be noted that only observational


data of natural phenomena are used in the study, reflecting only natural behavior, and excluding human interactions.

5.2. Forecast of 3 months ahead

To extend the forecast to a three-month horizon, the model mobilizes more data compared to the previous shorter horizon forecasts. This includes both local and global datasets to support the multiple time step ahead forecast, where local data are crucial for short terms, while global data are essential for longer-term forecasts because they possess a longer time lag to discharge signals. The added complexity of the input data is expected to improve the predictive information, resulting in more accurate and reliable forecasts. Although a model with a longer time horizon mobilizes more input data sets, the increase is only slight compared to the

Fig. 7. Forecasts of 90 days ahead at five representative stations, showing a good trend, albeit with lower accuracy compared to a shorter horizon as in Fig. 6. The predictions miss the floods but are favorable for drought forecasting.
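Accuracy comparisons between horizons, such as the one in this caption, rest on the evaluation metrics of Section 4, where x is the actual value, y the predicted value and n the number of test samples. The three metrics are not named in this part of the text, so the sketch below assumes three commonly used ones (root mean square error, mean absolute error and the Pearson correlation coefficient); they may differ from the metrics actually used in the study, and the sample values are invented.

```python
import math

def rmse(x, y):
    """Root mean square error between actual x and predicted y."""
    n = len(x)
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n)

def mae(x, y):
    """Mean absolute error between actual x and predicted y."""
    n = len(x)
    return sum(abs(xi - yi) for xi, yi in zip(x, y)) / n

def pearson_r(x, y):
    """Pearson correlation coefficient between actual x and predicted y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)

# Invented discharge values, for illustration only.
x = [10.0, 12.0, 15.0, 11.0]   # actual
y = [11.0, 12.0, 13.0, 11.0]   # predicted
scores = {"rmse": rmse(x, y), "mae": mae(x, y), "r": pearson_r(x, y)}
```

A model would then be preferred when it wins on most of the chosen metrics, as described in Section 4.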


previous case with a one-month forecast. This is because the correlation between data may decrease due to seasonal changes, which can make it challenging to select better data with high correlation to improve the forecasting process. This physical constraint further adds complexity to solving long-term predictions, especially for seasonal forecasts as in this case.
To demonstrate the effectiveness of our proposed model, we draw on the cases from the previous study in Section 5.1. In most cases, the predictions correctly reproduce the flow dynamics trend with a high correlation coefficient. The models produce mainly low-frequency results, which is to be expected for long-term forecasts, and their accuracy is lower than for the shorter time horizon shown in the previous section. The prediction is not suited to capture the peak flood events, which are often triggered by rapid surface runoff due to precipitation. This influence of precipitation on the long-term operation of the basin remains unclear because of the limited connection between surface water and subsurface aquifers in this basin and the lack of information on the physical properties of the catchment, such as the topography and land cover, that control the rapid events. On the other hand, low water levels that prevail during the dry season in summer are properly captured by the proposed models. In fact, the recharge of the river is minimal during the dry season, mainly due to low rainfalls and narrow aquifers, although the aquifers contribute little to the river discharge. This particularity of the Loire, with its serious lack of recharge sources, seems to facilitate the prediction of low water levels.

5.3. Forecast of 6 months ahead

To further explore the effectiveness of our forecasting model, the time horizon in forecasting is extended to 6 months, a seasonal model that reaches the range of 180 time steps ahead. This extension will provide us with a more comprehensive understanding of system dynamics and enable us to make better decisions regarding hazard mitigation and other related issues. Extending the seasonal forecasts offers a more profound understanding of how the nature of data types interacts with the forecasting time spans.
To ensure the performance of the model, a stacked LSTM neural network is again used and optimized with two LSTM layers, which allows the model to learn more intricate patterns in the data and better capture long-term dependencies. The model continues to utilize both local and global data, which has proven to be effective in the previous analysis. The forecast accuracy for longer-term forecasts of 6 months ahead is generally lower than that of the 3-month forecasts in the previous section. The results show promise in capturing the trend in water dynamics for low-frequency components, although there is a significant loss in high-frequency dynamics (see Fig. 8). While flooding patterns are absent from the forecast, droughts are well traced over the years. However, as shown for station No.16 in Fig. 8d and stations No.9, 15, and 18 in Table S4.2, a longer time horizon is not always associated with lower accuracy when long-term leading information can enhance the accuracy of shorter-term forecasts. As discussed in Section 3.4, data correlation is higher at a six-month lag than at a three-month lag, so this more relevant information strengthens the ability to predict the longer range. This is the reason why, in this case, doubling the time horizon from 3 to 6 months changes the forecast quality only slightly. The mutual effect of time span in forecasting emphasizes the importance of considering multivariate systems with multiple time steps ahead in dynamic predictions, which can help compensate for differences among datasets and time steps ahead, which is the novelty of this research.
As summarized in Table S4.1, the global data set has a significant presence in this case compared to the local data set. This finding, along with the previous two cases, reaffirms the importance of large-scale data in making long-term predictions. Since the models combine data from all climatic parameter fields, the abundance of datasets furthers the predictive capacity of the proposed approach. However, climate parameters are not uniformly present in the global datasets at the model input: the emphasis is on measurements of soil moisture, relative humidity, and evaporation rate, which restates the discussion in Section 3.4 above. In addition, large-scale data provide the average of spatial information, especially when it comes to large-scale climate variability such as the North Atlantic Oscillation (NAO), so the data mainly reflect information about long-term trends. Details and high frequencies, by contrast, are strongly determined by local conditions including water management plans, irrigation schemes, and other natural and human activities, where they are associated with temporal variations that lend themselves to short-term prediction. As a result, forecast accuracy is better in the Bretagne sub-basin, as shown in Fig. 8d, which has shorter rivers and is less influenced by human activities, so the high-frequency components are lower. This otherwise highlights the importance of considering local conditions for developing accurate forecasts, which are often inadequate in both time and space due to the complexity of management systems in practice.

6. Conclusion

This work presents a multivariate model designed to make long-term predictions of river discharges in the Loire-Bretagne basin, which is crucial for effectively managing and preserving the environmental conditions in this region. To ensure comprehensive coverage, the model integrates data from both local and global sources, including river and piezometer gauging within the basin, sea level, and climate parameters on the global scale. The database utilizes a total of 12 parameters related to both the hydrologic and climate systems, which are sampled at a daily frequency. Data processing operates on the principle of phase shift analysis, wherein data selection is performed by identifying the best correlations through frequency analysis of the time series. To adapt to the dynamics of river discharge, a calibration of the frequency analysis is employed to optimize the correlation and time lag between parameters with different physical backgrounds and frequencies. The proposed approach promises long-term predictions of river discharge that can extend up to seasonal time horizons, where the output is presented in a sequence format that spans multiple time ranges and is updated daily to perform forecasts. To improve information extraction in the processing, the model uses the concept of stacked LSTM, a deeper and more complex neural network architecture, to achieve better coverage of database diversity.
The developed model performs forecasting of river dynamics at representative stations throughout the region and in the neighboring Charente basin for periods of one, three and six months, with a horizon of up to 180 time steps ahead. Overall, the model provides favorable accuracy and efficiency in predicting river dynamics over the years. Here, the prediction quality depends on the complexity of river dynamics, with better performance for low frequencies and some losses for high frequencies. The models perform better in predicting drought periods, but may lose accuracy in predicting flood peaks. By using both data types at the input, the resulting forecast emphasizes the importance of both local and global datasets in predicting river discharge, where local datasets determine short-term predictions while global datasets drive long-range predictions. An interaction between the time range in forecasts and the shift in data correlation is also observed, with lower correlation at a lag of about 3 months due to seasonal changes affecting the quality of the forecast, but compensated by a higher correlation at the longer lag of 6 months. Such compensation improves the quality of the forecast at a six-month horizon, and opens the door for a very long-term forecast with a time horizon of seasonal scale. Since this long lag corresponds well with the large-scale data sets, this reaffirms the importance of including large-scale data for long-term forecasting. This research study establishes a solid practical groundwork to enhance the utilization of big data in order to facilitate accurate and reliable long-term forecasting of hydrodynamics, and is also applicable to environmental dynamics with similar variability behavior.
In this study, the database is limited to natural parameters and does not include human activities or accidents; therefore, the predictions are more accurate for rivers with low levels of intervention in the Bretagne sub-basin. In addition, no uncertainty analysis was performed, and the point data used in the study do not provide in-depth information that could incorporate spatial dynamics on a regional or larger scale. Future work should also focus on the explainability of this deep learning model to determine


Fig. 8. Forecasts of 180 days ahead reflect the trend of dynamics, though with lower accuracy compared to 3 months ahead in Fig. 7. A mutual effect over different time horizons in forecasting is observed at station No.16 (panel d) that enhances the forecast accuracy.

the significance of each dataset used, which may improve the accuracy and reliability of the predictions.
Supplementary data to this article can be found online at https://doi.org/10.1016/j.scitotenv.2023.165494.

CRediT authorship contribution statement

M.T. Vu: Conceptualization, Methodology, Software, Data curation, Writing – original draft, Visualization, Investigation. A. Jardani:


Methodology, Conceptualization, Writing – review & editing. M. Krimissa: Conceptualization, Writing – review & editing. F. Zaoui: Conceptualization, Writing – review & editing. N. Massei: Conceptualization, Writing – review & editing.

Data availability

The authors do not have permission to share data.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Akaike, H., 1969. Fitting autoregressive models for prediction. Ann. Inst. Stat. Math. 21, 243–247.
Baratelli, F., Flipo, N., Moatar, F., 2016. Estimation of stream-aquifer exchanges at regional scale using a distributed model: sensitivity to in-stream water level fluctuations, riverbed elevation and roughness. J. Hydrol. 542, 686–703.
Beaufort, A., et al., 2020. Influence of landscape and hydrological factors on stream–air temperature relationships at regional scale. Hydrol. Process. 34 (3), 583–597.
Chandra, R., Goyal, S., Gupta, R., 2021. Evaluation of deep learning models for multi-step ahead time series prediction. IEEE Access 99, 1.
Li, J., Yuan, X., 2023. Daily streamflow forecasts based on cascade long short-term memory (LSTM) model over the Yangtze River Basin. Water 15 (6).
Martin Santos, I., Herrnegger, M., Holzmann, H., 2021. Seasonal discharge forecasting for the Upper Danube. J. Hydrol. Reg. Stud. 37, 100905.
Massei, N., et al., 2010. Long-term hydrological changes of the Seine River flow (France) and their relation to the North Atlantic Oscillation over the period 1950–2008. Int. J. Climatol. 30 (14), 2146–2154.
Mehedi, M.A.A., Khosravi, M., Yazdan, M.M.S., Shabanian, H., 2022. Exploring temporal dynamics of river discharge using univariate long short-term memory (LSTM) recurrent neural network at East Branch of Delaware River. Hydrology 9 (11).
Mosavi, A., Ozturk, P., Chau, K.-W., 2018. Flood prediction using machine learning models: literature review. Water 10.
Murphy, J., et al., 2010. Towards prediction of decadal climate variability and change. Procedia Environ. Sci. 1, 287–304.
Namias, J., 1985. Some empirical evidence for the influence of snow cover on temperature and precipitation. Mon. Weather Rev. 1542–1553.
Natel de Moura, C., Seibert, J., Detzel, D.H.M., 2022. Evaluating the long short-term memory (LSTM) network for discharge prediction under changing climate conditions. Hydrol. Res. 53 (5), 657–667.
Nguyen, A.D., et al., 2022. Accurate discharge and water level forecasting using ensemble learning with genetic algorithm and singular spectrum analysis-based denoising. Sci. Rep. 12 (1), 19870.
Niu, W.-J., et al., 2019. Forecasting reservoir monthly runoff via ensemble empirical mode decomposition and extreme learning machine optimized by an improved gravitational search algorithm. Appl. Soft Comput. 82, 105589.
Ouma, Y.O., Cheruyot, R., Wachera, A.N., 2022. Rainfall and runoff time-series trend analysis using LSTM recurrent neural network and wavelet neural network with satellite-based meteorological data: case study of Nzoia hydrologic basin. Complex Intell. Syst. 8 (1), 213–236.
Petelet-Giraud, E., Négrel, P., Casanova, J., 2018. Tracing surface water mixing and groundwater inputs using chemical and isotope fingerprints (δ18O-δ2H, 87Sr/86Sr) at basin scale: the Loire River (France). Appl. Geochem. 97, 279–290.
Chatterjee, S., et al., 2022. Soil moisture as an essential component for delineating and fore- Rinaudo, J.-D., Marchet, P., Billault, P., 2020. Groundwater management planning at the river
casting agricultural rather than meteorological drought. Remote Sens. Environ. 269, basin district level: comparative analysis of the Adour-Garonne and Loire-Bretagne River
112833. Basins. Sustainable Groundwater Management: A Comparative Analysis of French and
Chidepudi, S.K.R., et al., 2023. A wavelet-assisted deep learning approach for simulating Australian Policies and Implications to Other Countries. s.l.:Springer International Pub-
groundwater levels affected by low-frequency variability. Sci. Total Environ. 865, lishing, pp. 67–91.
161035. Sahoo, B.B., Jha, R., Singh, A., Kumar, D., 2019. Long short-term memory (LSTM) recurrent
Dikshit, A., Pradhan, B., Alamri, A.M., 2021. Long lead time drought forecasting using lagged neural network for low-flow hydrological time series forecasting. Acta Geophys. 67 (5),
climate variables and a stacked long short-term memory model. Sci. Total Environ. 755, 1471–1481.
142638. Shen, C., 2018. A transdisciplinary review of deep learning research and its relevance for
Dramsch, J.S., 2020. Chapter one - 70 years of machine learning in geoscience in review. In: water resources scientists. Water Resour. Res. 54 (11), 8558–8593.
Krischer, B.M.A.L. (Ed.), Machine Learning in Geosciences. s.l.:Elsevier, pp. 1–55. Sit, M.A., et al., 2020. A Comprehensive Review of Deep Learning Applications in Hydrology
Fang, L., Shao, D., 2022. Application of long short-term memory (LSTM) on the prediction of and Water Resources. ArXiv. Vol. abs/2007.12269.
rainfall-runoff in karst area. Front. Phys. 9, 790687. Solaraju-Murali, B., Caron, L.-P., Gonzalez-Reviriego, N., Doblas-Reyes, F.J., 2019. Multi-year
Gauch, M., et al., 2021. Rainfall-runoff prediction at multiple timescales with a single long prediction of European summer drought conditions for the agricultural sector. Environ.
short-term memory network. Hydrol. Earth Syst. Sci. 25 (4), 2045–2062. Res. Lett. 14 (12), 124014.
Girihagama, L., et al., 2022. Streamflow modelling and forecasting for Canadian watersheds Sutanto, S.J., Wetterhall, F., Van Lanen, H.A.J., 2020. Hydrological drought forecasts outper-
using LSTM networks with attention mechanism. Neural Comput. & Applic. 34 (22), form meteorological drought forecasts. Environ. Res. Lett. 15 (8), 084010.
19995–20015. Vu, M., Jardani, A., Massei, N., Fournier, M., 2021. Reconstruction of missing groundwater
Graves, A., Mohamed, A.-r., Hinton, G., 2013. Speech recognition with deep recurrent neural level data by using Long Short-Term Memory (LSTM) deep neural network. J. Hydrol.
networks. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Pro- 597, 125776.
cessing - Proceedings. Vol. 38. Vu, M., et al., 2023. Long-run forecasting surface and groundwater dynamics from intermit-
Gürsoy, Ö., Engin, S.N., 2019. A wavelet neural network approach to predict daily river dis- tent observation data: an evaluation for 50 years. Sci. Total Environ. 880, 163338.
charge using meteorological data. Meas. Control 52, 599–607. Wang, S., Peng, H., Hu, Q., Jiang, M., 2022. Analysis of runoff generation driving factors
Hagen, J.S., et al., 2021. Identifying major drivers of daily streamflow from large-scale atmo- based on hydrological model and interpretable machine learning method. J. Hydrol.
spheric circulation with machine learning. J. Hydrol. 596, 126086. Reg. Stud. 42, 101139.
Hanson, R.L., 1991. Evapotranspiration and droughts. National Water Summary 1988-89— Wang, X., Zhang, Y., 2020. Multi-step-ahead Time Series Prediction Method With Stacking
Hydrologic Events and Floods and Droughts, pp. 99–104. LSTM Neural Network. s.l., s.n., pp. 51–55.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), Xiang, Z., Yan, J., Demir, I., 2020. A rainfall-runoff model with LSTM-based sequence-to-
1735–1780. sequence learning. Water Resour. Res. 56 (1).
Hunt, K.M.R., Matthews, G.R., Pappenberger, F., Prudhomme, C., 2022. Using a long short- Zhang, X., et al., 2018. A novel hybrid data-driven model for daily land surface temperature
term memory (LSTM) neural network to boost river streamflow forecasts over the west- forecasting using long short-term memory neural network based on ensemble empirical
ern United States. Hydrol. Earth Syst. Sci. 26 (21), 5449–5472. mode decomposition. Int. J. Environ. Res. Public Health 15 (5).
Kalnay, E., et al., 1996. The NCEP/NCAR 40-year reanalysis project. Bull. Am. Meteorol. Soc. Zhu, S., Luo, X., Yuan, X., Xu, Z., 2020. An improved long short-term memory network for
77 (3), 437–472. streamflow forecasting in the upper Yangtze River. Stoch. Env. Res. Risk A. 1313–1329.
Karpatne, A., et al., 2019. Machine learning for the geosciences: challenges and opportunities. Zhu, X.X., et al., 2017. Deep learning in remote sensing: a comprehensive review and list of
IEEE Trans. Knowl. Data Eng. 31 (8), 1544–1554. resources. IEEE Geosci. Remote Sens. Mag. 5 (4), 8–36.
Le, X.-H., et al., 2021. Comparison of deep learning techniques for river streamflow forecast-
ing. IEEE Access 9, 71805–71820.
