Journal of Hydrology
journal homepage: www.elsevier.com/locate/jhydrol
Article history:
Received 11 July 2008
Received in revised form 26 July 2009
Accepted 21 September 2009
This manuscript was handled by L. Charlet, Editor-in-Chief, with the assistance of Jose D. Salas, Associate Editor.

Keywords: Chaos algorithms; Back propagation neural network; Radial basis function neural network; Genetic algorithms; Multiple regression analysis; Stream temperature

Summary

Stream water temperature is considered both a dominant factor in determining the longitudinal distribution pattern of aquatic biota and a general metabolic indicator for the water body, since so many biological processes are temperature dependent. Moreover, the plunging depth of stream water, its associated pollutant load, and its potential impact on lake/reservoir ecology are dependent on water temperature. Lack of detailed datasets and knowledge of the physical processes of the stream system limits the use of a phenomenological model to estimate stream temperature. Rather, empirical models have been used as viable alternatives. In this study, an empirical model (artificial neural networks, ANN), a statistical model (multiple regression analysis, MRA), and chaotic non-linear dynamic algorithms (CNDA) were examined to predict stream water temperature from the available solar radiation and air temperature. Observed time series data were non-linear and non-Gaussian; thus the method of time delay was applied to form a new dataset that closely represents the inherent system dynamics. Phase-space reconstruction plots show that time lags equal to 0 and greater than 10 result in highly dependent (a well-defined attractor) and highly independent (no attractor at all) reconstructions, respectively, and therefore may not be appropriate to use. The delayed vector was found to be strongly correlated with the original vector when the time lag is small (i.e. less than 3 days) and vice versa. Power spectrum analysis and the autocorrelation function suggested that the time series data were chaotic, and the mutual information function indicates that the optimum time lag was approximately 3 days. The chaotic non-linear dynamic algorithm and a four-layer back propagation neural network (4BPNN) optimized by micro-genetic algorithms (µGA) showed that the prediction performance was optimum when data are presented to the model with a 1-day and a 3-day time lag, respectively. The prediction performance efficiency of MRA is higher for time lags greater than 3 days; however, the incremental performance efficiency rate decreased significantly after a 3-day time lag. The prediction performance efficiency of µGA-4BPNN was found to be the highest among all algorithms considered in this study. Air temperature was found to be the most important variable in stream temperature forecasting; however, the prediction performance efficiency was somewhat higher if short wave radiation was included.

© 2009 Elsevier B.V. All rights reserved.
0022-1694/$ - see front matter. doi:10.1016/j.jhydrol.2009.09.037
326 G.B. Sahoo et al. / Journal of Hydrology 378 (2009) 325–342
(Lowney, 2000; Bogan et al., 2006; Caissie et al., 2007) have been used to predict stream water temperature. The common approach for deterministic modeling is to use the equilibrium temperature concept (Edinger et al., 1968; Mohseni and Stefan, 1999; O'Driscoll and DeWalle, 2006). Equilibrium temperature is the temperature that water reaches in response to a particular set of flow and atmospheric heat fluxes. In most cases where detailed information on the stream flow and meteorology is not available, investigators have found that air temperature is a good index of stream temperature in statistical models.

Krasjewski et al. (1982) presented a graphical technique using only equilibrium equations to predict river water temperature from meteorological conditions including air temperature, solar radiation, cloud cover, wind speed, relative humidity, and atmospheric pressure. Mackey and Berrie (1991) developed a simple linear regression equation to predict river water temperature using only air temperature. Bogan et al. (2006) reported that a rise in air temperature is expected to increase stream temperature; however, due to evaporative cooling, stream temperature does not increase linearly. Poole and Berman (2001) mentioned that while external factors (short wave radiation, long wave radiation, wind, relative humidity, etc.) determine the net input of heat energy into stream water, structural factors (e.g. watershed runoff, groundwater discharge, stream vegetation, major dams, etc.) can ultimately determine stream water temperature. A statistical analysis by Kinouchi et al. (2007) showed that urban waste water increases stream temperature (0.11–0.21 °C year⁻¹). Mohseni and Stefan (1999) showed mathematically that there is a strong relationship between air and water temperature if there is no human interference.

Stream temperature is used as input to a lake water quality model (LCM) that has been developed in support of the Lake Tahoe Total Maximum Daily Load (TMDL) for water clarity (Roberts and Reuter, 2007; Sahoo et al., 2007a,b; Perez-Losada, 2001). Estimation of stream water temperature at the stream mouth is important because it: (1) determines the stream load plunging depth in the lake, (2) influences the lake water temperature after mixing, and (3) is the metabolic indicator of the water body. In the absence of adequate field data, assumptions need to be made for the construction of conceptual or deterministic models, which can often lead to poor estimates of output variable(s) (see discussions in Maier and Dandy, 1996; Jain and Srinivasulu, 2004; Sahoo and Ray, 2008a). Moreover, construction of a well calibrated/validated physically based model is time consuming and often beyond the available financial resources. For Lake Tahoe, virtually all of the inflowing streams are unimpaired by large lakes/dams. Similarly, urban influences are confined to just the very lowest reaches of the streams, immediately before they enter the lake, and are exclusively storm water inputs (waste water is exported from the Basin). Therefore, there is a need to develop modeling techniques that can predict stream temperature using only the cause–effect data (i.e. causing variables: air temperature and solar radiation; and affected variable: stream temperature).

Empirical models such as artificial neural networks (ANNs) have been used as a viable alternative approach to physical models (Maier and Dandy, 1997; Birikundavyi et al., 2002; Jain and Srinivasulu, 2004; Sahoo and Ray, 2008a). Compared to statistical regression analysis, ANNs have been demonstrated to be a far better tool for prediction (Jingyi and Hall, 2004; Ramírez et al., 2005; Chen and Kim, 2006). In addition, because sophisticated dynamical models are not available for many systems of interest, chaos-based time series analysis offers an alternative means for identifying chaotic behavior and predicting time series in cases where high quality, long-term hydrologic records are available (Pasternack, 1999).

The objectives of this paper are to: (1) develop a micro-genetic algorithm optimized ANN model, a chaos non-linear dynamic model, and a multiple regression analysis model to predict stream water temperature using readily available weather variables; (2) compare the results obtained by each method with actual observations for suggesting the optimized model; and (3) provide guidelines for the use of the optimized method for future water temperature prediction purposes. ANN, multiple regression analysis, and non-linear dynamic models were developed using MATLAB version 7.1 (The Mathworks Inc., 2005). Brief details on ANN, regression analysis, and chaos non-linear dynamic algorithms are presented in the 'Methodology' section.

Methodology

Artificial neural network (ANN)

ANN uses a multilayered approach that approximates complex mathematical functions to process data. An ANN is arranged into discrete layers, each layer consisting of at least one neuron (i.e. node). Each node of a layer is connected to the nodes of the preceding and/or succeeding layers, but not to nodes of the same layer, by a connection weight (Fig. 1). Thus, as the number of layers and nodes in each layer increases, the process becomes more complex, demanding more computational effort. In general, hydrologic and environmental problems are complex and require a complex ANN structure for prediction purposes. The number of layers and nodes in each layer is problem specific and needs to be optimized (Maier and Dandy, 2000; Sahoo and Ray, 2006a,b).

Fig. 1 depicts the flow of information from the input layer to the output layer (feed forward). The value of the connection weight between two nodes determines the strength of the node in estimating the target. The connection weights from input-layer neuron to hidden-layer neuron and from hidden-layer neuron to output-layer neuron are w_{ij} and w_{jk}, respectively. The input- and output-layer neurons are fixed according to the input and output parameter(s) of the specific problem. The weight vectors w_{ij} and w_{jk} are randomly generated in the range between -1 and 1. The total input signal at the jth hidden neuron is

v_j(n) = \sum_{i=1}^{N_I} y_i w_{ij} + b_j,  (1)

where y_i is the value of the ith input parameter to the hidden-layer neurons, b_j is the bias for the jth hidden-layer neuron, and N_I is the total number of input neurons. The total input signal received by the jth hidden-layer neuron, v_j, is converted to an output signal using an activation function \varphi_j. The output signal of the jth hidden-layer neuron is \varphi_j[v_j(n)] for the nth pattern of the training dataset. The output neuron receives signals from all hidden-layer neurons and converts them to a single signal as output using an activation function \varphi_k[\cdot]. Thus, the input and output signals of the kth output neuron are

v_k(n) = \sum_{j=1}^{N_h} \varphi_j[v_j(n)] w_{jk} + b_k,  (2)

and

y_k(n) = \varphi_k[v_k(n)],  (3)

respectively, where b_k is the bias and N_h is the total number of hidden-layer neurons.

The estimated output (y_k(n)) for the nth pattern is compared with the measured output (o_k(n)) and an error (e_k(n) = o_k(n) - y_k(n)) is estimated. If the average mean square error ((1/N) \sum_{n=1}^{N} 0.5\{e_k(n)\}^2) of all patterns is greater than the specified error goal (Haykin, 1999; Sahoo and Ray, 2007), then the weights are changed by an amount proportional to the difference between the desired output and the actual output. N is the total number of patterns in the training dataset.
Fig. 1. Schematic of a three-layer feed-forward neural network. The connection weights from input-layer neuron to hidden-layer neuron and from hidden-layer neuron to output-layer neuron are w_{ij} and w_{jk}, respectively. The input- and output-layer neurons are fixed according to the input and output parameter(s) of the specific problem. b_j and b_k are the biases for the hidden and output layers, respectively. y_i, y_j, and y_k are the output signals of the ith, jth, and kth nodes of the input, hidden, and output layers, respectively. An error e_k is measured between the measured value o_k and y_k.
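The layered computation of Eqs. (1)–(3) can be sketched in a few lines of NumPy. This is an illustrative stand-in, not the paper's MATLAB implementation: the layer sizes, the random seed, and the choice of a logistic hidden activation with a linear output are our assumptions (the study optimizes these choices with a µGA).

```python
import numpy as np

def logsig(v):
    # Logarithmic sigmoid activation (Fig. 2a): f(v) = 1 / (1 + e^-v)
    return 1.0 / (1.0 + np.exp(-v))

def forward_pass(x, w_ij, b_j, w_jk, b_k):
    """One feed-forward pass through a three-layer network.

    x    : input vector, shape (NI,)
    w_ij : input-to-hidden weights, shape (NI, Nh)
    b_j  : hidden-layer biases, shape (Nh,)
    w_jk : hidden-to-output weights, shape (Nh, NK)
    b_k  : output-layer biases, shape (NK,)
    """
    v_j = x @ w_ij + b_j      # Eq. (1): total input to the hidden neurons
    y_j = logsig(v_j)         # hidden output signal, phi_j[v_j(n)]
    v_k = y_j @ w_jk + b_k    # Eq. (2): total input to the output neuron
    y_k = v_k                 # Eq. (3) with a linear output activation
    return y_k

rng = np.random.default_rng(0)
NI, Nh, NK = 2, 4, 1          # e.g. air temperature and solar radiation in, stream temperature out
w_ij = rng.uniform(-1, 1, (NI, Nh))   # weights drawn in [-1, 1], as in the text
w_jk = rng.uniform(-1, 1, (Nh, NK))
b_j, b_k = np.zeros(Nh), np.zeros(NK)
y_k = forward_pass(np.array([0.6, 0.3]), w_ij, b_j, w_jk, b_k)
```

With scaled inputs in 0–1 (see 'Data pre-processing for ANN'), each call returns one network estimate per output neuron.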
Depending on the algorithms used to adjust the weights, different neural networks have been developed. The process of adjusting the weights to minimize the differences between the estimated value and the actual output is called training the network. The two most commonly used neural networks, the back propagation neural network (BPNN) and the radial basis function network (RBFN) (Haykin, 1999; ASCE Task Committee, 2000), are used in this study. The details of BPNN and RBFN can be found in Hagan et al. (1996), Haykin (1999), and Sahoo and Ray (2007). Brief details on the functions and subroutines used, developed, or modified in the current study are described below.

Back propagation neural network (BPNN)

The back propagation algorithm is essentially a gradient descent technique that minimizes the network error function (Haykin, 1999; ASCE Task Committee, 2000). It involves two steps: a forward pass and a backward pass. In the forward pass, the output is calculated based on the inputs and the connection weights (w_{ij} and w_{jk}) as described above. An error (e_k) is estimated at the output neuron. In the backward pass, the error at the hidden nodes is calculated by back-propagating the error at the output units through the connection weights, and new connection weights, w(c + 1), on the hidden nodes are estimated using the equation

w(c + 1) = w(c) + \Delta w(c + 1),  (4)

where c is the epoch (i.e. iteration) number and \Delta w is the increment in the weight vector, computed so as to move the weight in the direction of the negative gradient of the cost function (-\partial F(w)/\partial w).

The Levenberg–Marquardt (LM) algorithm (see "Appendix") was selected for the back propagation training (i.e. the estimation of \Delta w) because: (1) it converges faster than the conventional gradient descent algorithms (Principe et al., 1999; El-Bakyr, 2003; Kisi, 2004; Cigizoglu and Kisi, 2005; Alp and Cigizoglu, 2007); (2) it does not need a learning rate and momentum factor like the gradient descent algorithms (see "Appendix"); and (3) in many cases it converges when other back propagation algorithms fail to converge (Hagan and Menhaj, 1994). Samani et al. (2007) reported that using the Levenberg–Marquardt (LM) algorithm instead of the gradient descent algorithms for BPNN training helped reduce the convergence criterion from 10⁻³ to 10⁻⁹. The optimum number of hidden nodes and hidden layers depends on the complexity of the modeling problem and the threshold value for the training error goal (Fausett, 1994).

For each data pair to be learned, a forward pass and a backward pass are performed. One complete forward and backward process is referred to as an iteration (also called an epoch). The process is repeated either until the error between the predicted and the measured values falls below the pre-specified error goal (a value of 10⁻²⁰ in this study) or until the number of epochs reaches a predetermined maximum value.

The number of nodes in each layer and the epoch number were optimized using micro-genetic algorithms (µGA). Both three- and four-layered BPNN structures were examined. The transfer functions most commonly used for hidden layer(s) are the logarithmic sigmoid (Fig. 2a) and the hyperbolic tangent sigmoid (Fig. 2b) (ASCE Task Committee, 2000; Maier and Dandy, 2000). For the output layer, a linear transfer function (Fig. 2c) is used so that the output range is between -1 and 1. This avoids remapping of the outputs.
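The generic update of Eq. (4) can be illustrated with a plain gradient-descent step on a single linear neuron. This is only a sketch of the \Delta w concept: the learning rate eta used here is our assumption, and the study itself trains with Levenberg–Marquardt, which needs no learning rate.

```python
import numpy as np

def gradient_descent_epoch(w, x, o, eta=0.1):
    """One epoch of the generic update w(c+1) = w(c) + dw (Eq. 4) for a
    single linear neuron, with dw = -eta * dF/dw and the cost
    F(w) = (1/N) * sum over n of 0.5 * e(n)^2."""
    y = x @ w                   # forward pass: predicted outputs
    e = o - y                   # errors e(n) = o(n) - y(n)
    grad = -(x.T @ e) / len(o)  # dF/dw for the cost above
    dw = -eta * grad            # step along the negative gradient
    return w + dw               # Eq. (4)

# Fit y = 2*x1 - x2 from noise-free samples: the error shrinks each epoch.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 2))
o = x @ np.array([2.0, -1.0])
w = np.zeros(2)
mse = []
for _ in range(200):
    w = gradient_descent_epoch(w, x, o, eta=0.2)
    mse.append(np.mean(0.5 * (o - x @ w) ** 2))
```

Repeated calls drive the mean square error toward the error goal, which is the behavior the epoch loop described above relies on.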
Radial basis function network (RBFN)

Both RBFN and BPNN are feed-forward networks; the primary difference is in the hidden layer and the training algorithms. RBFN is a three-layered network (see Fig. 1) whose RBF hidden layer units have a receptive field with a center (see Fig. 2d): that is, a particular input value at which they have a maximal output. Of the several radial basis functions (RBFs), the most commonly used is the Gaussian RBF (ASCE Task Committee, 2000; Govindaraju and Rao, 2000; Haddadnia et al., 2003; Chang and Chen, 2003), which is described by

R_j = \exp\left( -\frac{\|y_i - c_j\|^2}{2\sigma_j^2} \right),  (5)

where c_j is the center of the jth RBF neuron, R_j [i.e. \varphi_j(\cdot) in the general Eq. (2)] is the radially symmetric basis activation function, y_i is the input vector, \|\cdot\| denotes a norm that is usually the Euclidean distance, and \sigma_j is the spread or the radial distance from the center of the jth RBF neuron. The standard deviation or spread (\sigma_j) of a radial basis neuron determines the width of the area in the input space to which each neuron responds. The function value R_j is highest at the center, c_j, and drops off rapidly to zero as the argument, y_i, moves away from the center. Thus, the neurons in the RBFN have localized receptive fields (Hagan et al., 1996). Fig. 2d shows the width of the input space for three \sigma-values; for better visualization, three RBF centers are shown. The spread should be large enough that neurons respond strongly to the overlapping regions of the input space, although too large a spread can cause numerical instability (ASCE Task Committee, 2000; Govindaraju and Rao, 2000). It is also important that the spread of each RBF neuron not be so large that all the neurons respond essentially in the same manner. If the spread is too large, the slope of the RBF surface becomes smoother, leading to a large area around the input vector; as a result, several neurons may respond to an input vector. On the other hand, if the spread is too small, the RBF surface becomes steep, so that neurons with the weight closest to the input will have a larger output than other neurons (ASCE Task Committee, 2000; Govindaraju and Rao, 2000). When input patterns fall within close proximity to each other, the RBFs may have overlapping receptive fields. Therefore, it is important to determine the optimum value of the spread for an RBFN.

The weighted sum of the inputs at the output layer is transformed to the network output using a linear activation function. The output y_k of the RBFN is computed using the following equation (Haykin, 1999; ASCE Task Committee, 2000; Chang and Chen, 2003; Haddadnia et al., 2003):

y_k = \varphi_k\left( \sum_{j=1}^{N_0} w_{jk} R_j + b_k \right),  (6)

where b_k is the bias, w_{jk} is the connection weight between the hidden neuron and the output neuron, and N_0 is the total number of RBFN centers. The first term is the weighted sum of all inputs. Since each RBFN center must respond to at least one input pattern, N_0 is always less than or equal to the total number of input patterns (N). Thus, setting a large value for N_0 (i.e. close to N) does not mean that the network will produce a good result, because the mean square error (MSE) of the predicted and measured values during training may be low simply because the network might be overtrained.
Fig. 2. Transfer functions: (a) logarithmic sigmoid, f(v) = 1/(1 + e^{-v}), range 0 to 1; (b) hyperbolic tangent sigmoid, f(v) = 2/(1 + e^{-2v}) - 1, range -1 to 1; (c) linear, f(v) = v, range -1 to +1; and (d) Gaussian-type radial basis function. Shown are the responses of the RBF to three different values of spread (i.e. standard deviation) \sigma_1 = 1.0, \sigma_2 = 0.50, and \sigma_3 = 0.25 for input values ranging between 0 and 10 at centers c_1 = (7.5, 3), c_2 = (3, 6), and c_3 = (8.5, 8.5). The mathematical expression of each activation function is shown in its subfigure. \|\cdot\| represents the Euclidean distance between the input value and the RBF center, c.
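Eqs. (5) and (6) can be sketched directly in NumPy. The centers and spreads below are borrowed from the illustrative values in the Fig. 2 caption; the output weights and bias are placeholders rather than LMS-trained values.

```python
import numpy as np

def rbf_activations(y, centers, sigma):
    """Gaussian RBF of Eq. (5): R_j = exp(-||y - c_j||^2 / (2 sigma_j^2))."""
    d2 = np.sum((y - centers) ** 2, axis=1)  # squared Euclidean distances to each center
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rbfn_output(y, centers, sigma, w_jk, b_k):
    """RBFN output of Eq. (6) with a linear output activation:
    y_k = sum_j w_jk * R_j + b_k."""
    return rbf_activations(y, centers, sigma) @ w_jk + b_k

# Three illustrative centers and spreads, loosely following Fig. 2d.
centers = np.array([[7.5, 3.0], [3.0, 6.0], [8.5, 8.5]])
sigma = np.array([1.0, 0.5, 0.25])

# At its own center, a neuron's activation is maximal (R_j = 1).
R_at_center = rbf_activations(np.array([7.5, 3.0]), centers, sigma)
y_out = rbfn_output(np.array([5.0, 5.0]), centers, sigma, np.ones(3), 0.1)
```

The localized receptive fields are visible in `R_at_center`: the first neuron fires at full strength while the others, centered elsewhere with small spreads, respond negligibly.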
RBF networks are trained by deciding how many hidden units there should be, depending on their centers and the sharpness (standard deviation) of their Gaussians, and then training up the output layer. Training starts with a minimal network (i.e. one hidden node) that grows by adding new hidden nodes one by one. The training cycle is divided into two phases. First, the network is trained to minimize the total output error. Second, a new node is inserted in the hidden layer and connected to every node of the input layer and the output layer. After a new node is added, the network is trained and an error is estimated. The addition of new nodes in the hidden layer is continued until either the sum of squared errors falls beneath a pre-determined value (a value of 10⁻²⁰ in this study) or N_0 equals N. In this study the number of centers (N_0) and the standard deviations (\sigma_j) are optimized using a µGA model. The output layer weights are then trained using the Least Mean Square (LMS) method (see Haykin, 1999; Principe et al., 1999; Sahoo and Ray, 2007).

Micro-genetic algorithms (µGA)

Genetic algorithms (GAs) are widely used for the optimization of water resources variables (e.g. Bowden et al., 2002; Jain and Srinivasulu, 2004; Ines and Honda, 2005; Sahoo and Ray, 2008b). In this study a binary µGA (chromosomes are made of 0 and 1 digits, hence binary) is applied to optimize the ANN geometry and internal parameters. For background on a simple GA, refer to Goldberg (1989). A µGA is similar to a standard GA; however, important distinctions of the µGA are that a small population (i.e. µ-population) is used, that it restarts when the characteristics of the chromosomes converge to the most fit solution in a generation (i.e. 95% similar in this study), and that a µGA performs no mutation, since enough diversity is introduced after convergence of the µ-population. The µGA introduces creep mutation, and during a restart the elite chromosome is preserved while the rest of the population is randomly generated. Krishnakumar (1989) and Abu-Lebdeh and Benekohal (1999) showed that a µGA reaches the near-optimal region faster than a simple GA for stationary and non-stationary function optimization. Based on these considerations a µGA was used in this study. Detailed information on µGA appears in Krishnakumar (1989), Carroll (1996), Abu-Lebdeh and Benekohal (1999), and Ines and Honda (2005).

µGA-ANN model development

Sahoo and Ray (2008b) developed a µGA-ANN model for optimization of ANN geometry and model parameters using training, validating, and testing datasets. In brief, the µGA model generates a set of solutions (model geometry and parameters) which is passed to the ANN. The ANN forms the architecture and trains the network using the model parameters and the training and validating subsets. The ANN then estimates a correlation coefficient (R) value on a testing subset which is unused during training, and passes it to the µGA. Since the objective is to reduce the difference between measured and ANN-estimated values, maximization of the R-value is used as the objective function. With each step, the µGA evolves another set of solutions based on the fitness value R. This interchange of information between µGA and ANN, named here µGA-ANN, evolves the optimum solution set.

Data pre-processing for ANN

Before presenting the data to the ANN models, the dataset is scaled to the range 0–1 so that the different input signals have approximately the same numerical range. The reason for this is that ANN models rely on Euclidean measures, and unscaled data could introduce bias or interfere with the training process. Scaling the data tends to smooth the solution space and averages out some of the noise effects (ASCE Task Committee, 2000). Moreover, scaling the data speeds up the training and improves the resulting performance of the derived model. According to Bowden et al. (2002), scaling the dataset to this range has two advantages: (1) inputs with much larger values are prevented from dominating the ANN training process, and (2) penalty constraints can be included more easily (i.e. the maximum and minimum values can be identified by the µGA as 1's and 0's). Sahoo and Ray (2006b) demonstrated that an ANN using a scaled dataset outperforms an ANN using unscaled data. The training, testing, and validating subsets are scaled to the range of 0–1 using the equation y_{ni} = (x_i - x_{min})/(x_{max} - x_{min}), where x_i is the input value, y_{ni} is the scaled value of the input value x_i, and x_{max} and x_{min} are the respective maximum and minimum values of the unscaled measured data. The network-estimated output values, which are in the range of 0–1, are converted to real-world values using the equation x_i = y_{ni}(x_{max} - x_{min}) + x_{min}.

Multiple regression analysis (MRA)

The goal of multiple regression analysis is to evaluate the relationship between several independent or predictor variables and a dependent or criterion variable. This is done by fitting a straight line to a number of data points. Specifically, a line is produced so that the squared deviations of the observed points from that line are minimized; this procedure is generally referred to as least squares estimation. Mathematically, if stream water temperature (T_{w,t}) is dependent on the current day air temperature (T_{a,t}) and solar radiation (Q_{SW,t}), and the previous day air temperature (T_{a,t-1}) and solar radiation (Q_{SW,t-1}), then a multiple regression model is

T_{w,t} = a_0 + a_1 Q_{SW,t} + a_2 T_{a,t} + a_3 Q_{SW,t-1} + a_4 T_{a,t-1},  (7)

where a_0, a_1, a_2, a_3, and a_4 are unknown coefficients that are estimated by the least squares method. MRA has been the traditional approach utilized in water resources hydrology for several decades. Some recent applications appear in Chiang et al. (2002) and Leclerca and Ouarda (2007).

Chaos non-linear dynamic method

Dynamic non-linear chaotic algorithms have proven to be efficient tools in forecasting hydrologic time series that are highly chaotic, non-linear, and vary greatly with the dynamics of the hydrologic system (Islam and Sivakumar, 2002). Details of dynamic non-linear methods are described elsewhere (e.g. Kantz and Schreiber, 2004). Briefly, in our study, we adopted the local approximation approach to forecast the water temperature of the day at time t. For this, the Euclidean distance (ED_{t-1,t-l}) between the variables (i.e. air temperature and solar radiation) at time t-1 and those at time t-l is estimated, where l represents the lag time (days) in the time series data. The generic equation for the ED of a dataset that includes delayed time series data is

ED_{t-1,t-l} = \sqrt{ \sum_{s=0}^{m} \left[ (Q_{SW,t-1-s} - Q_{SW,t-l-s})^2 + (T_{a,t-1-s} - T_{a,t-l-s})^2 \right] },  (8)

where s indexes the delayed time series data and m is the total number of delayed time series (i.e. the embedding dimension). The time series for s equal to 0 represents the original time series data. For s equal to 1, two additional columns starting with rows T_{a,t-1} and Q_{SW,t-1} were added to the original time series data. To match the number of rows in the two columns of the s = 1 time series with those of s = 0, the tail-end row of the s = 0 time series was deleted. Similarly, time series data for higher delay times (s > 1) were prepared. The m-value is determined in 'Analysis, results and discussions'. Nearest neighbors to the variables at time t-1 were searched to estimate the modeling day (t) water temperature.
neighbors to variables at time t1were searched to estimate mod- Ward Creek, Trout Creek, Third Creek, Logan House Creek, Incline
eling day (t) water temperature. This is done by estimating the EDs Creek, Glenbrook Creek, General Creek, Edgewood Creek, and
of all time series data (i.e. increasing l-value from 1 to t2) with re- Blackwood Creek) are regularly monitored as part of the Lake Ta-
spect to the variables at time t1. It is clear in Eq. (8) that lower hoe Interagency Monitoring Program (LTIMP); they account for
EDt1,tl values (i.e. close to 0.0001 in this study) are nearest neigh- up to 40% of the total annual stream discharge into the lake
bors to variables at time t1. For forecasting purposes, only nearest (Fig. 3). LTIMP became operational in Water Year 1980 (Leonard
neighbor(s) should be included in the estimation. The R-value of and Goldman, 1981). And for many years temperature and water
testing dataset increases as the number of nearest neighbors (i.e. quality data were only available on an event basis with sampling
ED-value) increase; however, R-value starts to decrease as more frequency on the order of 25–30 times per year (Rowe et al.,
uncorrelated events (i.e. higher ED-value) are added into the esti- 2002). However, as part of this long-term monitoring program
mation. For example, to forecast the coolest day water temperature, the US Geological Survey (Carson City, NV) had measured water
the estimation should include only weather data of those days temperature at 10 min intervals in five streams for a shorter, 4–
which have similar weather events, but it must exclude those of 5 year period; the Upper Truckee River (09/18/1997–09/29/2002),
hot days. However, it is not known in priori the optimum ED-value Trout Creek (09/18/1997–09/29/2002), Incline Creek (04/08/
(Opt_ED) for a hydrologic system. So, the Opt_ED-value is searched 1998–09/29/2002), Glenbrook Creek (4/8/1998–9/29/2002) and
iteratively by increasing the ED-value at 0.0001 starting from 0 with Blackwood Creek (5/30/2003–8/9/2003).
R-value as the objective function. Different methods (e.g. polyno- All models need at least two subsets of time series data: train-
mial model (Sivakumar, 2002), regression model, averaging meth- ing (popularly known as calibration) and testing. However, ANN
od, etc.) are employed to forecast the water temperature at day t models need three subsets of data: training, validating and testing.
using the data patterns under the Opt_ED-value (i.e. receptive field). The training and validating subsets are used for the network train-
In this study, water temperature at day t is the mean value of the ing and the testing subset is used to measure the predictive ability
water temperature patterns under the receptive field (i.e. Opt_ED). of the model. The validating subset is used to prevent the ANN
The search process is continued till the ED-value equals to 0.5. High- model being overtrained. The training of ANN is important for it ac-
er Opt_ED value (greater than 0.5) includes most of the time series quires all the information from the presented dataset. Thus, the
data (i.e. macro-structure) and the estimation is more like stochastic. On the other hand, for a small Opt_ED value (close to 0.0001), the estimation includes only correlated events (i.e. micro-structure). Because of the seasonal weather cycle, similar data patterns repeat, and they are far from those of modeling day t. However, some data patterns are close to those of modeling day t because of the smooth change of weather.

We examined the results in terms of different embedding dimensions (i.e. lag time, m) and the number of data points close to the current day water temperature (EDt1) in multi-dimensional phase space, using the correlation coefficient (R) as the objective function. Opt_ED is considered optimum for the case that produces the highest correlation coefficient (R) or yields the lowest differences between predicted and actual values on the testing dataset. Searching the Opt_ED value to the tune of 0.0001 avoids false neighbors when forecasting the water temperature at day t. This process continues to forecast the water temperature of all patterns in the testing subset.

The training subset should embrace the range of the whole dataset. Moreover, it should contain the minimum and maximum values of the dataset, since ANNs are not good predictors for new data whose patterns lie far outside those used during training (ASCE Task Committee, 2000). These subsets should include information from the whole modeling domain (i.e. all seasons).

The Blackwood Creek time series does not contain enough samples for three subsets. Continuous daily time series data from 1/1/1999 to 9/29/2002 for the other four streams (Upper Truckee, Trout, Glenbrook, and Incline) were used for model training, validating, and testing, and for comparison of their forecasting ability. For the ANNs, three subsets (training, validating, and testing) were prepared using the data from 2000 to 2001, 1999, and 2002, respectively (see observed data in Fig. 9). However, the initial 3 years of data (1/1/1999–12/31/2001) were used for training in the cases of: (1) MRA, to estimate the unknown coefficients (a0, a1, a2, a3, etc.), and (2) chaotic data analysis, for phase-space reconstruction. Water temperature data for the year 2002 were used as the testing subset to measure the predictive performance of all models.
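The year-based partitioning described above can be sketched as follows; the frame and column names are illustrative, not from the paper, and synthetic values stand in for the observed series:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the observed daily series (1/1/1999-9/29/2002).
idx = pd.date_range("1999-01-01", "2002-09-29", freq="D")
df = pd.DataFrame(
    {"water_temp": np.random.default_rng(0).uniform(0.0, 25.0, len(idx))},
    index=idx,
)

# ANN subsets: train on 2000-2001, validate on 1999, test on 2002.
train = df.loc["2000":"2001"]
validate = df.loc["1999"]
test = df.loc["2002"]

# MRA and the chaos method instead fit on all of 1999-2001.
train_mra = df.loc["1999":"2001"]
```

Partial-string indexing on the DatetimeIndex keeps the year-based split to one line per subset.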
Evaluation of predictive performance efficiency
The predictive performances of all three methods are measured using four statistical efficiency criteria to evaluate the relative strengths and weaknesses of the various models developed: R, the mean error (ME), the root mean square error (RMSE), and the mean square error (MSE). Each term is estimated from the predicted and observed water temperatures (targets). All of these efficiency terms are unbiased, as they use error statistics relative to the observed values. In the present study, MSE was used only to measure the ANN training error, whereas R, RMSE, and ME were used in the testing phase for all models. Overall, predictions are considered more precise if the values of R, ME, MSE, and RMSE are close to 1, 0, 0, and 0, respectively.

Analysis, results and discussions

The first step in applying chaos mathematics is to determine whether a particular hydrologic system is in fact chaotic. The autocorrelation function (ACF) and the Fourier power spectrum (FPS) are used to obtain preliminary information regarding the chaotic nature of hydrologic time series data (Massei et al., 2006). Comprehensive chaotic characterization is done using the concept of phase-space reconstruction, i.e. reconstruction of a single-dimensional (single-variable) time series in multi-dimensional phase-space to represent its dynamics (Sivakumar, 2002).
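A minimal sketch of the test-phase efficiency criteria (R, ME, RMSE); the sign convention for ME (predicted minus observed) is an assumption, as the paper does not state it:

```python
import numpy as np

def efficiency(obs, pred):
    """Correlation coefficient R, mean error ME, and root mean square error RMSE."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    r = np.corrcoef(obs, pred)[0, 1]                    # R
    me = float(np.mean(pred - obs))                     # ME (sign convention assumed)
    rmse = float(np.sqrt(np.mean((pred - obs) ** 2)))   # RMSE
    return r, me, rmse

r, me, rmse = efficiency([10.0, 12.0, 14.0], [10.5, 12.0, 13.5])
```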
Fig. 3. Lake Tahoe and locations of all 64 streams. Names and corresponding map IDs of the 10 LTIMP streams are shown at the left-hand side. The stream temperature analysis was carried out on the streams in bold text. The stream without a map ID represents the Truckee River, the only outflow from Lake Tahoe.
continuous time series data at two different times: t and t − s, where s is the lag time. Then the autocorrelation (ACF) between these two time series is

ACF(t, t − s) = E[(Xt − X̄t)(Xt−s − X̄t−s)] / (σt σt−s),   (9)

where X̄t and X̄t−s are the means, and σt² and σt−s² the variances, of the time series data Xt and Xt−s, respectively, and E is the expected value operator. The value of ACF must lie in the range [−1, 1], where 1 indicates perfect correlation and −1 perfect anti-correlation.

The autocorrelation function fluctuates about zero for a purely random process, indicating that the process at any given instant has no "memory" of the past at all. Islam and Sivakumar (2002) demonstrated that the autocorrelation function is periodic for a periodic process, indicating a strong relation between values that repeat over and over again. The autocorrelation function signal
332 G.B. Sahoo et al. / Journal of Hydrology 378 (2009) 325–342
from a chaotic process decays exponentially with increasing lag, because the states of a chaotic process are neither completely dependent (i.e. deterministic) nor completely independent (i.e. random) of each other.

Fig. 4 presents the autocorrelation function for the daily water temperature of the four study streams. In each case, the ACF signals decay up to a time lag of about 185 days, similar to the seasonal cycle. This means the effect of any specific event in the time series on subsequent measurements is completely lost after 185 days. Massei et al. (2006) reported that most authors refer to the time lag at which the autocorrelation function equals 0.2 to compare memory effects, but in fact any value of the autocorrelation function, within reason, could be chosen. Islam and Sivakumar (2002) estimated a similar value (200 days) for the characterization of stream runoff dynamics using ACF. However, they found an optimum lag time equal to 3 days employing correlation integral analysis, the false nearest neighbor algorithm, and the non-linear prediction method for incorporating the embedding dimension. Thus, a 185-day lag may not be the optimum value for stream temperature prediction. We propose to examine the lag time using mutual information (MI), visual observation of attractors (i.e. phase-space diagrams), and the non-linear prediction method. Nevertheless, the important characterizations of stream temperature using ACF are: (1) the initial exponential decay of the autocorrelation coefficient is an indication of the presence of chaotic dynamics in stream temperature, and (2) the ACF exhibits the same kind of seasonal variation as the original stream temperature time series (Fig. 10), indicating a strong relation between seasonal data patterns.

The FPS describes how the variance of a time series is distributed with frequency. It is a useful tool because it reveals periodicities in the input data as well as the relative strengths of any periodic components. Fig. 5 shows FPS values for frequency periods up to 400 days. The 400-day frequency period was selected to clearly show the mixture of low and high amplitudes in the figure. Fig. 5 shows a large number of periodicities. This is understandable, as stream water temperature is influenced by natural changes in weather conditions. As can be seen, all streams exhibit a combination of slow and fast oscillations with different amplitudes. The peaks are due to sudden changes in weather conditions caused by either storm events or large changes in air temperature. The slow oscillation (i.e. low amplitude for several days) in the frequency domain implies a high correlation between the large-scale components of the signal in time (macro-structures due to the natural weather cycle), while a very strong and fast oscillation (i.e. high amplitude for only a day to a few days) implies correlation in the micro-structures (i.e. due to extreme events such as storms and periods of unusually high or low air temperature). This mixture in the power spectrum represents the chaotic dynamics of water temperature, and suggests that chaotic algorithms may be a useful tool for forecasting water temperature using only data patterns from similar weather patterns.

Data pre-processing techniques are a powerful means of pre-structuring the problem setting of function approximation through an adaptive training procedure. In particular, integral transforms may change the nature of the training problem significantly, without loss of generality, if carefully selected. This provides an excellent opportunity to incorporate into the training dataset additional knowledge about the process that closely represents the inherent physical processes of the system. For example, stream temperature does not change instantaneously with the change of current day air temperature. Although current day heat inputs are significant contributors to the change in water temperature, the previous days' heat inputs are also carried over to the current day temperature to some extent. However, the number of days whose heat inputs influence the current day stream temperature is not known a priori. The findings of Massei et al. (2006) and Islam and Sivakumar (2002) showed that a large lag time estimated by ACF corresponding to an autocorrelation coefficient of 0.2 could be misleading, because it implies that the model needs to include information over the whole large lag time (185 days in this study). Thus, data analysis was carried out using mutual information and phase-space reconstruction to estimate the approximate lag time, as proposed by Wang et al. (2006).

Mutual information and phase-space reconstruction

The phase-space reconstruction of a time series is a useful tool for characterizing the dynamics of a system (Sivakumar, 2002). In essence it is a graph whose coordinates are all the variables that describe the state of the system at any moment. The trajectory of the phase-space diagram, known as an attractor, describes the evolution of the system from its initial point and hence represents the history of the system.

The method of delays is most commonly used in phase-space reconstruction. Since the time series are non-linear and often non-Gaussian, methods that look beyond linear dependence are needed to choose a delay time. The generic time-delayed version (e.g. Takens, 1981) of a scalar measurement Xt (t = 1, 2, . . . , N) for an m-dimensional vector is

Yt = (Xt, Xt−s, Xt−2s, . . . , Xt−(m−1)s),   (10)
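Eq. (10) translates directly into code. The sketch below builds the matrix of delay vectors for a given embedding dimension m and delay (here `tau`); names are illustrative:

```python
import numpy as np

def delay_embed(x, m, tau):
    """Eq. (10): rows are Y_t = (x_t, x_{t-tau}, ..., x_{t-(m-1)tau})."""
    x = np.asarray(x, dtype=float)
    start = (m - 1) * tau          # first index t with a full delay history
    n = len(x) - start             # number of delay vectors
    # Column j holds the series delayed by j*tau samples.
    return np.column_stack([x[start - j * tau: start - j * tau + n] for j in range(m)])

Y = delay_embed(np.arange(10.0), m=3, tau=2)   # 6 delay vectors of dimension 3
```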
Fig. 4. Autocorrelation functions of daily stream water temperature of the four streams (Upper Truckee, Trout, Glenbrook, and Incline).
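The sample autocorrelation of Eq. (9), which underlies curves like those in Fig. 4, can be sketched as follows; the example uses a synthetic 365-day seasonal signal rather than the stream data:

```python
import numpy as np

def acf(x, lag):
    """Sample form of Eq. (9): correlation between x_t and x_{t-lag}."""
    x = np.asarray(x, dtype=float)
    a, b = x[lag:], x[:len(x) - lag]
    return float(np.mean((a - a.mean()) * (b - b.mean())) / (a.std() * b.std()))

# Synthetic seasonal signal with a 365-day period.
x = np.sin(2.0 * np.pi * np.arange(1500) / 365.0)
```

A purely seasonal signal gives ACF near 1 at a full-period lag and near −1 at a half-period lag, mirroring the seasonal cycle seen in the stream ACFs.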
Fig. 5. Fourier power spectra of daily stream water temperature of the four streams: log(power) against frequency period (day), 0–400 days.
where t = 1, 2, . . . , N − (m − 1)s/Δt, m is the embedding dimension, s is the delay time, and Δt is the sampling frequency. In a non-linear dynamic system, a reasonable choice of m and s is crucial (Kim et al., 1998, 1999). Kantz and Schreiber (2004, p. 148) and Kim et al. (1998, 1999) reported that there is no mathematical framework to determine the optimal s value. Thus, mutual information (MI) values, as suggested by Kantz and Schreiber (2004, p. 38), are estimated for different lag times to estimate the approximate embedding dimensions. For a discrete time series with Xt and Xt−s as successive values, the mutual information (Hejazi et al., 2008) is computed as

MI(Xt, Xt−s) = Σ_{Xt} Σ_{Xt−s} p(Xt, Xt−s) log [p(Xt, Xt−s) / (p(Xt) p(Xt−s))],   (11)

where MI is the mutual information of the two discrete variables Xt and Xt−s, p(Xt, Xt−s) is the joint probability density of Xt and Xt−s, and p(Xt) and p(Xt−s) are the individual probability distributions of Xt and Xt−s, respectively. The MI measures the information that Xt and Xt−s share: it reflects how much knowing one of these variables reduces uncertainty about the other. MI ranges between 0 and 1. If Xt and Xt−s are independent, then knowing Xt does not give any information about Xt−s and vice versa, so MI(Xt, Xt−s) equals zero. At the other extreme, if Xt and Xt−s are dependent, then all information conveyed by Xt is shared with Xt−s: knowing Xt determines the value of Xt−s and vice versa. In such a case the value of p(Xt, Xt−s) becomes greater than the product of the two individual probability distribution functions p(Xt) and p(Xt−s). Liaw et al. (2001) reported that the first minimum of MI between two measurements Xt and Xt−s is selected as the time lag for forecasting purposes.

Fig. 6 presents the variation of MI against the lag time. The MI function exhibits an initial rapid decay up to a lag time of around 2–4 days before attaining near saturation. Clearly the selection of the minimum MI is difficult since, apart from the initial decay, the function appears to decrease very slowly as the lag time increases. As can be seen in Fig. 6, it is difficult to select the saturation point. Thus, the appropriateness of selecting the initial rapid decay as the s value, and its superiority over other values, is verified by examining the phase-space plots reconstructed in two dimensions.

Phase-space diagrams are plotted in two coordinate systems in Figs. 7 and 8. The plots in Figs. 7a and 8a, without a lag time, yield a well-defined attractor (i.e. all the data are close to the diagonal since the values of consecutive vectors are similar), whereas Figs. 7f and 8e, with a 10-day lag time, result in independent reconstructions (little or no attractor). This is evident as the attractors of Figs. 7f and 8e are less dense over the temperature range 10–13 °C. In spite of the well-defined attractor observed with no time lag, it should be noted that the obtained phase-space may be redundant, because it does not fully represent the inherent complexity of the stream system, excluding the effects of previous days' temperatures on the current day water temperature. Kantz and Schreiber (2004, p. 38) reported that if s is small compared to the internal time scales of the system, successive elements of the delayed vectors are strongly correlated, and if s is very large, successive elements are already almost independent. Based on attractor clarity, the nature and complexity of the system is categorized as: (1) a very clear attractor represents simple dynamics and needs a low-dimensional system, (2) a highly scattered attractor represents complex dynamics and needs a high-dimensional system, and (3) intermediate clarity represents intermediate complexity and needs a
Fig. 6. Mutual information for daily stream water temperature for the four streams. Except for the Upper Truckee River above a 60-day time lag, the lines of all four streams overlap each other.
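The MI of Eq. (11), as plotted in Fig. 6, can be estimated with a simple histogram approach. This is a sketch only; the bin count and estimator are assumptions, not the authors' implementation:

```python
import numpy as np

def mutual_info(x, lag, bins=16):
    """Histogram estimate of Eq. (11) between x_t and x_{t-lag}, in nats."""
    x = np.asarray(x, dtype=float)
    a, b = x[lag:], x[:len(x) - lag]
    pxy, _, _ = np.histogram2d(a, b, bins=bins)
    pxy /= pxy.sum()                            # joint probabilities p(X_t, X_{t-s})
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)   # marginal probabilities
    nz = pxy > 0                                # skip empty cells (0 log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

rng = np.random.default_rng(1)
noise = rng.uniform(size=20000)   # memoryless series, for illustration
```

For a memoryless series the estimate collapses toward zero at any nonzero lag, while at lag 0 (the series against itself) it approaches the entropy of the signal, matching the intuition behind Fig. 6.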
Fig. 7. Phase-space reconstruction for water temperature (°C), Tw, against air temperature (°C), Ta, of the Upper Truckee River: (a) Ta,t, (b) Ta,t−1, (c) Ta,t−2, (d) Ta,t−3, (e) Ta,t−4, and (f) Ta,t−10, each plotted against Tw,t. Subscript t represents the value of the current day and t − s represents the day s days behind the current day.
Fig. 8. Phase-space reconstruction of air temperature (°C), Ta: (a) Ta,t−1, (b) Ta,t−2, (c) Ta,t−3, (d) Ta,t−4, and (e) Ta,t−10, each plotted against Ta,t. Subscript t represents the value of the current day and t − s represents the day s days behind the current day.
medium-dimensional system. Visual observation of Figs. 7 and 8 indicates that the delayed vectors for time lags greater than 4 days yield weak correlation with the original vector (i.e. with no time lag). Liaw et al. (2001) pointed out that if s is large and the attractor is chaotic, any small noise will be amplified. In addition, information is lost in the mapping, and the vectors become irrelevant with respect to each other for very large values of s. Thus, based on visual observation of the phase-space reconstruction plots (i.e. attractors) and the initial rapid decay of the MI function, the highest embedding dimension value for modeling purposes was capped at 4. Moreover, since the inherent characteristics of the system are not known a priori, all the models were examined using inputs from a low-dimension vector to a high-dimension vector (i.e. s = 0–4) with R as the objective function.

Sensitivity analysis of embedding dimensions for the chaos non-linear dynamic prediction method

For the chaos non-linear dynamic prediction method, the first 3 years of data (1999–2001) were used as the training or learning subset; the additional 1 year of data (2002) was used for predicting and measuring the prediction efficiency. Table 1 presents a summary of the prediction results in terms of R, ME, and RMSE. R and RMSE are good indicators for comparing results between different scenarios because deviations of model-estimated values from the measured data points are captured by these two parameters. The table also shows the input variables for different time lags (i.e. embedding dimensions). As can be seen, the optimum prediction results are achieved for a time lag equal to 1 day. R values decrease and RMSE values increase as time lags increase beyond 1 day. Another important feature is that the optimum receptive field, expressed in terms of Euclidean distance (Opt_ED), varies from stream to stream. Opt_ED is used to select the nearest neighbors that are effective for estimating the stream temperature of the testing subset for a specific embedding dimension. A high Opt_ED (close to 1) suggests a stochastic condition (i.e. a macro-structure that includes all sampling events), whereas a low Opt_ED (close to 0.0001) indicates that the system is non-linearly chaotic (i.e. micro-structures that include only correlated events). In all cases, the four stream systems are found to be non-linear chaotic. The high Opt_ED value (0.4998) for Trout Creek at 0-day lag time points towards a stochastic nature; however, the overall forecasting performance criteria in this case are not optimum.
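A minimal sketch of the local-neighbor forecasting step described above: for each test-day delay vector, the training vectors within the receptive field Opt_ED contribute, and their next-day temperatures are averaged. The function and variable names, and the averaging rule, are illustrative assumptions rather than the paper's exact procedure:

```python
import numpy as np

def local_forecast(train_vecs, test_vecs, train_next, opt_ed):
    """Average the next-day values of training vectors lying within
    Euclidean distance opt_ed of each test vector; fall back to the
    single nearest neighbour if none qualifies."""
    preds = []
    for v in test_vecs:
        d = np.linalg.norm(train_vecs - v, axis=1)
        mask = d <= opt_ed
        preds.append(train_next[mask].mean() if mask.any()
                     else train_next[np.argmin(d)])
    return np.array(preds)

# Toy example: 1-D delay vectors with known next-day values.
pred = local_forecast(np.array([[0.0], [1.0], [2.0]]),
                      np.array([[1.1]]),
                      np.array([10.0, 20.0, 30.0]),
                      opt_ed=0.2)
```

In the paper's procedure, Opt_ED itself would be selected by scanning candidate values (down to 0.0001) and keeping the one that maximizes R on the testing subset.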
Fig. 9. Prediction performance efficiency of the streams: (a) R-values using µGA-4BPNN, (b) R-values using multiple regression analysis (MRA), (c) ME using MRA, and (d) RMSE using MRA.
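The MRA behind panels (b)–(d) fits coefficients (a0, a1, a2, . . .) to lagged inputs by least squares. A sketch, assuming lagged air temperature as the regressors (the paper's regressor set also includes shortwave radiation); names and synthetic data are illustrative:

```python
import numpy as np

def fit_mra(Tw, Ta, lags):
    """Least-squares fit of Tw_t = a0 + sum_j a_j * Ta_{t - lags[j]}."""
    start = max(lags)                      # first t with all lags available
    X = np.column_stack(
        [np.ones(len(Tw) - start)] +
        [Ta[start - L: len(Ta) - L] for L in lags])
    coef, *_ = np.linalg.lstsq(X, Tw[start:], rcond=None)
    return coef

# Synthetic check: recover known coefficients exactly (no noise).
rng = np.random.default_rng(0)
Ta = rng.normal(10.0, 5.0, 300)
Tw = 2.0 + 0.5 * Ta + 0.25 * np.r_[Ta[0], Ta[:-1]]
coef = fit_mra(Tw, Ta, lags=[0, 1])
```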
Sensitivity analysis of activation function using the µGA-BPNN model

The two most commonly used transfer functions (see Fig. 2a and b) for BPNN hidden layer(s) were examined for different embedding dimensions. Table 2 shows that the three-layered BPNN (referred to here as 3BPNN) with the hyperbolic tangent sigmoid function (Fig. 2b) outperformed the 3BPNN with the logarithmic sigmoid transfer function (Fig. 2a) for the case without a time lag. Note that the 3BPNN is optimized by µGA, and is referred to here as µGA-3BPNN. However, a closer examination of the overall results for all four time lags considered in this study shows that the predictions of the two activation functions are close to each other in terms of R, ME, and RMSE. Overall, though, the optimum geometry of the BPNN using the logarithmic sigmoid transfer function is larger than that of the BPNN using the hyperbolic tangent sigmoid function, and an ANN with a larger geometry takes more computing time. This is consistent with the findings of Ray and Klindworth (2000). Thus, a hyperbolic tangent sigmoid and a linear (Fig. 2c) transfer function are used for the nodes of the hidden and output layer(s), respectively, in the later analysis.

Sensitivity analysis of embedding dimensions for the µGA-3BPNN, µGA-4BPNN, and µGA-RBFN models

The four-layered BPNN (referred to as 4BPNN) and the RBFN are optimized using µGA, and are referred to here as µGA-4BPNN and µGA-RBFN, respectively. The results of the three models µGA-3BPNN, µGA-4BPNN, and µGA-RBFN are compared in Table 3. All three models predict stream temperature with R greater than 0.96, except Trout Creek for a 1-day time lag using the µGA-RBFN model. The overall prediction performance of µGA-4BPNN in terms of R, ME, and RMSE is higher than that of the other two models for time lags less than or equal to 3 days. The overall prediction performances of µGA-3BPNN and µGA-4BPNN for a time lag of 4 days are lower than those for a time lag of 3 days. However, the overall prediction performance of µGA-RBFN for a time lag of 4 days is better than all the others. The R-values of µGA-4BPNN and µGA-3BPNN are very close to each other, while the R-values of µGA-RBFN are the lowest of the three models, except for Trout Creek. To view performance for lag times greater than 4 days, µGA-4BPNN was examined, because it produced the highest performance among the three models considered in this study. The results are presented in Fig. 9a for comparison with those of multiple regression analysis. Fig. 9a shows that the R-value does not improve significantly after a time lag of 3 days.

Sensitivity analysis of embedding dimensions for the multiple regression analysis (MRA) prediction method

Fig. 9b and c shows the effect of embedding dimensions on the prediction efficiency using MRA. The prediction performance effi-
Fig. 10. Comparison of the µGA-4BPNN estimated training, validation and testing datasets with the observed dataset for: (a) Upper Truckee, (b) Trout Creek, (c) Incline Creek, and (d) Glenbrook Creek. The corresponding performance efficiencies and scatter plots are shown in Fig. 11.
Table 1
Performance efficiency of the dynamic chaos non-linear method for the four phase-space constructions. Ta,t and QSW,t represent 0-day delayed time series data. Similarly, Ta,t−4 and QSW,t−4 represent 4-day delayed time series data. The Ta,t, . . . , QSW,t−4 time series data include 0- to 4-day delayed time series data (i.e. five embedding dimensions).
Table 2
Comparison of the performance of µGA-3BPNN using the hyperbolic tangent sigmoid (tansig) and logarithmic sigmoid (logsig) transfer functions. For each metric the tansig value is listed first and the logsig value second; R, RMSE, and ME are the prediction performance on the test dataset.

Lag time (days)  Stream  Geometry (tansig, logsig)  R (tansig, logsig)  RMSE (tansig, logsig)  ME (tansig, logsig)
0 Upper Truckee 11 11 0.9640 0.9639 1.9127 1.9299 0.5580 0.6253
Trout 3 3 0.9777 0.9766 2.5686 2.1200 1.4162 0.0831
Glennbrook 5 5 0.9812 0.9808 1.2847 1.0886 0.6843 0.3786
Incline 6 5 0.9804 0.9804 1.1140 1.0767 0.4785 0.4043
1 Upper Truckee 12 15 0.9710 0.9721 1.7650 1.7478 0.636 0.7012
Trout 2 3 0.9823 0.9814 1.8860 3.4307 0.2433 2.0242
Glennbrook 5 6 0.9877 0.9886 2.1622 1.0744 1.6259 0.5022
Incline 9 5 0.9877 0.9872 1.1802 1.2630 0.4681 0.5914
2 Upper Truckee 11 12 0.9754 0.9745 1.5877 1.5967 0.5164 0.4567
Trout 4 2 0.9796 0.9843 1.7648 2.4802 0.1117 1.0321
Glennbrook 4 5 0.9902 0.9900 1.2825 0.9216 0.2201 0.4089
Incline 5 4 0.9892 0.9900 1.6655 0.9159 1.3304 0.5493
3 Upper Truckee 14 14 0.9796 0.9794 1.4175 1.5343 0.2561 0.2425
Trout 2 2 0.9805 0.9833 2.1737 2.5876 0.2861 0.6754
Glennbrook 5 6 0.9913 0.9914 1.6137 1.0012 1.2722 0.5499
Incline 4 6 0.9905 0.9907 0.8176 0.9213 0.3990 0.4698
4 Upper Truckee 13 13 0.9807 0.9814 1.3962 1.3915 0.4002 0.4378
Trout 5 8 0.9734 0.9790 2.4946 2.1665 1.1589 0.2499
Glennbrook 6 5 0.9910 0.9914 1.0549 1.0311 0.4458 0.2097
Incline 6 12 0.9900 0.9902 0.9530 1.3826 0.4968 1.0162
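The two transfer functions compared in Table 2 have the standard forms below, linked by the identity tansig(x) = 2·logsig(2x) − 1; a minimal sketch:

```python
import numpy as np

def logsig(x):
    """Logarithmic sigmoid: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tansig(x):
    """Hyperbolic tangent sigmoid: maps any real input into (-1, 1)."""
    return np.tanh(x)
```

The wider, zero-centred output range of tansig is one common reason it trains faster than logsig for hidden layers.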
Table 3
Comparative analysis of the predictive performance of µGA-3BPNN, µGA-4BPNN, and µGA-RBFN.

ciency in terms of R increases for all cases except Trout Creek. In the case of Trout Creek, the prediction performance in terms of R is highest at the 1-day time lag and then decreases. Fig. 9b also shows that the incremental increase of the R-value declines with increasing lag time after a 3-day time lag. MRA is a statistical method that estimates stream temperature based on the overall statistics of the training dataset. It is unable to address the dynamic non-linearity of the system (i.e. estimation based on correlated events). Thus, similar to the trend of MI, the R-value increases as more information (i.e. memory effect) is incorporated into the training dataset. Its prediction performance with embedding dimensions of a 4-day time lag was found to be much less than that of the other methods (i.e. the four models: µGA-4BPNN, µGA-3BPNN, µGA-RBFN, and chaotic non-linear dynamics).

Sensitivity analysis of input parameters (air temperature and shortwave radiation)

A sensitivity analysis on the two input variables influencing stream water temperature (air temperature and shortwave radiation) was carried out by deleting one input variable from the input dataset to measure the importance of that variable over the other. Since µGA-4BPNN produced the highest predictive performance, it was used to carry out the sensitivity analysis. The results presented in Table 4 show that the predictive performance effi-
ciency using only shortwave radiation was very low. The RMSE values of µGA-4BPNN using only shortwave radiation as input are nearly double those using air temperature as input. The predictive performance for a 1-day time lag increased substantially over the case of no time lag. In all cases the R-values increased as the time lag (days) increased, and R-values using only air temperature were above 0.96 in all cases for time lags of 1 day or higher. The overall predictive performance for all streams was found to be optimum for a time lag of 3 days. The finding that air temperature alone had a significant effect on stream water temperature is similar to the findings of other investigators (e.g. Mackey and Berrie, 1991; Mohseni and Stefan, 1999). The reason air temperature is the primary factor in stream temperature estimation is the tendency of stream water temperature to attain the equilibrium temperature. Comparing the values of Table 4 with those of Table 3 for µGA-4BPNN, the predictive performance is higher for the input dataset containing both air temperature and shortwave radiation.

General discussion

The prediction performances of the MRA and µGA-RBFN approaches, when expressed in terms of R, ME, and RMSE, were found to follow a similar trend (that is, R increases as time lag increases), except for Trout Creek. However, the prediction results obtained using chaos theory were different. Since the Opt_ED values are close to 0.1 and the prediction performance is optimum for a time lag of 1 day, it appears that the time series data are chaotic and that the dynamic non-linear chaos method could be a viable alternative for stream temperature estimation. The MRA and µGA-RBFN results, however, show that these models rely on the past statistics (information of all 4 days) of the stream system to achieve optimum prediction performance, except for Trout Creek.

The general concept is that water temperature does not change instantaneously with a change in air temperature. The effects of previous days' heat inputs into the water are carried over to the current day. Since the stream flow travel time (from the headwaters to the inlet of the lake) of all four streams is nearly 2 days, applying a time lag of nearly 2 days is justified. But what portion of the previous days' heat inputs into the stream water is transferred to the current time step is not known a priori. Therefore, successive delayed input time series data are introduced into the dataset to represent the inherent system dynamics using empirical models. The dynamics of each stream are different, and every methodology has a different capacity to handle the inherent dynamics. The prediction performance of µGA-4BPNN was found to differ from those of MRA, µGA-RBFN, and the chaos algorithms. Overall, the prediction performance of µGA-4BPNN in terms of R, ME, and RMSE is optimum for a time lag of 3 days. This is close to the lag time (days) reached by MI following its rapid and dramatic decline (see Fig. 6). The results of all the models showed that the rate of increase in R declined as the time lag (days) increased (Table 3 and Fig. 9). This indicates that the current day heat input had a significant effect on stream water temperature, but that the effect of the heat input of the preceding days rapidly decreased. The results of µGA-4BPNN and the chaos method show that incorporating information from higher lag times (greater than 3 days) results in an overall loss of the effect of heat input on stream water temperature estimation. This may be due to inputs mixing information from extremes (storms or dramatic changes in air temperature) with the natural cycle of weather. Overall, µGA-4BPNN outperformed MRA, µGA-RBFN, and the chaotic non-linear dynamic method in terms of R, ME, and RMSE.

The results of µGA-4BPNN for a time lag of 3 days are compared with measured stream water temperatures. The ANN-estimated training, validating, and testing values are compared with the measured values in Fig. 10. Agreement between the measured and predicted stream water temperatures was excellent, clustering around the 1:1 line (Fig. 11). As seen in Figs. 10 and 11, the predicted values are in good agreement with the measured values except at the summer peaks, where deviations are in the range of 1–4 °C for a few days, and only for Trout Creek. This difference is attributed to the unavailability of an adequately long time series. Although the training subset includes two summers with peak values, the validating subset lacks such data (see Fig. 10). Consequently, the validating subset for Trout Creek could not help to prevent over- or under-training of the network. The temperatures during winter are in good agreement, with differences less than 0.5 °C. Thus, µGA-4BPNN is found to be a better tool for prediction purposes when compared to the other three tools considered in this study.
Table 4
Comparative analysis of the predictive performance of µGA-4BPNN using only air temperature as input and using only shortwave radiation as input.
Fig. 11. Scatter plots of predicted versus measured water temperature for the four streams: (a) R = 0.983, ME = −0.349 °C, RMSE = 1.322 °C; (b) R = 0.980, ME = 0.452 °C, RMSE = 1.538 °C; (c) R = 0.990, ME = 0.486 °C, RMSE = 0.855 °C; (d) R = 0.991, ME = 0.281 °C, RMSE = 0.792 °C.
Conclusions

The exponential decay of the autocorrelation coefficient function and the mixture of continuous broad band and peaks observed in the Fourier power spectrum analysis indicate the presence of chaos in the dataset. This is also reflected in the results predicted by the chaos algorithms. The optimum Euclidean distances found when searching for the optimum neighbors for prediction indicate that the systems are non-linear chaotic.

Sensitivity analyses were performed prior to presenting conclusive inferences from the results of the models above. Phase-space reconstruction plots show that a limited time delay (less than 3 days) produces better correlation than a higher time delay. Both air and water temperature data are highly dependent when there is no time lag and more independent (i.e. uncorrelated) when the time lag is increased. This is supported by the analysis of the mutual information function, which suggests that the correlation reduces to a minimum within 2–4 days. The chaotic non-linear dynamic algorithms show that the prediction performance is optimum when data are presented to the model with only a 1-day time lag. The results of µGA-4BPNN show that the prediction performance is optimum for a time lag of 3 days; however, a higher time lag (greater than 3 days) appears to be a good choice to obtain higher prediction efficiency using µGA-RBFN and MRA. The prediction performances were found to decrease when the time lag increased. This is similar to the result that MI decreases to near its minimum within a 3-day lag time and continues to decrease slowly for large lag times.

Overall, the prediction performance of µGA-4BPNN was found to be the highest among all the algorithms. Comparing the predicted values of µGA-4BPNN with the corresponding measured values, it was found that the µGA-4BPNN predictions were in the best agreement with the measured values. A sensitivity analysis showed that air temperature was the most important parameter in stream temperature estimation; however, the inclusion of shortwave radiation did enhance performance somewhat. Shortwave radiation alone could not predict stream temperature with any acceptable degree of accuracy.

Acknowledgements

This work was partially supported by Grants 01-174-160-0 and 01-175-160-0 from the Lahontan Regional Water Quality Control
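The lag-selection diagnostic summarized in the Conclusions (mutual information between the series and its delayed copy falling toward a minimum as the lag grows) can be sketched in a few lines. This is a minimal illustration on a synthetic daily series, not the paper's implementation: the histogram-based MI estimator, the bin count, and the seasonal sine-plus-noise series are all assumptions made for the example.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in estimate of mutual information (nats) from a 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                       # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y (row vector)
    nz = pxy > 0                           # avoid log(0) cells
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Synthetic daily 'temperature' series: seasonal cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(3 * 365)
series = 15.0 + 8.0 * np.sin(2 * np.pi * t / 365) + rng.normal(0.0, 1.0, t.size)

# MI between the series and its tau-day delayed copy, tau = 1..10;
# the lag near the first minimum of this curve is a candidate embedding delay.
mi = [mutual_information(series[:-tau], series[tau:]) for tau in range(1, 11)]
```

For a smooth seasonal series like this one, the MI curve decays with increasing lag, mirroring the behavior reported above for the air and water temperature records.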
Appendix

Levenberg–Marquardt algorithm

The Levenberg–Marquardt algorithm, an approximation to Newton's method (Marquardt, 1963), is

\Delta w = -[\nabla^2 F(w)]^{-1} \nabla F(w),  (A1)

where \nabla^2 F(w) is the Hessian matrix and \nabla F(w) is the gradient. \nabla^2 F(w) and \nabla F(w) can be written as

\nabla F(w) = J^T(w) e(w),  (A2)

\nabla^2 F(w) = J^T(w) J(w) + S(w),  (A3)

where J(w) is the Jacobian matrix and

S(w) = \sum_{n=1}^{N} e_n \nabla^2 e_n(w).  (A4)

For the Gauss–Newton method it is assumed that S(w) \approx 0, and Eq. (A1) becomes

\Delta w = -[J^T(w) J(w)]^{-1} J^T(w) e(w).  (A5)

The Levenberg–Marquardt modification to the Gauss–Newton method is

\Delta w = -[J^T(w) J(w) + f I]^{-1} J^T(w) e(w),  (A6)

where I is the unit matrix and f is a scalar value. Eq. (A6) can be written as (Hagan et al., 1996; Haykin, 1999; Principe et al., 1999)

w(c+1) = w(c) - [J^T\{w(c)\}\, J\{w(c)\} + f I]^{-1} J^T\{w(c)\}\, e\{w(c)\},  (A7)

where w(c) is the weight matrix at the current iteration c and c + 1 denotes the next iteration.

When the scalar f is zero, Eq. (A7) is just the Gauss–Newton method; on the other hand, when f is large, Eq. (A7) becomes gradient descent (Haykin, 1999) with step size 1/f. The Gauss–Newton method is faster and more accurate than gradient descent near an error minimum, so the aim is to shift toward the Gauss–Newton method as quickly as possible (Kisi, 2004; Cigizoglu and Kisi, 2005). The steepest-descent method, on the other hand, has a slow asymptotic convergence rate.
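The update rule of Eq. (A7) can be sketched as follows. This is a minimal NumPy illustration, not the paper's code: the damping schedule (halving f on an accepted step, doubling it otherwise), the toy exponential model, and all names are assumptions made for the example.

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, w, f=1e-2, iters=50):
    """Minimize 0.5*||e(w)||^2 via Eq. (A7): w <- w - [J^T J + f I]^{-1} J^T e."""
    for _ in range(iters):
        e = residual(w)
        J = jacobian(w)
        while True:
            step = np.linalg.solve(J.T @ J + f * np.eye(len(w)), J.T @ e)
            w_new = w - step
            if np.sum(residual(w_new) ** 2) < np.sum(e ** 2):
                w, f = w_new, f * 0.5  # accepted: shift toward Gauss-Newton
                break
            f *= 2.0                   # rejected: shift toward gradient descent
            if f > 1e10:               # damping saturated; treat as converged
                return w
    return w

# Toy problem: recover (a, b) = (2.0, -0.5) of y = a*exp(b*x) from clean data.
x = np.linspace(0.0, 4.0, 20)
y = 2.0 * np.exp(-0.5 * x)
res = lambda w: w[0] * np.exp(w[1] * x) - y
jac = lambda w: np.column_stack([np.exp(w[1] * x),
                                 w[0] * x * np.exp(w[1] * x)])

w_fit = levenberg_marquardt(res, jac, w=np.array([1.0, -1.0]))
```

The adaptive f reproduces the behavior described above: small f gives fast Gauss–Newton steps near the minimum, while large f falls back to short, safe gradient-descent steps.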
References

Abu-Lebdeh, G., Benekohal, R.F., 1999. Convergence variability and population sizing in micro-genetic algorithms. Computer-Aided Civil and Infrastructure Engineering 14 (5), 321–334.
Alp, M., Cigizoglu, H.K., 2007. Suspended sediment load simulation by two artificial neural network methods using hydrometeorological data. Environmental Modelling and Software 22 (1), 2–13.
ASCE Task Committee, 2000. Artificial neural network in hydrology. Journal of Hydrologic Engineering 5 (2), 124–144.
Benyahya, L., St-Hilaire, A., Ouarda, T.B.M.J., Bobée, B., Ahmadi-Nedushan, B., 2007. Modeling of water temperatures based on stochastic approaches: case study of the Deschutes River. Journal of Environmental Engineering and Science 6, 437–448. doi:10.1139/S06-067.
Birikundavyi, S., Labib, R., Trung, H.T., Rousselle, J., 2002. Performance of neural networks in daily streamflow forecasting. Journal of Hydrologic Engineering 7 (5), 392–398.
Bogan, T., Othmer, J., Mohseni, O., Stefan, H., 2006. Estimating extreme stream temperatures by the standard deviate method. Journal of Hydrology 317, 173–189.
Bowden, G.J., Maier, H.R., Dandy, G.C., 2002. Optimal division of data for neural network models in water resources applications. Water Resources Research 38 (2). doi:10.1029/2001WR000266.
Caissie, D., Satish, M.G., El-Jabi, N., 2007. Predicting water temperatures using a deterministic model: application on Miramichi River catchments (New Brunswick, Canada). Journal of Hydrology 336, 303–315.
Carroll, D.L., 1996. Genetic algorithms and optimizing chemical oxygen–iodine lasers. In: Wilson, H., Batra, R., Bert, C., Davis, A., Schapery, R., Stewart, D., Swinson, F. (Eds.), Developments in Theoretical and Applied Mechanics, vol. 18. School of Engineering, University of Alabama, pp. 411–424.
Chang, F., Chen, Y., 2003. Estuary water-stage forecasting by using radial basis function neural network. Journal of Hydrology 270, 158–166.
Chen, H., Kim, A.S., 2006. Prediction of permeate flux decline in crossflow membrane filtration of colloidal suspension: a radial basis function neural network approach. Desalination 192, 415–428.
Chiang, S., Tsay, T., Nix, S.J., 2002. Hydrologic regionalization of watersheds. II: applications. Journal of Water Resources Planning and Management 128 (1), 12–20.
Cigizoglu, H.K., Kisi, O., 2005. Flow prediction by three back propagation techniques using k-fold partitioning of neural network training data. Nordic Hydrology 36 (1), 49–64.
Dallimore, C.J., Imberger, J., Hodges, B.R., 2004. Modeling a plunging underflow. Journal of Hydraulic Engineering 130 (11), 1068–1076.
Edinger, J.E., Duttweiler, D.W., Geyer, J.C., 1968. The response of water temperatures to meteorological conditions. Water Resources Research 4 (1), 1137–1143.
El-Bakyr, M.Y., 2003. Feed forward neural networks modeling for K–P interactions. Chaos, Solitons and Fractals 18 (5), 995–1000.
Fausett, L., 1994. Fundamentals of Neural Networks. Prentice-Hall, Englewood Cliffs, NJ.
Fleenor, W.E., 2001. Effects and Control of Plunging Inflows on Reservoir Hydrodynamics and Downstream Releases. PhD dissertation, UC Davis, CA, USA.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley-Longman, Reading, MA, USA.
Govindaraju, R.S., Rao, A.R., 2000. Artificial Neural Networks in Hydrology. Kluwer Academic Publishers, Dordrecht.
Haddadnia, J., Faez, K., Ahmadi, M., 2003. A fuzzy hybrid learning algorithm for radial basis function neural network with application in human face recognition. Pattern Recognition 36, 1187–1202.
Hagan, M.T., Menhaj, M.B., 1994. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks 5 (6), 989–993.
Hagan, M.T., Demuth, H.P., Beale, M., 1996. Neural Network Design. PWS Publishing, Boston.
Haykin, S., 1999. Neural Networks: A Comprehensive Foundation. Macmillan, New York.
Hejazi, M.I., Cai, X., Ruddell, B.L., 2008. The role of hydrologic information in reservoir operation – learning from historical releases. Advances in Water Resources 31, 1636–1650.
Ines, A.V.M., Honda, K., 2005. On quantifying agricultural and water management practices from low spatial resolution RS data using genetic algorithms: a numerical study for a mixed-pixel environment. Advances in Water Resources 28, 856–870.
Islam, M.N., Sivakumar, B., 2002. Characterization and prediction of runoff dynamics: a nonlinear dynamical view. Advances in Water Resources 25, 179–190.
Jain, A., Srinivasulu, S., 2004. Development of effective and efficient rainfall–runoff models using integration of deterministic, real-coded genetic algorithms and artificial neural network techniques. Water Resources Research 40, W04302. doi:10.1029/2003WR002355.
Jingyi, Z., Hall, M.J., 2004. Regional flood frequency analysis for the Gan-Ming River Basin in China. Journal of Hydrology 296, 98–117.
Kantz, H., Schreiber, T., 2004. Nonlinear Time Series Analysis, second ed. Cambridge University Press.
Kim, H.S., Eykholt, R., Salas, J.D., 1998. Delay time window and plateau onset of the correlation dimension for small data sets. Physical Review E 58 (5), 5676–5682.
Kim, H.S., Eykholt, R., Salas, J.D., 1999. Nonlinear dynamics, delay times, and embedding windows. Physica D 127, 48–60.
Kinouchi, T., Yagi, H., Miyamoto, M., 2007. Increase in stream temperature related to anthropogenic heat input from urban wastewater. Journal of Hydrology 335, 78–88.
Kisi, Ö., 2004. Multi-layer perceptrons with Levenberg–Marquardt training algorithm for suspended sediment concentration prediction and estimation. Hydrological Sciences Journal 49 (6), 1025–1040.
Krasjewski, W.F., Kraszewski, A.K., Grenney, W.J., 1982. A graphical technique for river water temperature predictions. Ecological Modelling 17, 209–224.
Krishnakumar, K., 1989. Micro-genetic algorithms for stationary and non-stationary function optimization. SPIE: Intelligent Control and Adaptive Systems 1196, 289–296.
Leclerca, M., Ouarda, T.B.M.J., 2007. Non-stationary regional flood frequency analysis at ungauged sites. Journal of Hydrology 343, 254–265. doi:10.1016/j.jhydrol.2007.06.021.
Leonard, R.L., Goldman, C.R., 1981. Interagency Tahoe Monitoring Program: First Annual Report, Water Year 1980. Tahoe Research Group, Institute of Ecology, University of California, Davis. 82p.
Liaw, C., Islam, M.N., Phoon, K.K., Liong, S., 2001. Comment on "Does the river run wild? Assessing chaos in hydrological systems" by G.B. Pasternack. Advances in Water Resources 24, 575–580.
Lowney, C.L., 2000. Stream temperature variation in regulated rivers: evidence for a spatial pattern in daily minimum and maximum magnitudes. Water Resources Research 36 (10), 2947–2955.
Mackey, A.P., Berrie, A.D., 1991. The prediction of water temperatures in chalk streams from air temperatures. Hydrobiologia 210, 183–189.
Maier, H.R., Dandy, G.C., 1996. The use of artificial neural networks for the prediction of water quality parameters. Water Resources Research 32, 1013–1022.
Maier, H.R., Dandy, G.C., 1997. Author's reply to comments by Fortin, V., Ouarda, T.B.M.J. and Bobée, B. on "The use of artificial neural networks for the prediction of water quality parameters" by Maier, H.R., Dandy, G.C. Water Resources Research 33 (10), 2425–2427.
Maier, H.R., Dandy, G.C., 2000. Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environmental Modelling and Software 15 (1), 101–124.
Marquardt, D., 1963. An algorithm for least squares estimation of non-linear parameters. Journal of the Society for Industrial and Applied Mathematics 11 (2), 431–441.
Massei, N., Dupont, J.P., Mahler, B.J., Laignel, B., Fournier, M., Valdes, D., Ogier, S., 2006. Investigating transport properties and turbidity dynamics of a karst aquifer using correlation, spectral, and wavelet analysis. Journal of Hydrology 329, 244–257. doi:10.1016/j.jhydrol.2006.02.021.
Mohseni, O., Stefan, H.G., 1999. Stream temperature/air temperature relationship: a physical interpretation. Journal of Hydrology 218, 128–141.
O'Driscoll, M.A., DeWalle, D.R., 2006. Stream–air temperature relations to classify stream–ground water interactions in a karst setting, central Pennsylvania, USA. Journal of Hydrology 329, 140–153.
Ozaki, N., Fukushima, T., Harasawa, H., Toshiharu, K., Kawashima, K., Ono, M., 2003. Statistical analyses on the effects of air temperature fluctuations on river water qualities. Hydrological Processes 17, 2837–2853.
Pasternack, G.B., 1999. Does the river run wild? Assessing chaos in hydrological systems. Advances in Water Resources 23, 253–260.
Perez-Losada, J., 2001. A Deterministic Model for Lake Clarity: Application to Management of Lake Tahoe, California–Nevada. PhD dissertation, UC Davis, CA, USA.
Poole, G.C., Berman, C.H., 2001. An ecological perspective on in-stream temperature: natural heat dynamics and mechanisms of human-caused thermal degradation. Environmental Management 27 (6), 787–802.
Principe, J.C., Euliano, N.R., Lefebvre, W.C., 1999. Neural and Adaptive Systems: Fundamentals through Simulations. John Wiley & Sons, New York.
Ramírez, M.C.V., Velho, H.F.C., Ferreira, N.J., 2005. Artificial neural network technique for rainfall forecasting applied to the São Paulo region. Journal of Hydrology 301, 146–162.
Ray, C., Klindworth, K.K., 2000. Neural networks for agrichemical vulnerability assessment of rural private wells. Journal of Hydrologic Engineering 5 (2), 162–171.
Roberts, D.M., Reuter, J.E., 2007. Lake Tahoe Total Maximum Daily Load, Technical Report CA–NV. California Regional Water Quality Control Board, Lahontan Region, CA, USA.
Rowe, T.G., Saleh, D.K., Watkins, S.A., Kratzer, C.R., 2002. Streamflow and Water Quality Data for Selected Watersheds in the Lake Tahoe Basin, California and Nevada, through September 1998. US Geological Survey Water Resources Investigations Report 02–4030, Carson City, NV. 117p.
Sahoo, G.B., Ray, C., 2006a. Flow forecasting for a Hawaiian stream using rating curves and neural networks. Journal of Hydrology 317, 63–80.
Sahoo, G.B., Ray, C., 2006b. Predicting flux decline in cross-flow membranes using artificial neural networks and genetic algorithms. Journal of Membrane Science 283, 147–157.
Sahoo, G.B., Ray, C., 2007. Reply to comments made by W. Sha on "Flow forecasting for a Hawaii stream using rating curves and neural networks". Journal of Hydrology 340, 122–127. doi:10.1016/j.jhydrol.2007.04.004.
Sahoo, G.B., Ray, C., 2008a. Flow forecasting using artificial neural network and a distributed hydrological model, MIKE SHE. In: Andreassen, M. Henrik (Ed.), New Topics in Water Resources Research and Management. NOVA Publishers, New York, pp. 315–333.
Sahoo, G.B., Ray, C., 2008b. Micro-genetic algorithms and artificial neural networks to assess minimum data requirements for prediction of pesticide concentrations in shallow groundwater on a regional scale. Water Resources Research 44, W05414. doi:10.1029/2007WR005875.
Sahoo, G.B., Schladow, S.G., Reuter, J.E., 2007a. Linkage of Pollutant Loading to In-lake Effects. University of California Davis Tahoe Environmental Research Center, Davis, CA. 62p.
Sahoo, G.B., Schladow, S.G., Reuter, J.E., 2007b. Response of water clarity in Lake Tahoe (CA–NV) to watershed and atmospheric load. In: Proceedings of the Fifth International Symposium on Environmental Hydraulics, Tempe, Arizona.
Samani, N., Gohari-Moghadam, M., Safavi, A.A., 2007. A simple neural network model for the determination of aquifer parameters. Journal of Hydrology 340, 1–11.
Sivakumar, B., 2002. A phase-space reconstruction approach to prediction of suspended sediment concentration in rivers. Journal of Hydrology 258, 149–162.
Takens, F., 1981. Detecting strange attractors in turbulence. In: Rand, D.A., Young, L.S. (Eds.), Dynamic Systems and Turbulence, Lecture Notes in Mathematics, vol. 898. Springer, Berlin, pp. 366–381.
The MathWorks Inc., 2005. MATLAB Version 7.1. The MathWorks Inc., Natick, MA, USA.
Wang, W., Van Gelder, P.H.A.J.M., Vrijling, J.K., Ma, J., 2006. Forecasting daily streamflow using hybrid ANN models. Journal of Hydrology 324, 383–399. doi:10.1016/j.jhydrol.2005.09.032.