Published in Progress in Physical Geography by SAGE Publications (http://www.sagepublications.com).
Abstract: This review considers the application of artificial neural networks (ANNs) to
rainfall–runoff modelling and flood forecasting. This is an emerging field of research, character-
ized by a wide variety of techniques, a diversity of geographical contexts, a general absence of
intermodel comparisons, and inconsistent reporting of model skill. This article begins by
outlining the basic principles of ANN modelling, common network architectures and training
algorithms. The discussion then addresses related themes of the division and preprocessing of
data for model calibration/validation; data standardization techniques; and methods of
evaluating ANN model performance. A literature survey underlines the need for clear guidance
in current modelling practice, as well as the comparison of ANN methods with more conven-
tional statistical models. Accordingly, a template is proposed in order to assist the construction
of future ANN rainfall–runoff models. Finally, it is suggested that research might focus on the
extraction of hydrological ‘rules’ from ANN weights, and on the development of standard
performance measures that penalize unnecessary model complexity.
Key words: artificial neural networks, flood forecasting, hydrology, model, rainfall–runoff.
I Introduction
The rainfall-runoff process has been described quantitatively since the nineteenth
century. However, it is only in the last decade or so that ANNs have been applied to the
problem. None the less, ANNs have been in existence since McCulloch and Pitts (1943)
introduced the concept of the artificial neuron. Since that time neural network research
has evolved in three distinct phases (Schalkoff, 1997). The first era involved preliminary
work on the development of the artificial neuron until Minsky and Papert (1969)
identified several limiting factors. The second era began with the rediscovery and pop-
ularization of the backpropagation training algorithm (Rumelhart and McClelland,
1986). Prior to this seminal work it was very difficult to train neural networks of any
practical size. The third era is characterized by more rigorous assessments of network
limitations and generalizations, fusion with other technologies (such as genetic
algorithms and fuzzy logic) (e.g., See and Openshaw, 1999) and the implementation of
ANNs using dedicated hardware.
The following sections provide an overview of ANNs, including the main structures,
network types and training algorithms. In recognition of the unfamiliar terminology
employed, a glossary of ANN terms is provided in the Appendix.
2 Network architectures
ANNs may be described as a network of interconnected neurons (sometimes called
nodes). Figure 1 presents the structure of an individual neuron. Each neuron consists
of a number of input arcs (stemming from other neurons or from outside the network;
u1 . . un) and a number of output arcs (which in turn lead to other neurons or to the
‘outside world’). A neuron computes an output, based on the weighted sum of all its
inputs (Sj), according to an activation function (f(Sj)). These activation functions may be
logistic sigmoid (see Figure 1), linear, threshold, Gaussian or hyperbolic tangent
functions, depending on the type of network and training algorithm employed. In the
majority of studies the logistic sigmoid or the hyperbolic tangent function is
used. The logistic sigmoid activation function (Equation 1) – in which x represents the
weighted sum of inputs to the neuron and f(x) the neuron’s output – is often used
because it is continuous and relatively easy to compute (as is its derivative). It maps the
outputs away from extremes, and it introduces nonlinear behaviour to the network:
f(x) = 1 / (1 + e^(–x))                                                 (1)
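As an illustration (our sketch, not code from the original paper; the function and variable names are our own), a single neuron with the logistic sigmoid activation of Equation 1 can be written in a few lines of Python:

```python
import math

def sigmoid(x):
    """Logistic sigmoid of Equation 1: maps any weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(inputs, weights):
    """Output of one neuron: the activation applied to the weighted sum S_j."""
    s = sum(u * w for u, w in zip(inputs, weights))
    return sigmoid(s)
```

sigmoid(0) returns exactly 0.5, and beyond |x| of about 5 the output is within 1% of its asymptotes, which is why the function's derivative (and hence gradient-based training) flattens at extreme inputs.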
In feed-forward networks the connections between neurons flow in one direction: from
an input layer, through one or more hidden layers, to an output layer (see Figure 2).
While some studies direct predicted output back to the input side of a network to make
further predictions (e.g., Cheng and Noguchi, 1996), strictly speaking they are still feed
forward as only one forward pass is made through the network for each prediction.
Two feed-forward network types have been widely used to model the rainfall–runoff
process: the multilayer perceptron (MLP) and the radial basis function (RBF). These
networks typically consist of three or four connected layers of neurons (as shown in
Figure 2). The number of neurons in the input and output layer is specified by the
problem to which the network is applied (i.e., the number of predictors and
predictands, respectively). The neurohydrologist must specify the number of hidden
layers and neurons in each hidden layer. If there are too few neurons in the hidden
layers, the network may be unable to describe the underlying function because it has
insufficient parameters (or ‘degrees of freedom’) to map all points in the training data.
Conversely, if there are too many neurons, the network has too many free parameters
and may overfit the data, losing the ability to generalize. In addition, an ‘excessive’
number of hidden neurons can retard the training process to such an extent that it takes
an inordinate length of time for a network to learn.
It is possible to determine an ‘optimum’ number of neurons in the hidden layer(s)
during training by pruning out extraneous hidden nodes from a complex network.
Pruning algorithms, such as skeletonization and magnitude-based pruning (which
removes unwanted links rather than unwanted nodes), were evaluated by Abrahart et
al. (1998). An alternative approach is to add links and hidden nodes to a simple network
until convergence occurs – for example, using cascade correlation (Kwok and Yeung,
1997). Hirose et al. (1991) introduced a technique that combined these two ideas by pro-
gressively adding or removing nodes from a network during training until an optimum
structure is found. However, pruning and constructive algorithms can retard training
by introducing additional computations. Shamseldin (1997) claims that the best way to
determine an appropriate number of neurons in the hidden layer is via trial and error
and this remains one of the most popular solutions. For a more thorough discussion of
‘optimum’ network geometries, see Huang and Huang (1991) or Maier and Dandy
(1998).
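Magnitude-based pruning, mentioned above, can be illustrated with a short Python sketch (ours, under the simplifying assumption that the network's link weights are held in a flat list):

```python
def magnitude_prune(weights, fraction=0.25):
    """Magnitude-based pruning: zero out the smallest-magnitude links.

    `fraction` is the proportion of links to remove; zeroed weights stand in
    for deleted connections.
    """
    n_remove = int(len(weights) * fraction)
    if n_remove == 0:
        return list(weights)
    # Threshold below which links are considered unwanted.
    cutoff = sorted(abs(w) for w in weights)[n_remove - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= cutoff and removed < n_remove:
            pruned.append(0.0)  # remove this link
            removed += 1
        else:
            pruned.append(w)
    return pruned
```

For example, magnitude_prune([0.5, -0.01, 2.0, 0.1]) zeroes only the weakest link, giving [0.5, 0.0, 2.0, 0.1].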
Inputs to the network (predictors) are passed from the input layer of neurons,
through the hidden layer(s) of neurons, to the output layer (see Figure 2) where they
become predictands. Neurons in the input layer do no more than disperse all predictors
to each neuron in the hidden layer. The network operates by applying weights to values
as they pass from one layer to the next and calculating outputs for each of the neurons
in all other layers.
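The layer-to-layer computation just described can be sketched as follows (a minimal illustration of our own, with a logistic sigmoid at every non-input neuron and no bias terms):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward_pass(predictors, layer_weights):
    """One forward pass: weights are applied as values move layer to layer.

    layer_weights[k] is a list of weight rows, one row per neuron in layer
    k + 1; the input layer only disperses the predictors to the first
    hidden layer.
    """
    values = list(predictors)
    for weight_matrix in layer_weights:
        values = [sigmoid(sum(v * w for v, w in zip(values, row)))
                  for row in weight_matrix]
    return values
```

With all weights zero, a 2-input, 2-hidden, 1-output network returns [0.5], since sigmoid(0) = 0.5 at every neuron.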
3 Training
A neural network is trained by adjusting the weights that link its neurons. This is
accomplished by presenting the network with a number of training samples (a
calibration data set), each one of which consists of a specific input pattern and corre-
sponding ‘correct’ output response. Depending on the nature of the training algorithm
used, it may be necessary to present the network with the calibration data repeatedly (a
number of epochs) until the underlying function is ‘learned’. However, care must be
taken to ensure that the network does not become overfamiliarized with the calibration
data and thus lose its ability to generalize to problems it has not yet encountered.
Various techniques may be employed to avoid overtraining, including regularization
theory, which attempts to smooth network mappings (Bishop, 1995), and cross-
validation using an independent test set (Braddock et al., 1998).
Earlier discussion identified one of the main strengths of neural networks, namely, their
ability to handle incomplete, noisy and nonstationary data (Zealand et al., 1999).
However, with suitable data preparation beforehand, it is possible to improve the
performance of a neural network still further (Masters, 1995). Data preparation involves
a number of processes such as data ‘cleansing’, determining appropriate predictors
(using data reduction techniques), standardizing/normalizing the data and, finally,
dividing the data into calibration and test sets.
1 Data ‘cleansing’
Data cleansing involves identifying and removing trends and nonstationary
components (in terms of the mean and variance) within a data set. Cycles and seasonal
fluctuations should also be identified and removed. For example, trends can be
removed by differencing the time series, and the data can be centred using rescaling
techniques. It is also possible to filter the data to extract underlying, important sources
of information and suppress troublesome noise (Masters, 1995). To date, data cleansing
techniques have not been widely applied in ANN rainfall–runoff modelling, so there is
much scope for development in this area (Maier and Dandy, 1996b).
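For illustration (our sketch, not from the paper), trend removal by differencing and removal of a seasonal cycle by centring might look like:

```python
def difference(series, lag=1):
    """Remove a trend by differencing the time series at the given lag."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def deseasonalize(series, period):
    """Remove a seasonal cycle by subtracting each season's own mean
    (e.g., period=12 for monthly data)."""
    seasonal_mean = [sum(series[i::period]) / len(series[i::period])
                     for i in range(period)]
    return [x - seasonal_mean[i % period] for i, x in enumerate(series)]
```

difference([1, 2, 4, 7]) returns [1, 2, 3], turning an accelerating series into its increments.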
3 Standardization/normalization
All variables should be standardized to ensure they receive equal attention during the
training process (Maier and Dandy, 2000). This is particularly important in RBF
networks where cluster centres would be dominated by high-magnitude input
variables. Without standardization in MLPs, input variables measured on different
scales will dominate training to a greater or lesser extent because initial weights within
a network are randomized to the same finite range.
Data standardization is also important for the efficiency of training algorithms. For
example, the gradient descent algorithm (error backpropagation) used to train the MLP
is particularly sensitive to the scale of data used. Due to the nature of this algorithm,
large values slow training because the gradient of the sigmoid function at extreme
values approximates zero (see Figure 1). To avoid this problem, data are rescaled using
an appropriate transformation. In general, data are rescaled to the intervals [–1, 1], [0.1,
0.9] or [0, 1] (referred to as standardization). Another approach is to rescale values to a
Gaussian function with a mean of 0 and unit standard deviation (referred to as normal-
ization). The advantage of using [0.1, 0.9] for runoff modelling is that extreme (high and
low) flow events, occurring outside the range of the calibration data, may be accom-
modated (Hsu et al., 1995). Alternatively, changes in flow rather than absolute flows
may be used to avoid the problem of saturation, but Minns and Hall (1997) reported
only limited gains from this approach. Other authors advocate [0.1, 0.85] (e.g.,
Shamseldin, 1997), or [–0.9, 0.9] (e.g., Braddock et al., 1998).
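The rescaling options above can be sketched as follows (our illustration; `standardize` defaults to the [0.1, 0.9] interval advocated by Hsu et al., 1995):

```python
def standardize(series, lo=0.1, hi=0.9):
    """Rescale to [lo, hi]; [0.1, 0.9] leaves headroom for flows outside
    the range of the calibration data."""
    smin, smax = min(series), max(series)
    return [lo + (hi - lo) * (x - smin) / (smax - smin) for x in series]

def normalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    sd = (sum((x - mean) ** 2 for x in series) / n) ** 0.5
    return [(x - mean) / sd for x in series]
```

standardize([0, 5, 10]) maps the minimum to 0.1, the midpoint to 0.5 and the maximum to 0.9.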
IV Model assessment
There is a general lack of objectivity and consistency in the way in which rainfall–runoff
models are assessed or compared (Legates and McCabe, 1999). This also applies to the
more specific case of ANN model assessment and arises for several reasons. First, there
are no standard error measures (although some have been more widely applied than
others). Secondly, the diversity of catchments studied (in terms of area, topography,
land use, climate regime, etc.) hinders direct comparisons. Thirdly, different aspects of
flow may be modelled (e.g., discharge, stage, rates of change of discharge, etc.). Finally,
there are broad differences between studies with respect to lead times (ranging from 0
to +24 model time steps) and the temporal granularity of forecasts (from seconds to
months).
When artificial neural networks are trained using algorithms such as backpropaga-
tion they are generally optimized in such a way as to minimize their global error. While
this is a useful general target, it does not necessarily lead to a network that is proficient
for both low flow and flood forecasting. The squared error, which is used in many
training algorithms, does provide a general measure of model performance, but it does
not identify specific regions where a model is deficient. Other error measures are,
therefore, employed to quantify these deficiencies (see the review of Watts, 1997).
The most commonly employed error measures are: the mean squared error (MSE),
the mean squared relative error (MSRE), the coefficient of efficiency (CE), and the
coefficient of determination (r²) (see Equations 3, 4, 5 and 6, respectively):

MSE = (1/n) Σ (Qi – Q̂i)²                                                (3)

MSRE = (1/n) Σ [(Qi – Q̂i) / Qi]²                                        (4)

CE = 1 – [Σ (Qi – Q̂i)²] / [Σ (Qi – Q̄)²]                                 (5)

r² = [Σ (Qi – Q̄)(Q̂i – Q̃)]² / [Σ (Qi – Q̄)² Σ (Q̂i – Q̃)²]                 (6)

where each sum runs over i = 1 to n, Q̂i are the n modelled flows, Qi are the n
observed flows, Q̄ is the mean of the observed flows, and Q̃ is the mean of the
modelled flows.
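Equations 3 to 6 translate directly into code; the following Python sketch (ours, not from the paper) assumes paired lists of observed and modelled flows:

```python
def mse(obs, mod):
    """Mean squared error (Equation 3)."""
    return sum((q - qh) ** 2 for q, qh in zip(obs, mod)) / len(obs)

def msre(obs, mod):
    """Mean squared relative error (Equation 4)."""
    return sum(((q - qh) / q) ** 2 for q, qh in zip(obs, mod)) / len(obs)

def ce(obs, mod):
    """Coefficient of efficiency (Equation 5; Nash and Sutcliffe, 1970)."""
    qbar = sum(obs) / len(obs)
    num = sum((q - qh) ** 2 for q, qh in zip(obs, mod))
    den = sum((q - qbar) ** 2 for q in obs)
    return 1.0 - num / den

def r2(obs, mod):
    """Coefficient of determination (Equation 6): squared correlation."""
    qbar = sum(obs) / len(obs)
    qtil = sum(mod) / len(mod)
    num = sum((q - qbar) * (qh - qtil) for q, qh in zip(obs, mod)) ** 2
    den = (sum((q - qbar) ** 2 for q in obs) *
           sum((qh - qtil) ** 2 for qh in mod))
    return num / den
```

A perfect model gives mse = 0, ce = 1 and r2 = 1; a model no better than the observed mean gives ce = 0.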
According to Karunanithi et al. (1994), squared errors (MSE) provide a good measure
of the goodness of fit at high flows, whilst relative errors (MSRE) provide a more
balanced perspective of the goodness of fit at moderate flows. However, these measures
are strongly affected by catchment characteristics and care must be taken when
comparing studies using these statistics.
CE and r2, on the other hand, provide useful comparisons between studies since they
are independent of the scale of data used (i.e., flow, catchment, temporal granularity,
etc.). They are correlation measures that measure the ‘goodness of fit’ of modelled data
with respect to observed data. CE is referred to by some authors as the determination
coefficient (e.g., Cheng and Noguchi, 1996), the efficiency index, E (Abrahart and Kneale,
1997; Sureerattanan and Phien, 1997), F index (Minns and Hall, 1996), and R2 (e.g., Nash
and Sutcliffe, 1970). Care must be taken not to confuse R2 with the coefficient of deter-
mination, r2, which some authors also refer to as R2 (e.g., Lorrai and Sechi, 1995;
Furundzic, 1998; Legates and McCabe, 1999).
The CE statistic provides a measure of the ability of a model to predict flows which
are different from the mean (i.e., the proportion of the initial variance accounted for by
the model; Nash and Sutcliffe, 1970), and r2 measures the variability of observed flow
that is explained by the model (see the evaluation of Legates and McCabe, 1999). CE
ranges from –∞ in the worst case to +1 for a perfect model. Being a squared
quantity, r² ranges from 0 (no correlation) to +1 (perfect correlation); its
square root, the correlation coefficient r, ranges from –1 (perfect negative
correlation), through 0, to +1 (perfect positive correlation). According to
Shamseldin (1997), a CE of 0.9 and above is very satisfactory, 0.8 to 0.9
represents a fairly good model, and below 0.8 is deemed unsatisfactory.
Legates and McCabe (1999) highlight a number of deficiencies with relative measures
such as CE and r2. They note that r2 is particularly sensitive to outliers and insensitive
to additive and proportional differences between modelled and observed data. For
example, a model could grossly, but consistently, overestimate the observed data values
and still return an acceptable r2 statistic. Although CE is an improvement over r2 (in that
it is more sensitive to differences in modelled and observed means and variances) it is
still sensitive to extreme values. The index of agreement measure, d (Equation 7) has been
proposed as a possible alternative (Legates and McCabe, 1999) but it is still sensitive to
extreme values, owing to the use of squared differences. Modified versions of d and CE
have also been described which are both baseline adjusted (adjusted to the time series
against which the model is compared) and adapted from squared to absolute
differences. The second adaptation reduces the sensitivity of these measures to outliers.
The interested reader is directed towards Legates and McCabe (1999) for a more
thorough discussion.
d = 1 – [Σ (Qi – Q̂i)²] / [Σ (|Q̂i – Q̄| + |Qi – Q̄|)²]                     (7)
Another error measure that has been used is S4E (presented as MS4E in Equation 8) by
Abrahart and See (1998). This higher-order measure places more emphasis on peak
flows than the lower-order MSE. Alternatively, the mean absolute error (MAE, Equation
9), which computes all deviations from the original data regardless of sign, is not
weighted towards high flow events:
MS4E = (1/n) Σ (Qi – Q̂i)⁴                                               (8)

MAE = (1/n) Σ |Qi – Q̂i|                                                 (9)
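Equations 7 to 9 can be coded in the same style (again our sketch, not from the paper):

```python
def index_of_agreement(obs, mod):
    """Index of agreement, d (Equation 7; Legates and McCabe, 1999)."""
    qbar = sum(obs) / len(obs)
    num = sum((q - qh) ** 2 for q, qh in zip(obs, mod))
    den = sum((abs(qh - qbar) + abs(q - qbar)) ** 2
              for q, qh in zip(obs, mod))
    return 1.0 - num / den

def ms4e(obs, mod):
    """Mean fourth-power error (Equation 8): emphasizes peak flows."""
    return sum((q - qh) ** 4 for q, qh in zip(obs, mod)) / len(obs)

def mae(obs, mod):
    """Mean absolute error (Equation 9): not weighted towards high flows."""
    return sum(abs(q - qh) for q, qh in zip(obs, mod)) / len(obs)
```

Note how ms4e amplifies large residuals relative to mae, which is the sense in which the higher-order measure emphasizes peak-flow errors.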
Other measures that have been employed in only a limited number of cases include
RMSE/µ (RMSE as percentage of observed mean; Jayawardena et al., 1997; Fernando
and Jayawardena, 1998); %MF, the percent error in modelled maximum flow relative to
observed data (Hsu et al., 1995; Furundzic, 1998); %VE, the percent error in modelled
runoff volume (Hsu et al., 1995); and %NRMSE, the percentage of values exceeding the
RMSE (Campolo et al., 1999). An RMS normalized error was used by Atiya et al. (1996)
and is defined as the square root of the sum squared errors divided by the square root
of the sum squared desired outputs.
Lachtermacher and Fuller (1994) identify other measures for time series analysis such
as the average relative variance (Nowlan and Hinton, 1992) and mean error (Gorr et al.,
1992). Another measure often used in time series analysis is Theil’s U-statistic (Theil,
1966), which provides a relative basis for comparing complex and naive models.
However, these measures have yet to be used in the evaluation of ANN rainfall–runoff
models.
Classification approaches are also used to evaluate predictive models. For example,
Colman and Davey (1999) used a classification technique to evaluate seasonal weather
forecasts. In this technique the observed data were assigned to one of three equiproba-
ble sets, or terces (in this case, below-average, average and above-average tempera-
tures). Model skill (relative to chance) is then assessed using a chi-square test of the
modelled versus expected frequencies in each category. Similarly, Abrahart and See
(1998) classified predictions according to % correct; % under predictions within ±5, 10,
25% of observed; and % predictions greater than ±25% of observed. This allows direct
comparisons to be made between different models irrespective of the predictand and
model time step.
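The banded-classification idea of Abrahart and See (1998) reduces to a few lines (our sketch; bands are percentage thresholds on the observed value):

```python
def banded_skill(observed, modelled, bands=(5, 10, 25)):
    """Percentage of predictions within ±5, ±10 and ±25% of observed
    (after Abrahart and See, 1998)."""
    n = len(observed)
    return {b: 100.0 * sum(abs(qh - q) <= abs(q) * b / 100.0
                           for q, qh in zip(observed, modelled)) / n
            for b in bands}
```

Because the result is expressed as percentages of cases, it can be compared across models regardless of the predictand or model time step.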
While the above discussion relates more generally to rainfall–runoff modelling, flood
forecasting systems need to employ additional error measures. For example, P–P
(Dawson et al., 2000) is a measure of the error in the timing of a predicted flood
peak (Chang and Hwang, 1999, refer to this as ETp). Abrahart and See (1998) use
MAEpp and RMSEpp which measure equivalent values to MAE and RMSE for all flood
events in a data set. They also employed classification criteria which measure
% early, % late and % correct occurrences of individual predicted peaks (although
they do not indicate what discrepancy constitutes a ‘late’ peak). A further measure
used for flood forecasting is total volume but this measure provides no indication of
Burcot Tower, Burcot, Abingdon, Oxfordshire, OX14 3DJ, UK, 29 November 2000.
Hydrological modelling using artificial neural networks
neuron activation function, while the most popular second choice was the hyperbolic
tangent function (13%). Of the articles presented in Table 2, only five used the
alternative RBF network and just one (Chang and Hwang, 1999) used a GMDH (group
method of data handling) network structure. Although some authors claim to use
recurrent networks, all networks reviewed were in fact feed forward (see section II).
Network architectures were generally optimized using a trial and error approach
(51%). Some studies select the network architecture based on experience from earlier
work; others use optimization algorithms such as cascade correlation, genetic
algorithms, magnitude pruning and skeletonization (e.g., Karunanithi et al., 1994;
Abrahart et al., 1998). When network architectures were configured, the majority of
studies used one hidden layer (70%). Others experimented with two hidden layers and
Sajikumar and Thandaveswara (1999) experimented with three. In the majority of cases
(68%) training was performed using standard error backpropagation. However,
relatively few articles discussed how this was implemented (32%) or included
information on the learning or momentum parameters. In those studies that
implemented RBF networks, four of the five used a K-means clustering algorithm to
determine the basis function centres, while the remaining article (Fernando and
Jayawardena, 1998) used an orthogonal least squares technique. Other variations to
training involve improvements to the standard backpropagation algorithm (for
example, using conjugate gradients), or alternative techniques such as quick
propagation, linear least squares simplex (Hsu et al., 1995) and a temporal
backpropagation algorithm (Sajikumar and Thandaveswara, 1999). Discussion of the number of
epochs performed during training and of the stopping criteria was largely absent.
Those articles that reported this information generally specified the number of training
cycles beforehand.
The literature survey also revealed a notable lack of contributions in which different
ANN configurations were compared or, perhaps more critically, assessed relative to
more conventional statistical approaches. For example, Dawson and Wilby (1999)
compared the cross-validation results of two ANN models (the MLP and RBF) with a
stepwise multiple linear regression model (SWMLR) and zero-order forecasts (ZOF) of
river flow, given 15-minute rainfall–runoff data for the River Mole (a flood-prone
tributary of the River Thames, UK). Using only antecedent rainfall and discharge mea-
surements, the four models were used to forecast river flows with 6-hour lead time and
15-minute resolution. Figure 4 compares the observed versus forecasted flows for
winter/early spring 1994, and Figure 5 shows the corresponding hydrographs for a two
week subset of the same data. Overall, the MLP was more skilful than the RBF, SWMLR
and ZOF models. However, according to performance measures such as the RMSE,
MSRE, CE and r2, the RBF flow forecasts were only marginally better than those of the
simpler SWMLR and ZOF models. This result suggests that ANNs should be regarded
as an alternative to more traditional rainfall–runoff methods rather than a replacement
(Maier and Dandy, 2000). Clearly, many more studies of this type are required before
the optimal model configurations and/or circumstances for ANN flow-forecasting can
be established firmly.
Table 2 Details of studies reviewed
Reference  Time step  Lead steps  Variable  Location  Catchment area (km²)  Hidden layers  ANN/training
© 2001 SAGE Publications. All rights reserved.
1 Abrahart and Kneale (1997) Hour [0, +1, +12] Flow Wye, Wales 10.55 4 MLP/BP
2 Abrahart and See (1998) Hour +1 Stage, change Wye, Ouse, UK 10.55, 3286 3, 4 MLP/BP
37 Sajikumar and Month +0 Stage Lee, UK, Thuthapuzha, 1419, 1030 3, 4, 5 MLP/TBP
Thandaveswara (1999) India
38 See et al. (1997) Hour +6, +12 Stage Ouse, England 3286 4 MLP/BP
39 See et al. (1998) Hour +1, +6 Stage, change Ouse, Wye, UK 3286, 10.55 –– ––
40 See and Abrahart (1999) Hour +1, +6 Stage, change Ouse, Wye, UK 3286, 10.55 3, 4 MLP/BP
41 See and Openshaw (1998; Hour +6 Stage Ouse, England 3286 3 MLP/CG
1999)
42 Shamseldin (1997) Day +0 Runoff 11 various, worldwide 18000 3 MLP/CG
43 Smith and Eli (1995) ? N/A Peak discharge, FC Synthetic grid N/A 3 MLP/BP
44 Stüber and Gemmer (1997) Hour +6 Stage Mosel, Germany ? 4 MLP/BP
45 Sureerattanan and Phien Day +1 Discharge Mae Klong, Thailand 10880 3 MLP/BP
(1997)
46 Tawfik et al. (1997) Day +0 Discharge White Nile, Egypt ? 3 MLP/BP
47 Thirumalaiah and Deo Hour +1, +2, +3 Runoff Bhasta and Chorna, India 390.86 3 MLP/BP, CG, CC
(1998a)
48 Thirumalaiah and Deo Day +1, +2 Stage Indravathi, India 41700 3 MLP/BP, CG, CC
(1998b)
49 Tokar and Johnson (1999) Day +0 Discharge Little Patuxent, USA 19270 3 MLP/BP
Note: Those values/terms underlined in the table represent the most accurate model configuration in the study. Those items in italics have been inferred. N/A is not
applicable and ‘?’ indicates that the information is unknown. The time step is the temporal granularity used. The number of lead steps is calculated from the most
recent predictor. For example, a model using antecedent flow as a predictor, but also current rainfall, would be recorded as ‘0’ as rainfall is the most recent
predictor. [+x, +y] represents all lead steps between times x and y inclusive.
Abbreviations: BP = backpropagation; CC = cascade correlation; CG = conjugate gradient; Change = rate of change of the predictand; FC = Fourier coefficients;
GMDH = group method of data handling; KM = K-means clustering algorithm; LLSSIM = linear least squares simplex; MA = moving average; MLP = multilayer
perceptron; NFU = normalized flow units; OLS = orthogonal least squares; QP = quickpropagation; RBF = radial basis function network; TBP = temporal
backpropagation.
Of all the studies evaluated within this survey one factor, above all others, is crucial to
the implementation of an ANN rainfall–runoff model: the availability of suitable, high-
quality data (Smith and Eli, 1995; Tokar and Johnson, 1999). From this point on the
implementation of an effective model is largely dependent on the skill and experience
of the neurohydrologist. To conclude, we propose a template for ANN model
development, and then suggest several areas for future research.
Step 1 Gather data: Ensure sufficient data are available for a meaningful study in
terms of both quantity and quality (i.e., information content is paramount) (Tokar and
Johnson, 1999).
and RBF neural networks are an appropriate starting point. Begin with an MLP
(trained using standard backpropagation) as this provides a benchmark with
which to evaluate any other models.
Choose appropriate activation function(s) for the neurons. For MLPs generally
use either the logistic sigmoid or hyperbolic tangent functions as a starting point.
For RBFs, Gaussian basis functions are most commonly used.
4.2 Training algorithm: Select a suitable training algorithm to modify weights and
biases, and determine network architecture. Choose appropriate values for
learning parameters (momentum and learning rate) within the range 0.01–0.9.
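The weight update of step 4.2, for standard backpropagation with momentum, has the familiar form below (an illustrative sketch of ours; the gradient itself would come from backpropagating the output error through the network):

```python
def update_weight(weight, gradient, prev_delta,
                  learning_rate=0.1, momentum=0.9):
    """One gradient-descent weight update with a momentum term.

    Returns the new weight and the applied change, which is fed back in
    as `prev_delta` on the next step.
    """
    delta = -learning_rate * gradient + momentum * prev_delta
    return weight + delta, delta
```

The momentum term carries a fraction of the previous step forward, smoothing the descent across noisy gradients; both parameters typically lie in the 0.01 to 0.9 range noted above.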
Step 7 Evaluation: Select error measures (see section IV) that are appropriate to the
model output and purpose. Compare results with those derived from alternative model
configurations.
2 Future directions
From the preceding discussion it is evident that ANN construction involves many
arbitrary decisions, with hitherto little guidance as to the best code of practice or choice
of standard error measures. (In passing, it is conceded that the same criticisms might
also be levelled at the wider discipline of hydrological modelling.) There is also an
urgent need for more inter-model comparisons and rigorous assessment of ANN
solutions versus traditional hydrological methods. Other common failures of existing
ANN modelling practice include: the widespread usage of validation data during the
training process; the arbitrary choice of model inputs, network structures and internal
model parameters; and inadequate preprocessing of model inputs (Maier and Dandy,
2000).
Despite these limitations, there is little doubt (after less than 10 years of application) that
ANNs are well suited to the challenging tasks of rainfall–runoff and flood forecasting.
However, future advances in the field will be contingent upon the refinement of
objective guidelines for ANN construction and the development/use of standard
measures of ANN model skill. In this respect, measures of accuracy which penalize
unnecessary model complexity would greatly enhance model intercomparisons.
Furthermore, indices of catchment properties (such as the mean lag-response between
rainfall and runoff) would enable the comparison of results for different catchments by
acknowledging that a component of model skill is directly attributable to basin
properties (e.g., underlying geology, land use, relief, etc.). Finally, there is considerable
scope for the extraction of hydrological ‘rules’ from the connection weights of trained
ANN models using sensitivity analyses or rule extraction algorithms (e.g., French et al.,
1992; Andrews et al., 1995; Maier et al., 1998; Abrahart et al., 1999). In this way, ANNs
Acknowledgements
We thank the anonymous reviewer for constructive comments on our original
manuscript. RW was supported by ACACIA (A Consortium for the Application of
Climate Impact Assessments).
References
Abrahart, R.J. and Kneale, P.E. 1997: Exploring neural network rainfall–runoff modelling. In Proceedings of the 6th British Hydrological Society symposium, Salford University, 9.35–9.44.
Abrahart, R.J. and See, L. 1998: Neural network vs. ARMA modelling: constructing benchmark case studies of river flow prediction. In Proceedings of the 3rd International Conference on Geocomputation, University of Bristol, 17–19 September (http://www.geog.port.ac.uk/geocomp/geo98/05/gc_05.htm) (23 February 2000).
Abrahart, R.J., See, L. and Kneale, P.E. 1998: New tools for neurohydrologists: using network pruning and model breeding algorithms to discover optimum inputs and architectures. In Proceedings of the 3rd International Conference on Geocomputation, University of Bristol, 17–19 September (http://www.geog.port.ac.uk/geocomp/geo98/20/gc_20.htm) (23 February 2000).
–––– 1999: Applying saliency analysis to neural network rainfall–runoff modelling. Proceedings of the 4th International Conference on Geocomputation, Fredericksburg, Virginia, USA, 25–28 July (http://www.ashville.demon.co.uk/geocomp).
Abu-Mostafa, Y.S. 1989: The Vapnik–Chervonenkis dimension: information versus complexity in learning. Neural Computation, 312–17.
Akaike, H. 1974: A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19, 716–23.
Anderson, M.G. and Burt, T.P., editors, 1985: Hydrological forecasting. Chichester: Wiley.
Andrews, R., Diederich, J. and Tickle, A.B. 1995: A survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge Based Systems 8, 373–89.
Atiya, A., El-Shoura, S., Shaheen, S. and El-Sherif, M. 1996: River flow forecasting using neural networks. In Proceedings, World Congress on Neural Networks, San Diego, CA, September, 461–64.
–– 1999: A comparison between neural-network forecasting techniques – case study: river flow forecasting. IEEE Transactions on Neural Networks 10, 402–409.
Battiti, R. 1992: First- and second-order methods for learning: between steepest descent and Newton’s method. Neural Computation 4, 141–66.
Bishop, C.M. 1995: Neural networks for pattern recognition. Oxford: Clarendon Press.
Braddock, R.D., Kremmer, M.L. and Sanzogni, L. 1998: Feed-forward artificial neural network model for forecasting rainfall run-off. Environmetrics, 419–32.
Campolo, M., Andreussi, P. and Soldati, A. 1999: River flood forecasting with a neural network model. Water Resources Research 35, 1191–97.
Chang, F. and Hwang, Y. 1999: A self-organization algorithm for real-time flood forecast. Hydrological Processes 13, 123–38.
Cheng, X. and Noguchi, M. 1996: Rainfall–runoff modelling by a neural network approach. In Proceedings of the International Conference on Water Resources and Environmental Research, 143–50.
Clair, T.A. and Ehrman, J.M. 1996: Variations in discharge and dissolved organic carbon and nitrogen export from terrestrial basins with changes in climate: a neural network approach. Limnology and Oceanography 41, 921–27.
Colman, A. and Davey, M. 1999: Prediction of summer temperature, rainfall and pressure in Europe from preceding winter north Atlantic Ocean temperature. International Journal of Climatology 19, 513–36.
Crespo, J.L. and Mora, E. 1993: Drought estimation with neural networks. Advances in Engineering Software 18, 167–70.
Dai, H. and MacBeth, C. 1997: Effects of learning parameters on learning procedure and performance of a BPNN. Neural Networks 10, 1505–21.
Danh, N.T., Phien, H.N. and Gupta, A.D. 1999: Neural networks for river flow forecasting. Water SA 25, 33–39.
Dawson, C.W. 1996: A neural network approach to software project effort estimation. Applications of Artificial Intelligence in Engineering 1, 229–37.
Dawson, C.W., Brown, M. and Wilby, R. 2000: Inductive learning approaches to rainfall–runoff modelling. International Journal of Neural Systems 10, 43–57.
Govindaraju, R.S. and Rao, A.R., editors, 2000: Artificial neural networks in hydrology. Dordrecht: Kluwer Academic.
Hall, M.J. and Minns, A.W. 1993: Rainfall–runoff modelling as a problem in artificial intelligence: experience with a neural network. In Proceedings of the 4th British Hydrological Society symposium, Cardiff, 5.51–5.57.
Haykin, S. 1999: Neural networks: a comprehensive foundation (2nd edn). London: Prentice Hall.
Hewitson, B.C. and Crane, R.G. 1994: Neural nets: applications in geography. Dordrecht: Kluwer Academic.
Hirose, Y., Yamashita, K. and Hijiya, S. 1991: Back-propagation algorithm which varies the number of hidden units. Neural Networks 4, 61–66.
Hsu, K., Gupta, H.V. and Sorooshian, S. 1995:
Dawson, C.W. and Wilby, R. 1998: An artificial Artificial neural network modeling of the
neural network approach to rainfall–runoff rainfall–runoff process. Water Resources Research
modelling. Hydrological Sciences Journal 43, 31, 2517–30.
47–66. Huang, S. and Huang, Y. 1991: Bounds on the
–––– 1999: A comparison of artificial neural number of hidden neurons in multilayer
networks used for river flow forecasting. perceptrons. IEEE Transactions on Neural
Hydrology and Earth System Sciences 3, 529–40. Networks, 2, 47–55.
Fahlman, S.E. 1988: Faster-learning variations on Janacek, G. and Swift, L. 1993: Time series
back-propagation: an empirical study. In forecasting, simulation, applications. London: Ellis
Touretzky, D., Hinton, G.E. and Sejnowski, T.J., Horwood.
editors, Proceedings of the 1988 Connectionist Jayawardena, A.W. and Fernando, D.A.K. 1998:
Models Summer School. San Mateo, CA: Morgan Use of radial basis function type artificial
Kaufmann, 38–51. neural networks for runoff simulation.
Fernando, D.A.K. and Jayawardena, A.W. 1998: Computer-Aided Civil and Infrastructure
Runoff forecasting using RBF networks with Engineering 13, 91–99.
OLS algorithm. Journal of Hydrologic Engineering Jayawardena, A.W. Fernando, D.A.K. and Zhou,
3, 203–209. M.C. 1997: Comparison of multilayer
French, M.N., Krajewski, W.F. and Cuykendall, perceptron and radial basis function networks
R.R. 1992: Rainfall forecasting in space and as tools for flood forecasting. In Destructive
time using a neural network. Journal of water: water-caused natural disaster, their
Hydrology 137, 1–31. abatement and control (proceedings of the con-
Furundzic, D. 1998: Application of neural ference at Anaheim, CA, June). IAHS Publication
networks for time series analysis: 239, Wallingford: IAHS Press, 173–81.
rainfall–runoff modeling. Signal Processing 64,
383–96. Kang, K.W . Park, C.Y. and Kim, J.H. 1993:
Neural network and its application to
Gallant, S.I. 1993: Neural network learning and rainfall–runoff forecasting. Korean Journal of
expert system. London: MIT Press. Hydroscience, 4, 1–9.
Golob, R. Stokelj, T. and Grgic, D. 1998: Neural- Karunanithi, N. Grenney, W.J. Whitley, D. and
network-based water inflow forecasting. Bovee, K. 1994: Neural networks for river flow
Control Engineering Practice 6, 593–600. prediction. Journal of Computing in Civil
Gorr, W. Nagin, D. and Szcypula, J. 1992: The Engineering 8, 201–20.
relevance of artificial neural networks to managerial Kohohen, T. 1984: Self-organization and associative
forecasting; an analysis and empirical study. memory. New York: Springer-Verlag.
Technical Report 93-1. Pittsburgh, PA: Heinz –––– 1990: The self-organizing map. Proceedings of
School of Public Policy Management, Carnegie the IEEE 78, 1464–80.
Mellon University. Kwok, T.Y. and Yeung, D.Y. 1997: Constructive
Govindaraju, R.S. and Ramachandra Rao, A., algorithms for structure learning in
feedforward neural networks for regression Murray, South Australia. Ecological Modelling
problems. IEEE Transactions on Neural Networks 105, 257–72.
8, 630–45. Mason, J.C, Tem’me, A. and Price, R.K. 1996: A
neural network model of rainfall–runoff using
Lachtermacher, G. and Fuller, J.D. 1994: radial basis functions. Journal of Hydraulic
Backpropagation in hydrological time series Research 34, 537–48.
forecasting. In Hipel, K.W., McLeod, A.I., Panu, Masters, T. 1995: Neural, novel and hybrid
U.S. and Singh, V.P., editors, Stochastic and algorithms for time series prediction. New York:
statistical methods in hydrology and environmental Wiley.
engineering. Vol. 3, Dordrecht: Kluwer, 229–42. McCulloch, W.S. and Pitts, W. 1943: A logical
Lange, N.T. 1999: New mathematical approaches calculus of the ideas imminent in nervous
in hydrological modeling – an application of activity. Bulletin of Mathematical Biophysics 5,
artificial neural networks. Physics and Chemistry 115–33.
of the Earth 24, 31–35. Minns, A.W . 1996: Extended rainfall–runoff
Legates, D.R. and McCabe, G.J. 1999: Evaluating modelling using artificial neural networks’. In
the use of ‘goodness-of-fit’ measures in Muller, A., editor, Hydroinformatics 96:
hydrologic and hydroclimatic model proceedings of the 2nd International Conference on
validation. Water Resources Research 35, Hydroinformatics, Zurich, Vol. 1, 207–13.
233–41. Minns, A.W. and Hall, M.J. 1996: Artificial
Liong, S.Y. and Chan, W.T. 1993: Runoff volume neural networks as rainfall–runoff models.
estimates with neural networks. In Topping, Hydrological Sciences Journal 41, 399– 417.
B.H.V. and Khan, A.I., editors, Proceedings of the –––– 1997: Living with the ultimate black box:
3rd International Conference on the Application of more on artificial neural networks. In:
AI to Civil and Structural Engineering, Proceedings of the 6th British Hydrological Society
Edinburgh: Civil Computer Press, 67–70. Symposium, Salford University, 9.45–9.49.
Loke, E. Warnaars, E.A., Jacobsen, P. Nelen, F. Minsky, M.L. and Papert, S.A. 1969: Perceptrons.
and Almeida, M.D. 1997: Artificial neural Cambridge, MA: MIT Press.
networks as a tool in urban storm drainage. Murata, N. Yoshizawa, S. and Amari, S. 1994:
Water Science and Technology 36, 101–109. Network information criteria – determining the
Lorrai, M. and Sechi, G.M. 1995: Neural nets for number of hidden units for an artificial neural
modelling rainfall–runoff transformations. network model. IEEE Transactions on Neural
Water Resources Management 9, 299–313. Networks 5, 865–72.
Muttiah, R.S. Srinivasan, R. and Allen, P.M.
Magoulas, G.D., Vrahatis, M.N. and 1997: Prediction of two-year peak stream
Androulakis, G.S. 1997: Effective backpropa- discharges using neural networks. Journal of
gation training with variable stepsize. Neural the American Water Resources Association 33,
Networks 10, 69–82. 625–30.
Maier, H.R. and Dandy, G.C. 1996a: The use of
artificial neural networks for the prediction of Nash, J.E. and Sutcliffe, J.V. 1970: River flow
water quality parameters. Water Resources forecasting through conceptual models. Part 1.
Research 32, 1013–22. A discussion of principles. Journal of Hydrology
–––– 1996b: Neural network models for 10, 282–90.
forecasting multivariate time series. Neural Nowlan, S.J. and Hinton, G.E. 1992: Simplifying
Network World 6, 747–71. neural networks by soft weight-sharing. Neural
–––– 1998: The effect of internal parameters and Computation 4, 473–93.
geometry on the performance of back-
propagation neural networks: an empirical O’Loughlin, G. Huber, W. and Chocat, B. 1996:
study. Environmental Modelling and Software 13, Rainfall–runoff processes and modelling.
193–209. Journal of Hydraulic Research 34, 733–51.
–––– 2000: Neural networks for the prediction
and forecasting of water resources variables: a Pankiewicz, G.S. 1997: Neural network classifi-
review of modelling issues and applications. cation of convective air masses for a flood
Environmental Modelling and Software 15, forecasting system. International Journal of
101–23. Remote Sensing 18, 887–98.
Maier, H.R. Dandy, G.C. and Burch, M.D. 1998: Poff, N.L. Tokar, S. and Johnson, P. 1996: Stream
Use of artificial neural networks for modelling hydrological and ecological responses to
cyanobacteria Anabaena spp. in the River climate change assessed with an artificial
neural network. Limnology and Oceanography 41, Shamseldin, A.Y. 1997: Application of a neural
857–63. network technique to rainfall–runoff model-
ling. Journal of Hydrology 199, 272–94.
Raman, H. and Sunilkumar, N. 1995: Smith, J. and Eli, N. 1995: Neural-network
Multivariate modelling of water resources time models of rainfall–runoff process. Journal of
series using artificial neural networks. Water Resources Planning and Management 121,
Hydrological Sciences 40, 145–63. 499–508.
Refenes, A. Burgess, A.N. and Bents, Y. 1997: Song, X.M. 1996: Radial basis function networks
Neural networks in financial engineering: a for empirical modeling of chemical process.
study in methodology. IEEE Transaction on MSc thesis, University of Helsinki
Neural Networks 8, 1223–67. (http://www.cs.Helsinki.FI/~xianming) (28
Rissanen, J. 1978: Modeling by short data January 1999).
description. Automation 14, 465–71. Stüber, M. and Gemmer, P. 1997: An approach for
Rumelhart, D.E. and McClelland, J.L., editors, data analysis and forecasting with neuro fuzzy
1986: Parallel distributed processing: explorations systems – demonstrated on flood events at
in the microstructures of cognition. Vol. 1. River Mosel. Lecture Notes in Computer Science,
Cambridge, MA: MIT Press. Computational Intelligence 1226, 468–77.
Sureerattanan, S. and Phien, H.N. 1997: Back-
Sajikumar, N. and Thandaveswara, B.S. 1999: A propagation networks for daily streamflow
non-linear rainfall-runoff model using artificial forecasting. Water Resources Journal December,
neural networks. Journal of Hydrology 216, 1–7.
32–55.
Schalkoff, R.J. 1997: Artificial neural networks. Tawfik, M. Ibrahim, A. and Fahmy, H. 1997:
New York: McGraw-Hill. Hysteresis sensitivity neural network for
Schwarz, G. 1978: Estimating the dimension of a modeling rating curves. Journal of Computing in
model. Annals of Statistics 6, 461–64. Civil Engineering 11, 206–11.
See, L. and Abrahart, R.J. 1999: Multi-model data Theil, H. 1966: Applied economic forecasting.
fusion for hydrological forecasting. In Amsterdam: North-Holland.
Proceedings of the 4th International Conference Thirumalaiah, K. and Deo, M.C. 1998: Real-time
on Geocomputation, Fredericksburg, Virginia, flood forecasting using neural networks.
USA, 25–28 July (http://www.ashville. Computer-Aided Civil and Infrastructure
demon.co.uk/geocomp). Engineering 13, 101–11.
See, L., Abrahart, R.J. and Openshaw, S. 1998: –––– 1998b: River stage forecasting using
An integrated neuro-fuzzy statistical approach artificial neural networks. Journal of Hydrologic
to hydrological modelling. In Proceedings of the Engineering 3, 26–32.
3rd International Conference on Geocomputation, Todini, E. 1988: Rainfall–runoff modelling – past,
University of Bristol, 17–19 September (http:// present and future. Journal of Hydrology 100,
www.geog.port.ac.uk/geocomp/geo98/22/gc 341–52.
_22.htm) (23 February 2000). Tokar, S.A. and Johnson, P.A. 1999:
See, L. Corne, S., Dougherty, M. and Openshaw, Rainfall–runoff modeling using artificial neural
S. 1997: Some initial experiments with neural networks. Journal of Hydrologic Engineering 4,
network models of flood forecasting on the 232–39.
River Ouse. In Proceedings of the 2nd
International Conference on Geocomputation, Watts, G. 1997: Hydrological modelling. In Wilby,
26–29 August, University of Otago, Dunedin, New R.L., editor, Contemporary hydrology: towards
Zealand, 59–67 (http://www/ashville.demon. holistic environmental science, Chichester: Wiley,
co.uk/geocomp). 151–93.
See, L. and Openshaw, S. 1998: Using soft
computing techniques to enhance flood Yang, R. 1997: Application of neural networks
forecasting on the River Ouse. In Babovic, V. and genetic algorithms to modelling flood
and Larsen, L.C., editors, Hydroinformatics 98: discharges and urban water quality. Unpub-
proceedings of the Third International Conference lished PhD thesis, University of Manchester.
on Hydroinformatics, 24–26 August, Copenhagen,
Denmark, 819–24. Zealand, C.M. Burn, D.H. and Simonovic, S.P.
–––– 1999: Applying soft computing approaches 1999: Short term streamflow forecasting using
to river level forecasting. Hydrological Sciences artificial neural networks. Journal of Hydrology
Journal 44, 763–78. 214, 32–48.
Zhu, M. Fujita, M. and Hashimoto, N. 1994: U.S., and Singh, V.P., editors, Stochastic and
Application of neural networks to runoff statistical methods in hydrology and environmental
prediction. In Hipel, K.W., McLeod, A.I., Panu, engineering. Vol. 3, Dordrecht: Kluwer, 205–16.
activation function the function embedded within a neuron that transforms the weighted sum
of its inputs into an output. These functions may be logistic sigmoid, linear, threshold,
Gaussian, hyperbolic tangent, etc.
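As a minimal illustrative sketch (the function names are ours, not the glossary's), the activation functions listed above can be written in Python as:

```python
import math

# Each activation function maps a neuron's weighted input sum to an output.
def logistic(x):
    """Logistic sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Identity function: output equals input."""
    return x

def threshold(x):
    """Step function: fires (1) when the input sum is non-negative."""
    return 1.0 if x >= 0.0 else 0.0

def gaussian(x):
    """Radial (bell-shaped) function, peaking at x = 0."""
    return math.exp(-x * x)

def tanh(x):
    """Hyperbolic tangent: output in (-1, 1)."""
    return math.tanh(x)
```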
architecture the structure of an ANN – including the number and connectivity of neurons.
Usually an ANN is arranged into several layers of neurons – an input layer, one or more
hidden layers and an output layer.
backpropagation the training algorithm for the feed-forward, multilayer perceptron, which
works by propagating the output error back through the network and adjusting weights and
biases so as to reduce that error.
basis function a model function can be represented as a linear combination of several basis
functions. The term ‘basis function’ stems from the basis vectors that combine linearly to form
any vector in a space.
bias an additional weighted input to a neuron that stems from an imaginary unit that always
has a value of one. During calibration this bias input is adjusted in the same way as other
weights by the training algorithm.
constructive algorithm a training algorithm that successively adds neurons to the hidden layer
of an ANN in order to determine an optimum network geometry.
delta rule a term often used for backpropagation. The delta rule refers to the adjustments made
to weights and biases in an ANN during training.
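The weight and bias adjustments described under backpropagation and the delta rule can be sketched for a single sigmoid output neuron (an illustrative sketch under our own naming, not the paper's implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def delta_rule_step(weights, bias, inputs, target, lr=0.5):
    """One delta-rule update for a single sigmoid neuron.

    Computes the neuron's output, then adjusts each weight (and the
    bias, treated as a weight on a constant input of one) in proportion
    to the error gradient, scaled by the learning parameter `lr`.
    """
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    out = sigmoid(net)
    # Error gradient at the output: (target - out) * sigmoid'(net)
    delta = (target - out) * out * (1.0 - out)
    new_weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias + lr * delta
    return new_weights, new_bias, out
```

Repeating this step over every pattern in the calibration set constitutes one epoch of training.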
epoch a single pass through the calibration data set during training.
error surface/function see weight space (below).
feed forward a network in which all the connections between neurons flow in one direction –
from an input layer, through one or more hidden layers, to an output layer.
geometry see architecture (above).
layer a group of neurons at the same stage of processing; layers are classed as input, hidden
or output.
learning parameter a factor (the learning rate) used in the backpropagation algorithm to
control the size of the changes made to a network’s weights during training.
MLP Multilayer perceptron. The popular feed-forward (three or four layer) ANN comprising
neurons with sigmoid or hyperbolic tangent activation functions, trained using standard
backpropagation.
momentum a factor used in the backpropagation training algorithm to speed convergence to
an error minimum.
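The momentum factor can be illustrated with a simple weight-update rule (a sketch under our own naming): each change applied to a weight blends the current gradient-descent step with a fraction of the previous change.

```python
def update_with_momentum(weight, gradient, prev_change, lr=0.1, momentum=0.9):
    """Backpropagation weight update with a momentum term.

    The new change is the usual gradient-descent step (-lr * gradient)
    plus the momentum factor times the previous change. When successive
    gradients point in a consistent direction, the accumulated change
    grows, speeding convergence towards the error minimum.
    """
    change = -lr * gradient + momentum * prev_change
    return weight + change, change
```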
neuron the basic building block of a neural network. A neuron sums the weighted inputs from
the ‘outside world’ or other neurons, then passes this sum through an activation function to
produce an output response.
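The behaviour just described (weighted sum of inputs plus bias, transformed by an activation function) can be sketched as a forward pass, here assuming a logistic sigmoid activation:

```python
import math

def neuron_output(inputs, weights, bias):
    """Forward pass of a single neuron: the weighted input sum plus the
    bias, passed through a logistic sigmoid activation function."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))
```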
node see neuron (above).
normalization rescaling data to zero mean and unit variance (i.e., towards a standard normal
distribution). The term is often used interchangeably with standardization, which here denotes
rescaling data linearly to a particular range such as [0, 1], [–1, +1], etc.
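Both rescaling schemes can be sketched in a few lines of Python (illustrative function names, not from the original text):

```python
def standard_score(values):
    """Rescale to zero mean and unit variance (standard normal scaling)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def rescale_to_range(values, lo=0.0, hi=1.0):
    """Rescale linearly so the data span the range [lo, hi]."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]
```

Such rescaling is commonly applied to ANN inputs and targets so that they fall within the operating range of the activation functions.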
pruning algorithm a training algorithm that progressively reduces the number of neurons in
an ANN’s hidden layer in an attempt to find an optimum network geometry.
RBF radial basis function. An ANN with the same layered structure as an MLP, but whose
hidden neurons use radial basis functions (e.g., Gaussians) as activations.
recurrent an ANN in which connections between neurons feed backwards through the
network as well as forwards.
standardization see normalization (above).
training/learning the process of adjusting a network’s weights and biases so that the ANN
model error is progressively reduced.
transfer function see activation function (above).
weight a multiplicative value applied to a neuron’s inputs (each input having a different
weight). It is the weights in a network that are adjusted to ‘train’ it.
weight space the n-dimensional surface in which weights in a network are adjusted by the
backpropagation algorithm to minimize model error.