

Progress in Physical Geography 25,1 (2001) pp. 80–108

Hydrological modelling using artificial neural networks
C.W. Dawson1 and R.L. Wilby2
1Department of Computer Science, Loughborough University, Loughborough,
Leicestershire LE11 3TU, UK
2Division of Geography, University of Derby, Kedleston Road, Derby DE22 1GB,

UK and National Center for Atmospheric Research, Boulder, CO 80307, USA

Abstract: This review considers the application of artificial neural networks (ANNs) to
rainfall–runoff modelling and flood forecasting. This is an emerging field of research, character-
ized by a wide variety of techniques, a diversity of geographical contexts, a general absence of
intermodel comparisons, and inconsistent reporting of model skill. This article begins by
outlining the basic principles of ANN modelling, common network architectures and training
algorithms. The discussion then addresses related themes of the division and preprocessing of
data for model calibration/validation; data standardization techniques; and methods of
evaluating ANN model performance. A literature survey underlines the need for clear guidance
in current modelling practice, as well as the comparison of ANN methods with more conven-
tional statistical models. Accordingly, a template is proposed in order to assist the construction
of future ANN rainfall–runoff models. Finally, it is suggested that research might focus on the
extraction of hydrological ‘rules’ from ANN weights, and on the development of standard
performance measures that penalize unnecessary model complexity.

Key words: artificial neural networks, flood forecasting, hydrology, model, rainfall–runoff.

I Introduction

Rainfall–runoff models are conventionally assigned to one of three broad categories:
deterministic (physical), conceptual or parametric (also known as analytic or empirical)
(Anderson and Burt, 1985; Watts, 1997). Deterministic models describe the
rainfall–runoff process using physical laws of mass and energy transfer. Conceptual
models provide simplified representations of key hydrological processes using a
perceived system (such as a series of interconnected stores and flow pathways).
Parametric models use mathematical transfer functions (such as multiple linear
regression equations) to relate meteorological variables to runoff.


Hydrological models are further classified as either lumped or distributed (Todini,
1988). Lumped (or homogeneous) models treat the catchment as a single unit. They
provide no information about the spatial distribution of inputs and outputs, and
simulate only the gross, spatially averaged response of the catchment. Conversely,
distributed (or heterogeneous) models represent the catchment as a system of inter-
related subsystems – both vertically and horizontally. Thus, distributed models can be
considered as an assemblage of subcatchments arranged either in series or as a
branched network (O’Loughlin et al., 1996).
According to these criteria, artificial neural networks (ANNs) should be classified as
parametric models that are generally lumped. This is because neural network engineers
or ‘neurohydrologists’ (as they have been termed by Abrahart et al., 1998) regard the
rainfall–runoff process as a ‘black box’ system with inputs (e.g., antecedent rainfall and
flow) and outputs (usually flow). Consequently, ANN usage does not presuppose a
detailed understanding of a catchment’s physical characteristics, nor does it require
extensive data preprocessing. This is because ANNs can, theoretically, handle
incomplete, noisy and ambiguous data (Maier and Dandy, 1996a). Furthermore, ANNs
are often cheaper and simpler to implement than their physically based counterparts
(Campolo et al., 1999). They are also well suited to dynamic problems and are parsimo-
nious in terms of information storage within the trained model (Thirumalaiah and Deo,
1998a).
Superficially, ANNs provide a novel and appealing solution to the problem of
relating input and output variables in complex systems. This has led to the application
of ANNs in many fields, including financial management, manufacturing, control
systems, design, environmental science and pattern recognition in, for example, remote
sensing (for geographical examples, see Hewitson and Crane, 1994; for hydrological
examples, see Govindaraju and Ramachandra Rao, 2000). Despite these advantages, the
field has become characterized by a lack of consistency of approach and poor modelling
practice. This situation has arisen because the choice of network type, training
method(s) and data handling technique(s) has often been undertaken in unsystematic
ways by neurohydrologists.
Accordingly, this review introduces some of the most promising techniques for ANN
rainfall–runoff modelling and outlines a number of caveats for the development of
robust ANN models. Section II provides a general introduction to artificial neural
networks, including a description of the most commonly used architectures. Sections III
and IV discuss two important issues faced by neurohydrologists: respectively, data
handling, and the selection of appropriate performance measures to compare different
hydrological models (including the more specific case of ANN runoff models). Section
V presents a survey of recent approaches to hydrological modelling using ANNs and
stresses the need for more model intercomparison studies. On the basis of the preceding
comments, section VI outlines a framework for implementing ANN rainfall–runoff
models and identifies priorities for future research. Finally, a glossary of neural network
terms is presented in the Appendix.


II Artificial neural networks

The rainfall-runoff process has been described quantitatively since the nineteenth
century. However, it is only in the last decade or so that ANNs have been applied to the
problem. None the less, ANNs have been in existence since McCulloch and Pitts (1943)
introduced the concept of the artificial neuron. Since that time neural network research
has evolved in three distinct phases (Schalkoff, 1997). The first era involved preliminary
work on the development of the artificial neuron until Minsky and Papert (1969)
identified several limiting factors. The second era began with the rediscovery and pop-
ularization of the backpropagation training algorithm (Rumelhart and McClelland,
1986). Prior to this seminal work it was very difficult to train neural networks of any
practical size. The third era is characterized by more rigorous assessments of network
limitations and generalizations, fusion with other technologies (such as genetic
algorithms and fuzzy logic) (e.g., See and Openshaw, 1999) and the implementation of
ANNs using dedicated hardware.
The following sections provide an overview of ANNs, including the main structures,
network types and training algorithms. In recognition of the unfamiliar terminology
employed, a glossary of ANN terms is provided in the Appendix.

1 Applying neural networks


The effective application of ANNs requires an appreciation of the relative merits of the
different networks available, as well as an understanding of the best ways to train them.
Network types and training algorithms are constantly evolving and the neurohydrolo-
gist must keep abreast of such developments. For example, support vector machines are
an area of current interest within the wider neural network research community (e.g.,
Haykin, 1999), although these tools have yet to be applied to rainfall–runoff modelling.
When applying neural networks to rainfall–runoff modelling a number of decisions
must be made. First, one must choose an appropriate neural network type. Secondly,
one must choose an appropriate training algorithm, select suitable training periods and
determine an appropriate network structure. Thirdly, one must decide how to pre- and
post-process input–output data. While some of these operations may be automated
using appropriate modifications to training algorithms, many decisions must still be
made through a process of trial and error. A full discussion of these topics is beyond the
scope of this article and interested readers are directed towards texts such as Bishop
(1995).

2 Network architectures
ANNs may be described as a network of interconnected neurons (sometimes called
nodes). Figure 1 presents the structure of an individual neuron. Each neuron consists
of a number of input arcs (stemming from other neurons or from outside the network;
u_1 . . . u_n) and a number of output arcs (which in turn lead to other neurons or to the
‘outside world’). A neuron computes an output, based on the weighted sum of all its
inputs (S_j), according to an activation function f(S_j). These activation functions may be
logistic sigmoid (see Figure 1), linear, threshold, Gaussian or hyperbolic tangent


Figure 1 Activation of a single neuron

functions, depending on the type of network and training algorithm employed. In the
majority of studies the logistic sigmoid function or hyperbolic tangent functions are
used. The logistic sigmoid activation function (Equation 1) – in which x represents the
weighted sum of inputs to the neuron and f(x) the neuron’s output – is often used
because it is continuous and relatively easy to compute (as is its derivative). It maps the
outputs away from extremes, and it introduces nonlinear behaviour to the network:
f(x) = \frac{1}{1 + e^{-x}}    (1)
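The behaviour of a single neuron of this kind is easy to express in code. The sketch below (Python; the weights and inputs are arbitrary illustrative values, not taken from any study cited here) forms the weighted sum S_j of the inputs and passes it through the logistic sigmoid of Equation 1:

```python
import numpy as np

def logistic_sigmoid(x):
    """Logistic sigmoid activation, f(x) = 1 / (1 + exp(-x)) (Equation 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the input arcs (S_j) followed by the sigmoid activation."""
    s_j = np.dot(weights, inputs) + bias
    return logistic_sigmoid(s_j)

# Illustrative values only: three input arcs u_1..u_3 with arbitrary weights.
u = np.array([0.2, 0.7, 0.1])
w = np.array([0.5, -1.2, 0.8])
print(neuron_output(u, w))
```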
In feed-forward networks the connections between neurons flow in one direction: from
an input layer, through one or more hidden layers, to an output layer (see Figure 2).
While some studies direct predicted output back to the input side of a network to make
further predictions (e.g., Cheng and Noguchi, 1996), strictly speaking they are still feed
forward as only one forward pass is made through the network for each prediction.
Two feed-forward network types have been widely used to model the rainfall–runoff
process: the multilayer perceptron (MLP) and the radial basis function (RBF). These
networks typically consist of three or four connected layers of neurons (as shown in
Figure 2). The number of neurons in the input and output layer is specified by the
problem to which the network is applied (i.e., the number of predictors and
predictands, respectively). The neurohydrologist must specify the number of hidden
layers and neurons in each hidden layer. If there are too few neurons in the hidden
layers, the network may be unable to describe the underlying function because it has
insufficient parameters (or ‘degrees of freedom’) to map all points in the training data.
Conversely, if there are too many neurons, the network has too many free parameters
and may overfit the data, losing the ability to generalize. In addition, an ‘excessive’
number of hidden neurons can retard the training process to such an extent that it takes
an inordinate length of time for a network to learn.
It is possible to determine an ‘optimum’ number of neurons in the hidden layer(s)
during training by pruning out extraneous hidden nodes from a complex network.
Pruning algorithms, such as skeletonization and magnitude-based pruning (which


Figure 2 A feed-forward artificial neural network structure

removes unwanted links rather than unwanted nodes), were evaluated by Abrahart et
al. (1998). An alternative approach is to add links and hidden nodes to a simple network
until convergence occurs – for example, using cascade correlation (Kwok and Yeung,
1997). Hirose et al. (1991) introduced a technique that combined these two ideas by pro-
gressively adding or removing nodes from a network during training until an optimum
structure is found. However, pruning and constructive algorithms can retard training
by introducing additional computations. Shamseldin (1997) claims that the best way to
determine an appropriate number of neurons in the hidden layer is via trial and error
and this remains one of the most popular solutions. For a more thorough discussion of
‘optimum’ network geometries, see Huang and Huang (1991) or Maier and Dandy
(1998).
Inputs to the network (predictors) are passed from the input layer of neurons,
through the hidden layer(s) of neurons, to the output layer (see Figure 2) where they
become predictands. Neurons in the input layer do no more than disperse all predictors
to each neuron in the hidden layer. The network operates by applying weights to values
as they pass from one layer to the next and calculating outputs for each of the neurons
in all other layers.

3 Training
A neural network is trained by adjusting the weights that link its neurons. This is
accomplished by presenting the network with a number of training samples (a
calibration data set), each one of which consists of a specific input pattern and corresponding ‘correct’ output response. Depending on the nature of the training algorithm
used, it may be necessary to present the network with the calibration data repeatedly (a
number of epochs) until the underlying function is ‘learned’. However, care must be
taken to ensure that the network does not become overfamiliarized with the calibration
data and thus lose its ability to generalize to problems it has not yet encountered.
Various techniques may be employed to avoid over training, including regularization
theory, which attempts to smooth network mappings (Bishop, 1995), and cross-
validation using an independent test set (Braddock et al., 1998).

4 The multilayer perceptron (MLP)


The MLP is the most popular neural network architecture in use today. In the majority
of studies the MLP is trained using the error backpropagation algorithm. This popular
algorithm works by iteratively changing a network’s interconnecting weights such that
the overall error (i.e., between observed values and modelled network outputs) is
reduced. This is achieved by searching the network’s ‘weight space’ or error function.
The error function is an error surface in n-dimensional space corresponding to a
mapping of the network’s weight vector to the network’s overall error. The purpose of
training is to search this error surface by adjusting a network’s weights, such that an
acceptable error minimum is reached. Training is initiated from a randomly determined
region on the error surface. The algorithm then proceeds by directing weight changes
down error gradients based on the first-order derivative of the error function (following
gradients on the error surface of steepest descent). Step changes are made to weights as
each training example is presented to the network (one complete pass through the training data being an epoch). The ‘training rate’
parameter affects the size of step taken through weight space at each training iteration.
If the rate is too large, training can oscillate from one nonoptimal set of weights to
another. If the rate is too small, training may be trapped in a local error minimum or
suboptimal solution.
The error backpropagation algorithm can be adapted in two ways. First, momentum
(which keeps weight changes on a faster, more even path and helps to avoid local
minima) can be used in an attempt to speed convergence to an error minimum (Gallant,
1993). Momentum is controlled using a ‘momentum rate’ which must be less than unity
for convergence. Secondly, the training rate can be adjusted dynamically to prevent the
optimization process becoming caught in a local error minimum (see Dawson, 1996; Dai
and MacBeth, 1997; Magoulas et al., 1997).
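As a minimal illustration of the weight-update rule described above, the sketch below applies gradient descent with a training (learning) rate and a momentum term to a weight vector. The error surface, rates and starting point are placeholders chosen for the example, not values drawn from the studies cited:

```python
import numpy as np

def train_weights(grad_fn, w0, learning_rate=0.1, momentum=0.9, epochs=100):
    """Move the weights down the error gradient, backpropagation-style.

    grad_fn returns the gradient of the error surface at the current weights;
    the momentum term keeps weight changes moving in a consistent direction
    and must be below unity for convergence.
    """
    w = np.asarray(w0, dtype=float)
    delta = np.zeros_like(w)
    for _ in range(epochs):
        delta = momentum * delta - learning_rate * grad_fn(w)
        w = w + delta
    return w

# Placeholder quadratic error surface with its minimum at (1, -2).
grad = lambda w: 2.0 * (w - np.array([1.0, -2.0]))
print(train_weights(grad, w0=[0.0, 0.0]))
```

A rate that is too large makes the trajectory oscillate between nonoptimal weight sets; one that is too small leaves it creeping towards (and possibly stuck in) a local minimum, which is the trade-off described in the text.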
There are many alternative methods for seeking the minimum of the error function
by adjusting the weight values. For example, second-order weight updates (e.g., Maier
and Dandy, 2000); quick propagation (Fahlman, 1988; Bishop, 1995); line search
algorithms, such as conjugate gradients (Bishop, 1995) or Newton’s method (Battiti,
1992; Bishop, 1995), a technique that identifies both weights and structure simultaneously; linear least squares simplex (LLSSIM) (Hsu et al., 1995); and genetic
algorithms (Yang, 1997), which can be used to determine an optimum network
geometry, an optimum set of weights, or both.
the results of an MLP rainfall–runoff model calibrated using three different training
algorithms (error backpropagation, conjugate gradients and cascade correlation).
Although cascade correlation produced rather poor results, little difference was found in network performance between the backpropagation and conjugate gradient training approaches.

5 The radial basis function (RBF)


The RBF has been used in comparatively few rainfall–runoff studies (e.g. Mason et al.,
1996; Jayawardena et al., 1997; Jayawardena and Fernando, 1998; Dawson and Wilby,
1999). While the structure of the RBF is identical to the MLP, the RBF simulates the
unknown rainfall–runoff function using a network of Gaussian basis functions in the
hidden layer (Equation 2) and linear activation functions in the output layer. In
Equation 2, x represents the weighted sum of inputs to the neuron, σ is the sphere of
influence or width of the basis function, and f(x) is the corresponding output from the
neuron:
f(x) = e^{-x^2 / 2\sigma^2}    (2)
Training an RBF involves two stages. First, the basis functions are established using an
algorithm to cluster data in the training set. Kohonen self-organizing maps (SOMs) or
a k-means clustering algorithm are most commonly applied. Kohonen SOMs (Kohonen,
1984; 1990) are a form of ‘self-organizing’ neural network that learn to distinguish
patterns within input data. A SOM will, therefore, cluster input data according to
perceived patterns without having to be given a corresponding output response. K-
means clustering involves sorting all objects into a predefined number of groups by
minimizing the total squared Euclidean distance for each object with respect to its
nearest cluster centre. However, other techniques, such as orthogonal least squares and
MaxiMin algorithms, have also been used (Song, 1996). Secondly, the weights linking
the hidden and the output layer are calculated directly using simple matrix inversion
and multiplication. The direct calculation of weights in an RBF makes it far quicker to
train than an equivalent MLP.
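The two-stage RBF training procedure can be sketched as follows. This illustrative Python fragment (synthetic data and a deliberately small k-means routine; none of it reproduces the cited implementations) first clusters the training inputs to place the Gaussian basis-function centres, then obtains the hidden-to-output weights directly by linear least squares:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """A very small k-means step used only to place the basis-function centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def rbf_design_matrix(X, centres, sigma):
    """Gaussian basis functions (Equation 2) evaluated for every input pattern."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_rbf(X, y, k=5, sigma=1.0):
    centres = kmeans(X, k)
    H = rbf_design_matrix(X, centres, sigma)
    # Output-layer weights are found directly, with no iterative training.
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centres, weights

# Synthetic example: map two antecedent inputs to a flow-like response.
rng = np.random.default_rng(1)
X = rng.random((60, 2))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1]
centres, weights = train_rbf(X, y)
print(weights.shape)  # one weight per basis function
```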

III Data handling for ANNs

Earlier discussion identified one of the main strengths of neural networks, namely, their
ability to handle incomplete, noisy and nonstationary data (Zealand et al., 1999).
However, with suitable data preparation beforehand, it is possible to improve the
performance of a neural network still further (Masters, 1995). Data preparation involves
a number of processes such as data ‘cleansing’, determining appropriate predictors
(using data reduction techniques), standardizing/normalizing the data and, finally,
dividing the data into calibration and test sets.

1 Data ‘cleansing’
Data cleansing involves identifying and removing trends and nonstationary
components (in terms of the mean and variance) within a data set. Cycles and seasonal
fluctuations should also be identified and removed. For example, trends can be
removed by differencing the time series, and the data can be centred using rescaling techniques. It is also possible to filter the data to extract underlying, important sources
of information and suppress troublesome noise (Masters, 1995). To date, data cleansing
techniques have not been widely applied in ANN rainfall–runoff modelling, so there is
much scope for development in this area (Maier and Dandy, 1996b).
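By way of illustration, the sketch below applies two of the simplest cleansing operations mentioned above: first differencing to remove an underlying trend, and subtraction of the mean for each position in a (here monthly) cycle to remove seasonality. The synthetic series and the choice of a 12-step cycle are assumptions made purely for the example:

```python
import numpy as np

def difference(series, lag=1):
    """First (or lag-k) differencing to remove an underlying trend."""
    series = np.asarray(series, dtype=float)
    return series[lag:] - series[:-lag]

def remove_seasonal_cycle(series, period=12):
    """Subtract the mean of each position in the cycle (e.g., monthly means)."""
    series = np.asarray(series, dtype=float)
    cleaned = series.copy()
    for phase in range(period):
        cleaned[phase::period] -= series[phase::period].mean()
    return cleaned

# Illustrative monthly series containing a trend and an annual cycle.
t = np.arange(120)
flow = 0.05 * t + 10.0 * np.sin(2 * np.pi * t / 12) \
       + np.random.default_rng(0).normal(0.0, 1.0, 120)
detrended = difference(flow)
deseasonalized = remove_seasonal_cycle(flow)
```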

2 Determining appropriate inputs/outputs


While the above techniques translate data into suitable forms for use in neural network
modelling, it is still necessary to identify the most powerful inputs and outputs for the
models. The majority of studies focus on predicting flow (as either discharge or stage)
using antecedent or concurrent catchment conditions. In this case the ANN is
attempting to model a process of the form:
Q_t = f(Q_{t-n}, R_{t-n}, X_{t-n})
in which Q_t is current flow, Q_{t-n} is antecedent flow (at t-1, t-2, . . ., t-n time steps), R_{t-n}
is antecedent rainfall (at t-1, t-2, . . ., t-n), and X_{t-n} represents any other factors
identified as affecting Q_t (e.g., year type [wet or dry], Tokar and Johnson, 1999;
percentage impervious area, Minns, 1996; storm occurrence, Dawson and Wilby, 1998).
The neurohydrologist must first establish the optimal lag-interval between process
and response. Autoregressive moving average (ARMA) models are often used to
determine appropriate variables, lead times and the windows for averaging (Maier and
Dandy, 2000). Alternatively, correlation testing may be used to identify the strongest
causal relationships from a set of possible predictor variables (as in Dawson and Wilby,
1998). The chosen predictor variables are then applied as either individual inputs to
multiple nodes (e.g., predictors are Q_{t-1}, Q_{t-2}, Q_{t-3}, etc.) and/or as lumped averages (in
which case an input node receives a moving average).
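A simple way to screen candidate lags, in the spirit of the correlation testing mentioned above, is to compute the linear correlation between the predictand and each lagged predictor and retain the strongest relationships. The sketch below does this for antecedent rainfall against current flow; the synthetic series and the toy catchment response are assumptions for illustration only:

```python
import numpy as np

def lag_correlations(target, predictor, max_lag=10):
    """Correlation between Q_t and the predictor lagged by 1..max_lag steps."""
    corrs = {}
    for lag in range(1, max_lag + 1):
        corrs[lag] = np.corrcoef(target[lag:], predictor[:-lag])[0, 1]
    return corrs

rng = np.random.default_rng(0)
rain = rng.gamma(2.0, 1.0, 500)
flow = np.convolve(rain, [0.1, 0.5, 0.3, 0.1], mode="same")  # toy catchment response

# Rank antecedent rainfall lags by the strength of their relationship with flow.
corrs = lag_correlations(flow, rain, max_lag=6)
best = sorted(corrs, key=lambda k: abs(corrs[k]), reverse=True)
print(best[:3])  # the three most informative lags
```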
If the available data contain many input variables but few points, it is important to
attempt some form of data reduction for the input data. Without this the model will
have more free parameters to establish than data to constrain individual parameter
values. Data reduction techniques might involve statistical manipulations, such as
extracting principal components (e.g., Masters, 1995), or reducing physical data sets, by
averaging rainfall data from several rain gauges (e.g., Chang and Hwang, 1999).
Appropriate outputs must also be identified. Some authors use changes in flow rather
than flow per se to reduce the likelihood of extrapolating beyond the range of the
calibration data (e.g., Minns and Hall, 1997). Others do not predict flow directly but
rather the parameters of a Fourier series which is, in turn, used to model flow (e.g.,
Atiya et al., 1999).

3 Standardization/normalization
All variables should be standardized to ensure they receive equal attention during the
training process (Maier and Dandy, 2000). This is particularly important in RBF
networks where cluster centres would be dominated by high-magnitude input
variables. Without standardization in MLPs, input variables measured on different
scales will dominate training to a greater or lesser extent because initial weights within
a network are randomized to the same finite range.


Data standardization is also important for the efficiency of training algorithms. For
example, the gradient descent algorithm (error backpropagation) used to train the MLP
is particularly sensitive to the scale of data used. Due to the nature of this algorithm,
large values slow training because the gradient of the sigmoid function at extreme
values approximates zero (see Figure 1). To avoid this problem, data are rescaled using
an appropriate transformation. In general, data are rescaled to the intervals [–1, 1], [0.1,
0.9] or [0, 1] (referred to as standardization). Another approach is to rescale values to a
Gaussian function with a mean of 0 and unit standard deviation (referred to as normal-
ization). The advantage of using [0.1, 0.9] for runoff modelling is that extreme (high and
low) flow events, occurring outside the range of the calibration data, may be accom-
modated (Hsu et al., 1995). Alternatively, changes in flow rather than absolute flows
may be used to avoid the problem of saturation, but Minns and Hall (1997) reported
only limited gains from this approach. Other authors advocate [0.1, 0.85] (e.g.,
Shamseldin, 1997), or [–0.9, 0.9] (e.g., Braddock et al., 1998).
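These rescaling options are straightforward to implement. The following sketch shows linear standardization to an arbitrary interval such as [0.1, 0.9] and z-score normalization (zero mean, unit standard deviation); the interval and example values are illustrative assumptions rather than recommendations from any particular study:

```python
import numpy as np

def standardize(x, lower=0.1, upper=0.9):
    """Linearly rescale a variable to the interval [lower, upper]."""
    x = np.asarray(x, dtype=float)
    scaled = (x - x.min()) / (x.max() - x.min())
    return lower + scaled * (upper - lower)

def normalize(x):
    """Rescale to zero mean and unit standard deviation (z-scores)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

flows = np.array([0.8, 1.2, 5.6, 12.3, 3.4])
print(standardize(flows))  # values lie in [0.1, 0.9]
print(normalize(flows))    # zero mean, unit standard deviation
```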

4 Model calibration and validation


Ideally, three data sets should be used for a rigorous analysis of ANN skill: a calibration
set, a test set and a validation set (called cross-validation). The calibration set is used to
train a number of different ANN model configurations. The test set is used to decide
when to stop training (to avoid overfitting) and also to determine which of the
networks is the most accurate. Finally, the validation set is used to evaluate the chosen
model against independent data. However, Lachtermacher and Fuller (1994) identify a
number of problems when using three data sets. First, if there are limited data available
it can be impractical to create three independent data sets. Secondly, the method of
dividing the data can significantly affect the results. Thirdly, when using a test set to
cease training, it is not always clear when a network is beginning to ‘learn’ the noise
inherent to the time series.
With finite data availability it is often most prudent to use a cross-training technique.
This method involves splitting the available data into S equal-sized segments. Network
models are then calibrated using all the data in S–1 of these segments and validated on
the remaining segment of independent data. The procedure is repeated S times so that
S models are calibrated and validated for each model type and configuration. This
ensures that each data segment is used only once for validation. Thus, when the
validation segments are recombined one has a validation set equal to the entire data set.
Typical values for S are 5 and 10 segments (Schalkoff, 1997). An alternative is to use the
hold-one-out or jackknife method, in which S = n (where n is the number of data points
in the entire data set), so that each network is calibrated on n – 1 points and validated on
the single point held out. Thus, for a data set containing n data points, one would have to
create and validate n neural networks.
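A minimal sketch of the cross-training procedure is given below: the data are split into S segments, each segment is held out once for validation, and the remaining S – 1 segments are used for calibration. The calibration call itself is left as a placeholder, since it depends on the network type and training algorithm chosen:

```python
import numpy as np

def cross_training_splits(n_samples, s=5):
    """Yield (calibration_indices, validation_indices) for each of the S folds."""
    indices = np.arange(n_samples)
    folds = np.array_split(indices, s)
    for held_out in range(s):
        validation = folds[held_out]
        calibration = np.concatenate([folds[i] for i in range(s) if i != held_out])
        yield calibration, validation

# Placeholder data set of 100 patterns; one network is trained per fold.
for calib_idx, valid_idx in cross_training_splits(100, s=5):
    # train_network(X[calib_idx], y[calib_idx]) would be called here
    print(len(calib_idx), len(valid_idx))
```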

IV Model assessment

There is a general lack of objectivity and consistency in the way in which rainfall–runoff
models are assessed or compared (Legates and McCabe, 1999). This also applies to the
more specific case of ANN model assessment and arises for several reasons. First, there are no standard error measures (although some have been more widely applied than
others). Secondly, the diversity of catchments studied (in terms of area, topography,
land use, climate regime, etc.) hinders direct comparisons. Thirdly, different aspects of
flow may be modelled (e.g., discharge, stage, rates of change of discharge, etc.). Finally,
there are broad differences between studies with respect to lead times (ranging from 0
to +24 model time steps) and the temporal granularity of forecasts (from seconds to
months).
When artificial neural networks are trained using algorithms such as backpropaga-
tion they are generally optimized in such a way as to minimize their global error. While
this is a useful general target, it does not necessarily lead to a network that is proficient
for both low flow and flood forecasting. The squared error, which is used in many
training algorithms, does provide a general measure of model performance, but it does
not identify specific regions where a model is deficient. Other error measures are,
therefore, employed to quantify these deficiencies (see the review of Watts, 1997).
The most commonly employed error measures are: the mean squared error (MSE),
the mean squared relative error (MSRE), the coefficient of efficiency (CE), and the
coefficient of determination (r2) (see Equations 3, 4, 5, 6 respectively);
MSE = \frac{1}{n} \sum_{i=1}^{n} (Q_i - \hat{Q}_i)^2    (3)

MSRE = \frac{1}{n} \sum_{i=1}^{n} \frac{(Q_i - \hat{Q}_i)^2}{Q_i^2}    (4)

CE = 1 - \frac{\sum_{i=1}^{n} (Q_i - \hat{Q}_i)^2}{\sum_{i=1}^{n} (Q_i - \bar{Q})^2}    (5)

r^2 = \frac{\left[ \sum_{i=1}^{n} (Q_i - \bar{Q})(\hat{Q}_i - \tilde{Q}) \right]^2}{\sum_{i=1}^{n} (Q_i - \bar{Q})^2 \, \sum_{i=1}^{n} (\hat{Q}_i - \tilde{Q})^2}    (6)

where \hat{Q}_i are the n modelled flows, Q_i are the n observed flows, \bar{Q} is the mean of the
observed flows, and \tilde{Q} is the mean of the modelled flows.
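For reference, the four measures in Equations 3–6 translate directly into code, as in the sketch below (the observed and modelled arrays are arbitrary illustrative values):

```python
import numpy as np

def error_measures(observed, modelled):
    """MSE, MSRE, CE and r^2 as defined in Equations 3-6."""
    q = np.asarray(observed, dtype=float)
    q_hat = np.asarray(modelled, dtype=float)
    mse = np.mean((q - q_hat) ** 2)
    msre = np.mean(((q - q_hat) ** 2) / q ** 2)
    ce = 1.0 - np.sum((q - q_hat) ** 2) / np.sum((q - q.mean()) ** 2)
    num = np.sum((q - q.mean()) * (q_hat - q_hat.mean())) ** 2
    den = np.sum((q - q.mean()) ** 2) * np.sum((q_hat - q_hat.mean()) ** 2)
    return {"MSE": mse, "MSRE": msre, "CE": ce, "r2": num / den}

observed = np.array([3.2, 4.1, 7.8, 12.5, 6.3])
modelled = np.array([3.0, 4.5, 7.2, 11.9, 6.8])
print(error_measures(observed, modelled))
```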


According to Karunanithi et al. (1994), squared errors (MSE) provide a good measure
of the goodness of fit at high flows, whilst relative errors (MSRE) provide a more
balanced perspective of the goodness of fit at moderate flows. However, these measures
are strongly affected by catchment characteristics and care must be taken when
comparing studies using these statistics.
CE and r2, on the other hand, provide useful comparisons between studies since they
are independent of the scale of data used (i.e., flow, catchment, temporal granularity,
etc.). They are correlation measures that measure the ‘goodness of fit’ of modelled data
with respect to observed data. CE is referred to by some authors as the determination
coefficient (e.g., Cheng and Noguchi, 1996), the efficiency index, E (Abrahart and Kneale,
1997; Sureerattanan and Phien, 1997), F index (Minns and Hall, 1996), and R2 (e.g., Nash
and Sutcliffe, 1970). Care must be taken not to confuse R2 with the coefficient of deter-
mination, r2, which some authors also refer to as R2 (e.g., Lorrai and Sechi, 1995;
Furundzic, 1998; Legates and McCabe, 1999).
The CE statistic provides a measure of the ability of a model to predict flows which
are different from the mean (i.e., the proportion of the initial variance accounted for by
the model; Nash and Sutcliffe, 1970), and r2 measures the variability of observed flow
that is explained by the model (see the evaluation of Legates and McCabe, 1999). CE
ranges from –∞ at the worst case to +1 for a perfect correlation; r2 ranges from –1
(perfect negative correlation), through 0 (no correlation), to +1 (perfect positive
correlation). According to Shamseldin (1997) a CE of 0.9 and above is very satisfactory,
0.8 to 0.9 represents a fairly good model, and below 0.8 is deemed unsatisfactory.
Legates and McCabe (1999) highlight a number of deficiencies with relative measures
such as CE and r2. They note that r2 is particularly sensitive to outliers and insensitive
to additive and proportional differences between modelled and observed data. For
example, a model could grossly, but consistently, overestimate the observed data values
and still return an acceptable r2 statistic. Although CE is an improvement over r2 (in that
it is more sensitive to differences in modelled and observed means and variances) it is
still sensitive to extreme values. The index of agreement measure, d (Equation 7) has been
proposed as a possible alternative (Legates and McCabe, 1999) but it is still sensitive to
extreme values, owing to the use of squared differences. Modified versions of d and CE
have also been described which are both baseline adjusted (adjusted to the time series
against which the model is compared) and adapted from squared to absolute
differences. The second adaptation reduces the sensitivity of these measures to outliers.
The interested reader is directed towards Legates and McCabe (1999) for a more
thorough discussion.
d = 1 - \frac{\sum_{i=1}^{n} (Q_i - \hat{Q}_i)^2}{\sum_{i=1}^{n} \left( |\hat{Q}_i - \bar{Q}| + |Q_i - \bar{Q}| \right)^2}    (7)

Another error measure that has been used is S4E (presented as MS4E in Equation 8) by
Abrahart and See (1998). This higher-order measure places more emphasis on peak flows than the lower-order MSE. Alternatively, the mean absolute error (MAE, Equation
9), which computes all deviations from the original data regardless of sign, is not
weighted towards high flow events:
MS4E = \frac{1}{n} \sum_{i=1}^{n} (Q_i - \hat{Q}_i)^4    (8)

MAE = \frac{1}{n} \sum_{i=1}^{n} |Q_i - \hat{Q}_i|    (9)

Other measures that have been employed in only a limited number of cases include
RMSE/µ (RMSE as percentage of observed mean; Jayawardena et al., 1997; Fernando
and Jayawardena, 1998); %MF, the percent error in modelled maximum flow relative to
observed data (Hsu et al., 1995; Furundzic, 1998); %VE, the percent error in modelled
runoff volume (Hsu et al., 1995); and %NRMSE, the percentage of values exceeding the
RMSE (Campolo et al., 1999). An RMS normalized error was used by Atiya et al. (1996)
and is defined as the square root of the sum squared errors divided by the square root
of the sum squared desired outputs.
Lachtermacher and Fuller (1994) identify other measures for time series analysis such
as the average relative variance (Nowlan and Hinton, 1992) and mean error (Gorr et al.,
1992). Another measure often used in time series analysis is Theil’s U-statistic (Theil,
1966), which provides a relative basis for comparing complex and naive models.
However, these measures have yet to be used in the evaluation of ANN rainfall–runoff
models.
Classification approaches are also used to evaluate predictive models. For example,
Colman and Davy (1999) used a classification technique to evaluate seasonal weather
forecasts. In this technique the observed data were assigned to one of three equiproba-
ble sets, or terces (in this case, below-average, average and above-average tempera-
tures). Model skill (relative to chance) is then assessed using a chi-square test of the
modelled versus expected frequencies in each category. Similarly, Abrahart and See
(1998) classified predictions according to % correct; % under predictions within ±5, 10,
25% of observed; and % predictions greater than ±25% of observed. This allows direct
comparisons to be made between different models irrespective of the predictand and
model time step.
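The terce-based assessment described above can be sketched as follows: the observations are split into three equiprobable categories, the model's predictions are assigned to the same categories, and a chi-square statistic compares the frequency of correct assignments with that expected by chance. The data are synthetic and the test is deliberately simplified relative to Colman and Davy (1999):

```python
import numpy as np

def terce_skill(observed, modelled):
    """Chi-square of correct terce assignments against the chance expectation."""
    # Boundaries chosen so that each terce holds a third of the observations.
    boundaries = np.quantile(observed, [1 / 3, 2 / 3])
    obs_cat = np.digitize(observed, boundaries)
    mod_cat = np.digitize(modelled, boundaries)
    hits = np.array([np.sum((obs_cat == c) & (mod_cat == c)) for c in range(3)])
    expected = np.array([np.sum(obs_cat == c) for c in range(3)]) / 3.0  # chance = 1/3
    return np.sum((hits - expected) ** 2 / expected)

rng = np.random.default_rng(0)
observed = rng.gamma(2.0, 3.0, 300)
modelled = observed + rng.normal(0.0, 1.0, 300)  # a reasonably skilful 'model'
print(terce_skill(observed, modelled))
```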
While the above discussion relates more generally to rainfall–runoff modelling, flood
forecasting systems need to employ additional error measures. For example, P–P
(Dawson et al., 2000) is a measure of the error in the timing of a predicted flood
peak (Chang and Hwang, 1999, refer to this as ETp). Abrahart and See (1998) use
MAEpp and RMSEpp which measure equivalent values to MAE and RMSE for all flood
events in a data set. They also employed a classification criterion that measures
% early, % late and % correct occurrences of individual predicted peaks (although
they do not indicate what discrepancy constitutes a ‘late’ peak). A further measure
used for flood forecasting is total volume, but this measure provides no indication of temporal accuracy (Zealand et al., 1999).


The measures introduced above take no account of the parsimony of the models. One
would expect a model with many parameters to ‘fit’ data better than one with fewer
degrees of freedom. However, more complex models do not necessarily lead to propor-
tionate increases in accuracy and one must question whether the additional effort is
justifiable. Fortunately, several performance measures take into account the number of
parameters used in a model. Examples include the A information criterion (AIC) (Akaike,
1974); the B information criterion (BIC) (Rissanen, 1978); the Schwarz information criterion
(SIC) (Schwarz, 1978); the Vapnik–Chervonenkis dimension (Abu-Mostafa, 1989); and the
network information criterion (NIC) (Murata et al., 1994). The AIC and BIC measures are
defined as follows:
AIC = m ln(RMSE) + 2p (10)
BIC = m ln(RMSE) + p ln(m) (11)
in which m is the number of data points and p is the number of free parameters in the
model. These measures take into account the number of parameters used within a
model and give credit to models that are more parsimonious.
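Using the definitions in Equations 10 and 11, both criteria can be computed directly from the RMSE, the number of data points and the number of free parameters, as in the sketch below (the example values are arbitrary and serve only to show how a larger network is penalized):

```python
import numpy as np

def aic(rmse, n_data, n_params):
    """A information criterion (Equation 10): two penalty units per parameter."""
    return n_data * np.log(rmse) + 2 * n_params

def bic(rmse, n_data, n_params):
    """B information criterion (Equation 11): penalty grows with the sample size."""
    return n_data * np.log(rmse) + n_params * np.log(n_data)

# Arbitrary example: a small and a large network with similar goodness of fit.
print(aic(0.35, n_data=500, n_params=20), bic(0.35, n_data=500, n_params=20))
print(aic(0.33, n_data=500, n_params=120), bic(0.33, n_data=500, n_params=120))
```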
Given the wide array of performance measures, the problem then becomes deciding
which (if any) are most appropriate to a particular application. For example, Figure 3
shows different types of model error produced by four hypothetical rainfall–runoff
models. Model A, which is somewhat naive, predicts the shape of the hydrograph well
but consistently overestimates flow and predicts the flood peak late. Model B predicts
low flows accurately but returns poor estimates of the flood peak. Model C simulates
flow generally well but contains a lot of ‘noise’, and model D reproduces flood events
very well but performs poorly for low flows. Table 1 reports the error measures
associated with each model. Model B may be selected in preference to model D based
on the MSRE or MAE statistic. However, model D would be selected in preference to
model B from the RMSE, CE, d and r2 statistics. Model C consistently outperforms all
other models based on the error statistics, but it is not as accurate as model B during
low flow periods, or model D during flood events. Model A appears relatively weak
when assessed using most of the error statistics, but it performs very well according to
r2. This echoes the results of Legates and McCabe (1999) who point out the fallibility of
the r2 statistic (which does not penalize additive and proportional differences).
The results in Table 1 emphasize the importance of not relying on individual error
measures to assess model performance. Thus, goodness of fit error measures (e.g., CE,
d, and r2) and absolute error measures (RMSE and MAE) should be used in combination
(Legates and McCabe, 1999). Scatter plots also provide a useful visual aid to assess a
model’s accuracy. For example, Figure 4 shows a series of scatter plots comparing
observed flow in the River Mole (UK) with four models (MLP, RBF, zero-order forecast
and linear regression model) calibrated using data from winter and early spring
(Dawson and Wilby, 1999). The closer the scattered points are to the line of best fit, the
better the model. In this case the MLP model appears to be the most accurate. A useful
outcome of this kind of plot is that any heteroscedasticity within the model (i.e.,
changes in variance across the range of values) is readily exposed.

Figure 3 Hydrographs of four hypothetical models

Table 1 Error measures of four hypothetical models (see also Figure 3)

          MSRE     RMSE    r2      CE (%)  d       MS4E    MAE
Model A   0.0510   61.24   0.827   44      0.871   215     56
Model B   0.0243   72.28   0.397   21      0.558   2003    34
Model C   0.0123   29.78   0.885   87      0.968   14      26
Model D   0.0430   50.89   0.785   61      0.922   152     39

V Survey of current ANN modelling practice

As noted at the outset, ANN application to hydrological modelling is a small but
rapidly expanding field of research. In order to determine the preferred modelling
practices, over 50 articles were surveyed and their details summarized in Table 2 (see
the Appendix for further explanation of terms). The table follows the convention of
Maier and Dandy (2000) and is representative of research output since 1993.
Table 2 highlights the range of approaches implemented within the relatively limited
(published) field of ANN runoff modelling. While the majority of articles model flow
directly, others adopt a more indirect approach (see, for example, Pankiewicz, 1997, or
Atiya et al., 1999). Approximately half the studies (47% of the articles) attempt to predict
discharge, while a quarter (23%) model stage. Others are less explicit about what is
modelled and are recorded in the table as flow. In some studies, ANN rainfall–runoff
models have been used to evaluate other hydrological processes. For example, Clair
and Ehrman (1996) evaluate climate change impacts, and Abrahart et al. (1999) use
ANNs to assess the choice of predictor variables on simulated flow. More recent studies
embed ANNs within larger, hybrid systems (e.g., See and Abrahart, 1999), or explore
methods of combining ANNs using other artificial intelligence techniques (e.g., See and
Openshaw, 1999).
The chosen time steps range from seconds to years, but the preferred interval was
hourly (38%). The number of lead steps vary from 0 to +24, although the most favoured
lead is one step (45%). Catchment areas vary considerably and are drawn from all over
the world, spanning ten orders of magnitude from small synthetic watersheds (< 27 ×
10–4 km2; Hall and Minns, 1993) to the River Nile (29 × 106 km2; Tawfik et al., 1997).
Most studies provide no details of how the neural networks were developed, so it
must be assumed that in many cases the neurohydrologists developed their own
program using a high-level language. In other cases, ‘off-the-shelf’ packages were used.
For example, the Stuttgart Neural Network Simulator was used in nine of the surveyed
studies, and the BP-Simulator by IBP-Pietzsch GmbH, Ettlingen, Germany, in two
others.
Most studies generally divide the data into just calibration and validation sets. Only
four studies applied the more rigorous approach of training with calibration and test
data before evaluation (cross-validation). Data sets varied between 100 and 9000 data
points. However, there was a general lack of information on how data were obtained,
analysed or divided. Only half the articles surveyed described the method of data
normalization/standardization, with the majority rescaling to either [0,1] or [0.1, 0.9].

Figure 4 Scatter plots of observed versus modelled flow for the River Mole, winter/early spring 1994 (Dawson and Wilby, 1999). MLP = multilayer perceptron; RBF = radial basis function; SWMLR = stepwise multiple linear regression; zero-order forecast = naive lagged model.

The most popular ANN used for modelling the rainfall–runoff process was the MLP
(89%). In the majority of cases (64%) the logistic sigmoid function was used as the
neuron activation function, while the most popular second choice was the hyperbolic
tangent function (13%). Of the articles presented in Table 2, only five used the
alternative RBF network and just one (Chang and Hwang, 1999) used a GMDH (group
method of data handling) network structure. Although some authors claim to use
recurrent networks, all networks reviewed were in fact feed forward (see section II).
Network architectures were generally optimized using a trial and error approach
(51%). Some studies select the network architecture based on experience from earlier
work; others use optimization algorithms such as cascade correlation, genetic
algorithms, magnitude pruning and skeletonization (e.g., Karunanithi et al., 1994;
Abrahart et al., 1998). When network architectures were configured, the majority of
studies used one hidden layer (70%). Others experimented with two hidden layers and
Sajikumar and Thandaveswara (1999) experimented with three. In the majority of cases
(68%) training was performed using standard error backpropagation. However,
relatively few articles discussed how this was implemented (32%) or included
information on the learning or momentum parameters. In those studies that
implemented RBF networks, four of the five used a K-means clustering algorithm to
determine the basis function centres, while the remaining article (Fernando and
Jayawardena, 1998) used an orthogonal least squares technique. Other variations to
training involve improvements to the standard backpropagation algorithm (for
example, using conjugate gradients), or alternative techniques such as quick
propagation, linear least squares simplex (Hsu et al., 1995) and a temporal backpropagation algorithm (Sajikumar and Thandaveswara, 1999). Discussion of the number of
epochs performed during training and of the stopping criteria was largely absent.
Those articles that reported this information generally specified the number of training
cycles beforehand.
The literature survey also revealed a notable lack of contributions in which different
ANN configurations were compared or, perhaps more critically, assessed relative to
more conventional statistical approaches. For example, Dawson and Wilby (1999)
compared the cross-validation results of two ANN models (the MLP and RBF) with a
stepwise multiple linear regression model (SWMLR) and zero-order forecasts (ZOF) of
river flow, given 15-minute rainfall–runoff data for the River Mole (a flood-prone
tributary of the River Thames, UK). Using only antecedent rainfall and discharge mea-
surements, the four models were used to forecast river flows with 6-hour lead time and
15-minute resolution. Figure 4 compares the observed versus forecasted flows for
winter/early spring 1994, and Figure 5 shows the corresponding hydrographs for a two
week subset of the same data. Overall, the MLP was more skilful than the RBF, SWMLR
and ZOF models. However, according to performance measures such as the RMSE,
MSRE, CE and r2, the RBF flow forecasts were only marginally better than those of the
simpler SWMLR and ZOF models. This result suggests that ANNs should be regarded
as an alternative to more traditional rainfall–runoff methods rather than a replacement
(Maier and Dandy, 2000). Clearly, many more studies of this type are required before
the optimal model configurations and/or circumstances for ANN flow-forecasting can
be established firmly.

Figure 5 Hydrographs of observed versus modelled flow for the River Mole (Dawson and Wilby, 1999) (see Figure 4 for explanation of models).

Table 2 Details of studies reviewed
Reference  Time step  Lead steps  Variable  Location  Catchment area (km2)  Hidden layers  ANN/training

1 Abrahart and Kneale (1997) Hour [0, +1, +12] Flow Wye, Wales 10.55 4 MLP/BP
2 Abrahart and See (1998) Hour +1 Stage, change Wye, Ouse, UK 10.55, 3286 3, 4 MLP/BP
3 Abrahart et al. (1998) Hour +1 Stage Wye, Wales 10.55 4 MLP/BP
4 Abrahart et al. (1999) Hour +1 Normalized flow Wye, Wales 10.55 4 MLP/BP
5 Atiya et al. (1996) 10 days, 1 month +1 Discharge Nile, Egypt ? 3 MLP/BP
6 Atiya et al. (1999) 10 days, 1 month +1, +2, +3 Discharge, change, FC Nile, Egypt ? 3 MLP/BP
7 Braddock et al. (1998) Day +1 Discharge Unknown, Australia N/A 3 MLP/BP, CG
8 Campolo et al. (1999) Hour +1, +5, +9 Stage Tagliamento, Italy 2480 3 MLP/BP
9 Chang and Hwang (1999) Hour +1 Discharge Shen-cei Creek, Taiwan 259.2 4 GMDH
10 Cheng and Noguchi (1996) 12 mins +1 Discharge Ziyang County, China 4.47 3 MLP/BP
11 Clair and Ehrman (1996) Year, month 0 Runoff 15 Canadian rivers ? 3 MLP/BP
12 Crespo and Mora (1993) 10 days 0 Stage Pisuena, Spain 356 4 MLP/BP
13 Danh et al. (1999) Day +1 Discharge Da Nhim, La Nga, Vietnam 800, 3060 3 MLP/BP
14 Dawson and Wilby (1998) 15 mins +24 Discharge Mole, Amber, UK 142, 139 3 MLP/BP
15 Dawson and Wilby (1999) 15 mins +24 Discharge Mole, UK 142 3 MLP/BP, RBF/KM
16 Fernando and Jayawardena (1998) Hour +1 Discharge Experimental, Japan 3.12 3 MLP/BP, RBF/OLS
17 Furundzic (1998) Day +1 Runoff Lim, Yugoslavia 3160 3 MLP/BP
18 Golob et al. (1998) Hour +2, +4, +6 Inflow Soca River, Slovenia 1150 3 MLP/BP
19 Hall and Minns (1993) 5, 10 secs, 1 min +0 NFU Synthetic, Cantley Estate, UK 0.05 3 MLP/BP
20 Hsu et al. (1995) Day +1 Discharge Leaf River, USA 1949 3 MLP/LLSSIM
21 Jayawardena et al. (1997) Hour +2, +5 Stage Shang Qiao, Liu Xie, China 110, ? 3 MLP/BP, RBF/KM
22 Jayawardena and Fernando (1998) Hour +1 Discharge Experimental, Japan 3.12 3 RBF/KM, MLP/BP
23 Kang et al. (1993) Hour, day +1 Discharge Chang, Korea ? 3 MLP/BP
24 Karunanithi et al. (1994) Day, 5 day MA +0 Discharge Huron, USA ? 3 MLP/CC
25 Lachtermacher and Fuller (1994) ? +1, +n Flow 4 various, USA ? 3 MLP/BP
26 Lange (1999) 15 mins ? Unit hydrograph Zeller Bach, Windbach, Germany 20, 34.7 3 MLP/CG
27 Liong and Chan (1993) N/A N/A Total volume UBT, Singapore 6.11 3 MLP/BP
28 Loke et al. (1997) ? ? Runoff coefficient 42 various; Europe, USA ? 3 MLP/BP
29 Lorrai and Sechi (1995) Month +0 Discharge Araxisi, Sardinia 121 3 MLP/BP
30 Mason et al. (1996) Minute +0 Specific yield Synthetic data N/A 3 RBF/KM, MLP/BP
31 Minns (1996) Hour +0 Discharge, change Synthetic, 2 various, UK 24, 31 3 MLP/BP
32 Minns and Hall (1996) Hour +0 Discharge Synthetic ≈ 30 3, 4 MLP/BP
33 Minns and Hall (1997) 30 mins +0 Discharge, change Dollis Brook, Silk Stream, UK 23.99, 31.25 3 MLP/BP
34 Muttiah et al. (1997) N/A N/A 2-year peak discharge 17 US basins ? 3, 4 MLP/CC&QP
35 Poff et al. (1996) Day +0 Discharge 2 various, USA 230, 97 3 MLP/BP
36 Raman and Sunilkumar (1995) Month +1 Reservoir inflow Bharathapuzha, India ? 3, 4 MLP/BP
37 Sajikumar and Thandaveswara (1999) Month +0 Stage Lee, UK; Thuthapuzha, India 1419, 1030 3, 4, 5 MLP/TBP
38 See et al. (1997) Hour +6, +12 Stage Ouse, England 3286 4 MLP/BP
39 See et al. (1998) Hour +1, +6 Stage, change Ouse, Wye, UK 3286, 10.55 –– ––
40 See and Abrahart (1999) Hour +1, +6 Stage, change Ouse, Wye, UK 3286, 10.55 3, 4 MLP/BP
41 See and Openshaw (1998; 1999) Hour +6 Stage Ouse, England 3286 3 MLP/CG
42 Shamseldin (1997) Day +0 Runoff 11 various, worldwide 18000 3 MLP/CG
43 Smith and Eli (1995) ? N/A Peak discharge, FC Synthetic grid N/A 3 MLP/BP
44 Stüber and Gemmer (1997) Hour +6 Stage Mosel, Germany ? 4 MLP/BP
45 Sureerattanan and Phien (1997) Day +1 Discharge Mae Klong, Thailand 10880 3 MLP/BP
46 Tawfik et al. (1997) Day +0 Discharge White Nile, Egypt ? 3 MLP/BP
47 Thirumalaiah and Deo (1998a) Hour +1, +2, +3 Runoff Bhasta and Chorna, India 390.86 3 MLP/BP, CG, CC
48 Thirumalaiah and Deo (1998b) Day +1, +2 Stage Indravathi, India 41700 3 MLP/BP, CG, CC
49 Tokar and Johnson (1999) Day +0 Discharge Little Patuxent, USA 19270 3 MLP/BP
50 Zealand et al. (1999) 1 week [+1, +4] Discharge Winnipeg, Canada 98 3, 4 MLP/BP
51 Zhu et al. (1994) Hour +1, +2, +3 Change Butternut Creek, USA ? 3 MLP/BP

Note: Those values/terms underlined in the table represent the most accurate model configuration in the study. Those items in italics have been inferred. N/A is not
applicable and ‘?’ indicates that the information is unknown. The time step is the temporal granularity used. The number of lead steps is calculated from the most
recent predictor. For example, a model using antecedent flow as a predictor, but also current rainfall, would be recorded as ‘0’ as rainfall is the most recent
predictor. [+x, +y] represents all lead steps between times x and y inclusive.
Abbreviations: BP = backpropagation; CC = cascade correlation; CG = conjugate gradient; Change = rate of change of the predictand; FC = Fourier coefficients;
GMDH = group method of data handling; KM = K-means clustering algorithm; LLSSIM = linear least squares simplex; MA = moving average; MLP = multilayer
perceptron; NFU = normalized flow units; OLS = orthogonal least squares; QP = quickpropagation; RBF = radial basis function network; TBP = temporal
backpropagation.


VI Towards a modelling protocol and future research

Of all the studies evaluated within this survey one factor, above all others, is crucial to
the implementation of an ANN rainfall–runoff model: the availability of suitable, high-
quality data (Smith and Eli, 1995; Tokar and Johnson, 1999). From this point on the
implementation of an effective model is largely dependent on the skill and experience
of the neurohydrologist. To conclude, we propose a template for ANN model
development, and then suggest several areas for future research.

1 Protocol for implementing ANN RR (rainfall–runoff) models


From this study it is clear that no rigorous framework exists for the application of
ANNs to hydrological modelling. Figure 6 provides a conceptual framework of those
stages the neurohydrologist must perform when developing ANN RR (rainfall-runoff
models). In this figure, rectangles with double side bars represent sub-processes which
are further divided. Solid arrows indicate the order in which activities should be
performed and dashed arrows indicate processes that influence others. The following
provides a brief explanation of each of these stages (more detail of the data-preprocess-
ing stages can be found in section III).

Step 1 Gather data: Ensure sufficient data are available for a meaningful study in
terms of both quantity and quality (i.e., information content is paramount) (Tokar and
Johnson, 1999).

Step 2 Select predictand(s): Clearly state the proposed model application,
recognizing that it may be more appropriate to model changes in flow if the data contain
a large variance (and calibration data may be unrepresentative of long-term extremes).
Verify that the data are suitable for such a model.

Step 3 Data preprocessing (stage 1):


3.1 Data cleansing: Remove significant underlying upward or downward trends (for
example, by using first and second differences; Masters, 1995). If necessary,
remove seasonal components (for example, using moving averages; Janacek and
Swift, 1993) and/or filter the data to reduce noise and emphasize the dominant
signal (Masters, 1995).
3.2 Predictors/predictands: Identify the most significant predictors for the chosen
predictand. If necessary, reduce the number of predictors by means of principal
components. Determine suitable lag times for each predictor and calculate
appropriate moving averages for the predictors. This can be achieved through the
use of neural networks (e.g., Furundzic, 1998), ARMA models (Refenes et al., 1997)
or autocorrelation functions. Identify any other processes that might be
significant, for example, storm sequencing (Dawson and Wilby, 1998).
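
The preprocessing operations in steps 3.1 and 3.2 can be prototyped in a few lines. The sketch below, written in Python with synthetic data, is purely illustrative: the gamma-distributed 'rainfall', the three-day lag built into the 'flow' series, the 30-day smoothing window and the ten-lag search range are assumptions made for the example, not values drawn from the studies reviewed here.

import numpy as np

# Synthetic daily series (illustrative only): flow responds to rainfall
# with a lag of roughly three days.
rng = np.random.default_rng(0)
rain = rng.gamma(2.0, 5.0, size=1000)
flow = 0.6 * np.roll(rain, 3) + rng.normal(0.0, 1.0, size=1000)

# 3.1 Data cleansing: first differences remove an underlying trend;
# a moving average suppresses noise and emphasizes the dominant signal.
flow_diff = np.diff(flow)
window = 30
rain_smoothed = np.convolve(rain, np.ones(window) / window, mode='valid')

# 3.2 Predictors: use lagged correlation between rainfall and flow to
# suggest a suitable lag time for the rainfall predictor.
def lagged_corr(x, y, lag):
    """Correlation between x at time t-lag and y at time t."""
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

lags = range(1, 11)
correlations = [lagged_corr(rain, flow, lag) for lag in lags]
best_lag = lags[int(np.argmax(correlations))]
print('Most correlated rainfall lag:', best_lag, 'days')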

Step 4 ANN selection:


4.1 Network type: Select the most appropriate network type for the application. While
there is no definitive type to choose, in terms of prediction problems both MLP
and RBF neural networks are an appropriate starting point. Begin with an MLP
(trained using standard backpropagation) as this provides a benchmark with
which to evaluate any other models.
Choose appropriate activation function(s) for the neurons. For MLPs generally
use either the logistic sigmoid or hyperbolic tangent functions as a starting point.
For RBFs, Gaussian basis functions are most commonly used.

Figure 6 Applying ANNs to rainfall–runoff modelling
4.2 Training algorithm: Select a suitable training algorithm to modify weights and
biases, and determine network architecture. Choose appropriate values for
learning parameters (momentum and learning rate) within the range 0.01–0.9.
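
The activation functions named in step 4.1 are simple scalar transforms, and the learning parameters in step 4.2 are single constants; a minimal sketch follows. The Gaussian form shown (with a centre and width per hidden neuron) is one common way of writing an RBF basis function, and the particular learning-rate and momentum values are merely examples taken from the suggested 0.01–0.9 range.

import numpy as np

def logistic(a):
    """Logistic sigmoid: squashes the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def hyperbolic_tangent(a):
    """Hyperbolic tangent: squashes the weighted sum into (-1, 1)."""
    return np.tanh(a)

def gaussian_rbf(x, centre, width):
    """Gaussian basis function, as used by RBF hidden neurons."""
    return np.exp(-np.sum((x - centre) ** 2) / (2.0 * width ** 2))

# Example learning parameters for standard backpropagation (step 4.2).
learning_rate = 0.1   # illustrative value within 0.01-0.9
momentum = 0.8        # illustrative value within 0.01-0.9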


Step 5 Data preprocessing (stage 2):


5.1 Data standardization: According to the algorithm chosen, standardize data to the
ranges [0,1], [–1,1], [0.1, 0.9], etc., or normalize the data.
5.2 Data sets: Create cross-validation data sets by splitting the data into appro-
priate calibration, testing and validation sets (see section III). With a large data
set this is relatively easy as the data can be split into three representative
sets. However, for smaller data sets, cross-training should be used (Schalkoff,
1997).
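
A minimal sketch of step 5, assuming a single predictor matrix X and predictand vector y: the [0.1, 0.9] target range and the sequential 60/20/20 split into calibration, testing and validation sets are illustrative choices, not recommendations drawn from the studies surveyed.

import numpy as np

def rescale(x, lo=0.1, hi=0.9):
    """Linearly standardize each column of x to the range [lo, hi]."""
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    return lo + (hi - lo) * (x - xmin) / (xmax - xmin)

# Synthetic predictors and predictand (illustrative only).
rng = np.random.default_rng(1)
X = rng.random((500, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + 0.05 * rng.normal(size=500)

Xs, ys = rescale(X), rescale(y)

# 5.2 Split sequentially into calibration, testing and validation sets.
n = len(ys)
i1, i2 = int(0.6 * n), int(0.8 * n)
X_cal, y_cal = Xs[:i1], ys[:i1]
X_test, y_test = Xs[i1:i2], ys[i1:i2]
X_val, y_val = Xs[i2:], ys[i2:]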

Step 6 Network training:


6.1 Architecture: Specify the number of hidden layers and number of nodes in these
layers. This may be unnecessary if a pruning algorithm or cascade correlation is
used. Begin with one hidden layer.
6.2 Training: Train a number of networks using the calibration and test data.
Terminate the training process when results from the test data indicate overfitting
to the calibration set.
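
Step 6.2 amounts to monitoring test-set error during training and stopping once it ceases to improve. The sketch below uses scikit-learn's MLPRegressor as a stand-in for a one-hidden-layer MLP trained by incremental backpropagation; the synthetic data, network size, learning parameters, epoch limit and patience are all illustrative assumptions.

import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic, pre-standardized data (illustrative only).
rng = np.random.default_rng(1)
X = rng.random((500, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + 0.05 * rng.normal(size=500)
X_cal, y_cal = X[:300], y[:300]
X_test, y_test = X[300:400], y[300:400]

# 6.1 One hidden layer of five logistic neurons, trained by stochastic
# gradient descent (incremental backpropagation).
net = MLPRegressor(hidden_layer_sizes=(5,), activation='logistic',
                   solver='sgd', learning_rate_init=0.1, momentum=0.8)

# 6.2 Train epoch by epoch; stop when the test error stops improving.
best_rmse, best_epoch, patience = np.inf, 0, 20
for epoch in range(1, 501):
    net.partial_fit(X_cal, y_cal)              # one pass through the calibration set
    rmse = np.sqrt(np.mean((net.predict(X_test) - y_test) ** 2))
    if rmse < best_rmse:
        best_rmse, best_epoch = rmse, epoch
    elif epoch - best_epoch > patience:        # overfitting to the calibration set
        break

print('Stopped at epoch', epoch, 'with test RMSE', round(best_rmse, 4))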

Step 7 Evaluation: Select error measures (see section IV) that are appropriate to the
model output and purpose. Compare results with those derived from alternative model
configurations.
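
Commonly reported error measures for flow forecasts include the root mean squared error, the mean absolute error and the Nash–Sutcliffe coefficient of efficiency (Nash and Sutcliffe, 1970). A minimal sketch, assuming observed and modelled flow series of equal length:

import numpy as np

def rmse(obs, sim):
    """Root mean squared error (same units as the flows)."""
    return np.sqrt(np.mean((obs - sim) ** 2))

def mae(obs, sim):
    """Mean absolute error."""
    return np.mean(np.abs(obs - sim))

def nash_sutcliffe(obs, sim):
    """Coefficient of efficiency: 1 is a perfect fit, 0 is no better than
    predicting the mean observed flow (Nash and Sutcliffe, 1970)."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

observed = np.array([3.2, 4.1, 5.6, 4.8])
modelled = np.array([3.0, 4.5, 5.1, 5.0])
print(rmse(observed, modelled), mae(observed, modelled),
      nash_sutcliffe(observed, modelled))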

2 Future directions
From the preceding discussion it is evident that ANN construction involves many
arbitrary decisions, with hitherto little guidance as to the best code of practice or choice
of standard error measures. (In passing, it is conceded that the same criticisms might
also be levelled at the wider discipline of hydrological modelling.) There is also an
urgent need for more inter-model comparisons and rigorous assessment of ANN
solutions versus traditional hydrological methods. Other common failures of existing
ANN modelling practice include: the widespread usage of validation data during the
training process; the arbitrary choice of model inputs, network structures and internal
model parameters; and inadequate preprocessing of model inputs (Maier and Dandy,
2000).
Despite these limitations, there is little doubt (after less than 10 years of application) that
ANNs are well suited to the challenging tasks of rainfall–runoff and flood forecasting.
However, future advances in the field will be contingent upon the refinement of
objective guidelines for ANN construction and the development and use of standard
measures of ANN model skill. In this respect, measures of accuracy that penalize
unnecessary model complexity would greatly enhance model intercomparisons.
Furthermore, indices of catchment properties (such as the mean lag-response between
rainfall and runoff) would enable the comparison of results for different catchments by
acknowledging that a component of model skill is directly attributable to basin
properties (e.g., underlying geology, land use, relief, etc.). Finally, there is considerable
scope for the extraction of hydrological ‘rules’ from the connection weights of trained
ANN models using sensitivity analyses or rule extraction algorithms (e.g., French et al.,
1992; Andrews et al., 1995; Maier et al., 1998; Abrahart et al., 1999). In this way, ANNs
may provide insights into previously unrecognized relationships within hydrological
‘black boxes’.
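
By way of illustration, one form such a complexity-penalized measure might take is an information criterion in the spirit of Akaike (1974) or Schwarz (1978), in which the residual error is inflated according to the number of free parameters (here, the ANN's weights and biases). The sketch below is an assumption-laden example rather than an established standard from the ANN literature.

import numpy as np

def penalized_criterion(obs, sim, n_parameters):
    """Akaike-style criterion: smaller is better, so additional weights
    must 'pay for themselves' through a lower residual error."""
    n = len(obs)
    sse = np.sum((obs - sim) ** 2)
    return n * np.log(sse / n) + 2 * n_parameters

# For an MLP with i inputs, h hidden neurons and one output, the number
# of weights and biases is h * (i + 1) + (h + 1).
i_inputs, h_hidden = 4, 6
n_parameters = h_hidden * (i_inputs + 1) + (h_hidden + 1)
print('Free parameters in the network:', n_parameters)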

Acknowledgements
We thank the anonymous reviewer for constructive comments on our original
manuscript. RW was supported by ACACIA (A Consortium for the Application of
Climate Impact Assessments).

References

Abrahart, R.J. and Kneale, P.E. 1997: Exploring neural network rainfall–runoff modelling. In Proceedings of the 6th British Hydrological Society symposium, Salford University, 9.35–9.44.
Abrahart, R.J. and See, L. 1998: Neural network vs. ARMA modelling: constructing benchmark case studies of river flow prediction. In Proceedings of the 3rd International Conference on Geocomputation, University of Bristol, 17–19 September (http://www.geog.port.ac.uk/geocomp/geo98/05/gc_05.htm) (23 February 2000).
Abrahart, R.J., See, L. and Kneale, P.E. 1998: New tools for neurohydrologists: using network pruning and model breeding algorithms to discover optimum inputs and architectures. In Proceedings of the 3rd International Conference on Geocomputation, University of Bristol, 17–19 September (http://www.geog.port.ac.uk/geocomp/geo98/20/gc_20.htm) (23 February 2000).
–––– 1999: Applying saliency analysis to neural network rainfall–runoff modelling. In Proceedings of the 4th International Conference on Geocomputation, Fredericksburg, Virginia, USA, 25–28 July (http://www.ashville.demon.co.uk/geocomp).
Abu-Mostafa, Y.S. 1989: The Vapnik–Chervonenkis dimension: information versus complexity in learning. Neural Computation, 312–17.
Akaike, H. 1974: A new look at the statistical model identification. IEEE Transactions on Automatic Control AC-19, 716–23.
Anderson, M.G. and Burt, T.P., editors, 1985: Hydrological forecasting. Chichester: Wiley.
Andrews, R., Diederich, J. and Tickle, A.B. 1995: A survey and critique of techniques for extracting rules from trained artificial neural networks. Knowledge Based Systems 8, 373–89.
Atiya, A., El-Shoura, S., Shaheen, S. and El-Sherif, M. 1996: River flow forecasting using neural networks. In Proceedings, World Congress on Neural Networks, San Diego, CA, September, 461–64.
–––– 1999: A comparison between neural-network forecasting techniques – case study: river flow forecasting. IEEE Transactions on Neural Networks 10, 402–409.
Battiti, R. 1992: First- and second-order methods for learning: between steepest descent and Newton's method. Neural Computation 4, 141–66.
Bishop, C.M. 1995: Neural networks for pattern recognition. Oxford: Clarendon Press.
Braddock, R.D., Kremmer, M.L. and Sanzogni, L. 1998: Feed-forward artificial neural network model for forecasting rainfall run-off. Environmetrics 419–32.
Campolo, M., Andreussi, P. and Soldati, A. 1999: River flood forecasting with a neural network model. Water Resources Research 35, 1191–97.
Chang, F. and Hwang, Y. 1999: A self-organization algorithm for real-time flood forecast. Hydrological Processes 13, 123–38.
Cheng, X. and Noguchi, M. 1996: Rainfall–runoff modelling by a neural network approach. In Proceedings of the International Conference on Water Resources and Environmental Research, 143–50.
Clair, T.A. and Ehrman, J.M. 1996: Variations in discharge and dissolved organic carbon and nitrogen export from terrestrial basins with changes in climate: a neural network approach. Limnology and Oceanography 41, 921–27.
Colman, A. and Davey, M. 1999: Prediction of summer temperature, rainfall and pressure in Europe from preceding winter north Atlantic Ocean temperature. International Journal of Climatology 19, 513–36.
Crespo, J.L. and Mora, E. 1993: Drought estimation with neural networks. Advances in Engineering Software 18, 167–70.
Dai, H. and MacBeth, C. 1997: Effects of learning parameters on learning procedure and performance of a BPNN. Neural Networks 10, 1505–21.
Danh, N.T., Phien, H.N. and Gupta, A.D. 1999: Neural networks for river flow forecasting. Water SA 25, 33–39.
Dawson, C.W. 1996: A neural network approach to software project effort estimation. Applications of Artificial Intelligence in Engineering 1, 229–37.
Dawson, C.W., Brown, M. and Wilby, R. 2000: Inductive learning approaches to rainfall–runoff modelling. International Journal of Neural Systems 10, 43–57.
Dawson, C.W. and Wilby, R. 1998: An artificial neural network approach to rainfall–runoff modelling. Hydrological Sciences Journal 43, 47–66.
–––– 1999: A comparison of artificial neural networks used for river flow forecasting. Hydrology and Earth System Sciences 3, 529–40.
Fahlman, S.E. 1988: Faster-learning variations on back-propagation: an empirical study. In Touretzky, D., Hinton, G.E. and Sejnowski, T.J., editors, Proceedings of the 1988 Connectionist Models Summer School. San Mateo, CA: Morgan Kaufmann, 38–51.
Fernando, D.A.K. and Jayawardena, A.W. 1998: Runoff forecasting using RBF networks with OLS algorithm. Journal of Hydrologic Engineering 3, 203–209.
French, M.N., Krajewski, W.F. and Cuykendall, R.R. 1992: Rainfall forecasting in space and time using a neural network. Journal of Hydrology 137, 1–31.
Furundzic, D. 1998: Application of neural networks for time series analysis: rainfall–runoff modeling. Signal Processing 64, 383–96.
Gallant, S.I. 1993: Neural network learning and expert systems. London: MIT Press.
Golob, R., Stokelj, T. and Grgic, D. 1998: Neural-network-based water inflow forecasting. Control Engineering Practice 6, 593–600.
Gorr, W., Nagin, D. and Szcypula, J. 1992: The relevance of artificial neural networks to managerial forecasting: an analysis and empirical study. Technical Report 93-1. Pittsburgh, PA: Heinz School of Public Policy Management, Carnegie Mellon University.
Govindaraju, R.S. and Ramachandra Rao, A., editors, 2000: Artificial neural networks in hydrology. Dordrecht: Kluwer Academic.
Hall, M.J. and Minns, A.W. 1993: Rainfall–runoff modelling as a problem in artificial intelligence: experience with a neural network. In Proceedings of the 4th British Hydrological Society symposium, Cardiff, 5.51–5.57.
Haykin, S. 1999: Neural networks: a comprehensive foundation (2nd edn). London: Prentice Hall.
Hewitson, B.C. and Crane, R.G. 1994: Neural nets: applications in geography. Dordrecht: Kluwer Academic.
Hirose, Y., Yamashita, K. and Hijiya, S. 1991: Back-propagation algorithm which varies the number of hidden units. Neural Networks 4, 61–66.
Hsu, K., Gupta, H.V. and Sorooshian, S. 1995: Artificial neural network modeling of the rainfall–runoff process. Water Resources Research 31, 2517–30.
Huang, S. and Huang, Y. 1991: Bounds on the number of hidden neurons in multilayer perceptrons. IEEE Transactions on Neural Networks 2, 47–55.
Janacek, G. and Swift, L. 1993: Time series forecasting, simulation, applications. London: Ellis Horwood.
Jayawardena, A.W. and Fernando, D.A.K. 1998: Use of radial basis function type artificial neural networks for runoff simulation. Computer-Aided Civil and Infrastructure Engineering 13, 91–99.
Jayawardena, A.W., Fernando, D.A.K. and Zhou, M.C. 1997: Comparison of multilayer perceptron and radial basis function networks as tools for flood forecasting. In Destructive water: water-caused natural disaster, their abatement and control (proceedings of the conference at Anaheim, CA, June). IAHS Publication 239, Wallingford: IAHS Press, 173–81.
Kang, K.W., Park, C.Y. and Kim, J.H. 1993: Neural network and its application to rainfall–runoff forecasting. Korean Journal of Hydroscience 4, 1–9.
Karunanithi, N., Grenney, W.J., Whitley, D. and Bovee, K. 1994: Neural networks for river flow prediction. Journal of Computing in Civil Engineering 8, 201–20.
Kohonen, T. 1984: Self-organization and associative memory. New York: Springer-Verlag.
–––– 1990: The self-organizing map. Proceedings of the IEEE 78, 1464–80.
Kwok, T.Y. and Yeung, D.Y. 1997: Constructive algorithms for structure learning in feedforward neural networks for regression problems. IEEE Transactions on Neural Networks 8, 630–45.
Lachtermacher, G. and Fuller, J.D. 1994: Backpropagation in hydrological time series forecasting. In Hipel, K.W., McLeod, A.I., Panu, U.S. and Singh, V.P., editors, Stochastic and statistical methods in hydrology and environmental engineering. Vol. 3, Dordrecht: Kluwer, 229–42.
Lange, N.T. 1999: New mathematical approaches in hydrological modeling – an application of artificial neural networks. Physics and Chemistry of the Earth 24, 31–35.
Legates, D.R. and McCabe, G.J. 1999: Evaluating the use of ‘goodness-of-fit’ measures in hydrologic and hydroclimatic model validation. Water Resources Research 35, 233–41.
Liong, S.Y. and Chan, W.T. 1993: Runoff volume estimates with neural networks. In Topping, B.H.V. and Khan, A.I., editors, Proceedings of the 3rd International Conference on the Application of AI to Civil and Structural Engineering, Edinburgh: Civil Computer Press, 67–70.
Loke, E., Warnaars, E.A., Jacobsen, P., Nelen, F. and Almeida, M.D. 1997: Artificial neural networks as a tool in urban storm drainage. Water Science and Technology 36, 101–109.
Lorrai, M. and Sechi, G.M. 1995: Neural nets for modelling rainfall–runoff transformations. Water Resources Management 9, 299–313.
Magoulas, G.D., Vrahatis, M.N. and Androulakis, G.S. 1997: Effective backpropagation training with variable stepsize. Neural Networks 10, 69–82.
Maier, H.R. and Dandy, G.C. 1996a: The use of artificial neural networks for the prediction of water quality parameters. Water Resources Research 32, 1013–22.
–––– 1996b: Neural network models for forecasting multivariate time series. Neural Network World 6, 747–71.
–––– 1998: The effect of internal parameters and geometry on the performance of back-propagation neural networks: an empirical study. Environmental Modelling and Software 13, 193–209.
–––– 2000: Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environmental Modelling and Software 15, 101–23.
Maier, H.R., Dandy, G.C. and Burch, M.D. 1998: Use of artificial neural networks for modelling cyanobacteria Anabaena spp. in the River Murray, South Australia. Ecological Modelling 105, 257–72.
Mason, J.C., Tem'me, A. and Price, R.K. 1996: A neural network model of rainfall–runoff using radial basis functions. Journal of Hydraulic Research 34, 537–48.
Masters, T. 1995: Neural, novel and hybrid algorithms for time series prediction. New York: Wiley.
McCulloch, W.S. and Pitts, W. 1943: A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115–33.
Minns, A.W. 1996: Extended rainfall–runoff modelling using artificial neural networks. In Muller, A., editor, Hydroinformatics 96: proceedings of the 2nd International Conference on Hydroinformatics, Zurich, Vol. 1, 207–13.
Minns, A.W. and Hall, M.J. 1996: Artificial neural networks as rainfall–runoff models. Hydrological Sciences Journal 41, 399–417.
–––– 1997: Living with the ultimate black box: more on artificial neural networks. In Proceedings of the 6th British Hydrological Society symposium, Salford University, 9.45–9.49.
Minsky, M.L. and Papert, S.A. 1969: Perceptrons. Cambridge, MA: MIT Press.
Murata, N., Yoshizawa, S. and Amari, S. 1994: Network information criteria – determining the number of hidden units for an artificial neural network model. IEEE Transactions on Neural Networks 5, 865–72.
Muttiah, R.S., Srinivasan, R. and Allen, P.M. 1997: Prediction of two-year peak stream discharges using neural networks. Journal of the American Water Resources Association 33, 625–30.
Nash, J.E. and Sutcliffe, J.V. 1970: River flow forecasting through conceptual models. Part 1. A discussion of principles. Journal of Hydrology 10, 282–90.
Nowlan, S.J. and Hinton, G.E. 1992: Simplifying neural networks by soft weight-sharing. Neural Computation 4, 473–93.
O'Loughlin, G., Huber, W. and Chocat, B. 1996: Rainfall–runoff processes and modelling. Journal of Hydraulic Research 34, 733–51.
Pankiewicz, G.S. 1997: Neural network classification of convective air masses for a flood forecasting system. International Journal of Remote Sensing 18, 887–98.
Poff, N.L., Tokar, S. and Johnson, P. 1996: Stream hydrological and ecological responses to climate change assessed with an artificial neural network. Limnology and Oceanography 41, 857–63.
Raman, H. and Sunilkumar, N. 1995: Multivariate modelling of water resources time series using artificial neural networks. Hydrological Sciences Journal 40, 145–63.
Refenes, A., Burgess, A.N. and Bents, Y. 1997: Neural networks in financial engineering: a study in methodology. IEEE Transactions on Neural Networks 8, 1223–67.
Rissanen, J. 1978: Modeling by shortest data description. Automatica 14, 465–71.
Rumelhart, D.E. and McClelland, J.L., editors, 1986: Parallel distributed processing: explorations in the microstructures of cognition. Vol. 1. Cambridge, MA: MIT Press.
Sajikumar, N. and Thandaveswara, B.S. 1999: A non-linear rainfall–runoff model using artificial neural networks. Journal of Hydrology 216, 32–55.
Schalkoff, R.J. 1997: Artificial neural networks. New York: McGraw-Hill.
Schwarz, G. 1978: Estimating the dimension of a model. Annals of Statistics 6, 461–64.
See, L. and Abrahart, R.J. 1999: Multi-model data fusion for hydrological forecasting. In Proceedings of the 4th International Conference on Geocomputation, Fredericksburg, Virginia, USA, 25–28 July (http://www.ashville.demon.co.uk/geocomp).
See, L., Abrahart, R.J. and Openshaw, S. 1998: An integrated neuro-fuzzy statistical approach to hydrological modelling. In Proceedings of the 3rd International Conference on Geocomputation, University of Bristol, 17–19 September (http://www.geog.port.ac.uk/geocomp/geo98/22/gc_22.htm) (23 February 2000).
See, L., Corne, S., Dougherty, M. and Openshaw, S. 1997: Some initial experiments with neural network models of flood forecasting on the River Ouse. In Proceedings of the 2nd International Conference on Geocomputation, 26–29 August, University of Otago, Dunedin, New Zealand, 59–67 (http://www.ashville.demon.co.uk/geocomp).
See, L. and Openshaw, S. 1998: Using soft computing techniques to enhance flood forecasting on the River Ouse. In Babovic, V. and Larsen, L.C., editors, Hydroinformatics 98: proceedings of the Third International Conference on Hydroinformatics, 24–26 August, Copenhagen, Denmark, 819–24.
–––– 1999: Applying soft computing approaches to river level forecasting. Hydrological Sciences Journal 44, 763–78.
Shamseldin, A.Y. 1997: Application of a neural network technique to rainfall–runoff modelling. Journal of Hydrology 199, 272–94.
Smith, J. and Eli, R.N. 1995: Neural-network models of rainfall–runoff process. Journal of Water Resources Planning and Management 121, 499–508.
Song, X.M. 1996: Radial basis function networks for empirical modeling of chemical process. MSc thesis, University of Helsinki (http://www.cs.Helsinki.FI/~xianming) (28 January 1999).
Stüber, M. and Gemmer, P. 1997: An approach for data analysis and forecasting with neuro fuzzy systems – demonstrated on flood events at River Mosel. Lecture Notes in Computer Science, Computational Intelligence 1226, 468–77.
Sureerattanan, S. and Phien, H.N. 1997: Back-propagation networks for daily streamflow forecasting. Water Resources Journal, December, 1–7.
Tawfik, M., Ibrahim, A. and Fahmy, H. 1997: Hysteresis sensitivity neural network for modeling rating curves. Journal of Computing in Civil Engineering 11, 206–11.
Theil, H. 1966: Applied economic forecasting. Amsterdam: North-Holland.
Thirumalaiah, K. and Deo, M.C. 1998a: Real-time flood forecasting using neural networks. Computer-Aided Civil and Infrastructure Engineering 13, 101–11.
–––– 1998b: River stage forecasting using artificial neural networks. Journal of Hydrologic Engineering 3, 26–32.
Todini, E. 1988: Rainfall–runoff modelling – past, present and future. Journal of Hydrology 100, 341–52.
Tokar, S.A. and Johnson, P.A. 1999: Rainfall–runoff modeling using artificial neural networks. Journal of Hydrologic Engineering 4, 232–39.
Watts, G. 1997: Hydrological modelling. In Wilby, R.L., editor, Contemporary hydrology: towards holistic environmental science, Chichester: Wiley, 151–93.
Yang, R. 1997: Application of neural networks and genetic algorithms to modelling flood discharges and urban water quality. Unpublished PhD thesis, University of Manchester.
Zealand, C.M., Burn, D.H. and Simonovic, S.P. 1999: Short term streamflow forecasting using artificial neural networks. Journal of Hydrology 214, 32–48.
Zhu, M., Fujita, M. and Hashimoto, N. 1994: Application of neural networks to runoff prediction. In Hipel, K.W., McLeod, A.I., Panu, U.S. and Singh, V.P., editors, Stochastic and statistical methods in hydrology and environmental engineering. Vol. 3, Dordrecht: Kluwer, 205–16.

Appendix: Neural network glossary

activation function the function embedded within a neuron that transforms the weighted sum
of its inputs into an output. These functions may be logistic sigmoid, linear, threshold,
Gaussian, hyperbolic tangent, etc.
architecture the structure of an ANN – including the number and connectivity of neurons.
Usually an ANN is arranged into several layers of neurons – an input layer, one or more
hidden layers and an output layer.
backpropagation the training algorithm for the feed-forward, multilayer perceptron which
works by propagating errors back through a network and adjusting weights and biases to
reduce this error accordingly.
basis function a model function can be represented by a linear combination of several basis
functions. The term ‘basis function’ stems from basis vectors that combine to form a vector.
bias an additional weighted input to a neuron that stems from an imaginary unit that always
has a value of one. During calibration this bias input is adjusted in the same way as other
weights by the training algorithm.
constructive algorithm a training algorithm that successively adds neurons to the hidden layer
of an ANN in order to determine an optimum network geometry.
delta rule a term often used for backpropagation. The delta rule refers to the adjustments made
to weights and biases in an ANN during training.
epoch a single pass through the calibration data set during training.
error surface/function see weight space (below).
feed forward a network in which all the connections between neurons flow in one direction –
from an input layer, through one or more hidden layers, to an output layer.
geometry see architecture (above).
layer the arrangement of neurons, being hidden, input or output.
learning parameter used in the backpropagation algorithm to adjust the rate of changes to a
network’s weights during training.
MLP Multilayer perceptron. The popular feed-forward (three or four layer) ANN comprising
sigmoid or hyperbolic tangent neuron functions, trained using standard backpropagation.
momentum a factor used in the backpropagation training algorithm to speed convergence to
an error minimum.
neuron the basic building block of a neural network. A neuron sums the weighted inputs from
the ‘outside world’ or other neurons, then processes these inputs using an activation function,
and produces an output response.
node see neuron (above).
normalization rescaling data to a standard normal distribution (i.e., to zero mean and unit
variance). Compare standardization, which involves rescaling data linearly to a particular range such as [0,1], [–1,
+1], etc.
pruning algorithm a training algorithm that progressively reduces the number of neurons in
an ANN’s hidden layer in an attempt to find an optimum network geometry.
RBF radial basis function. The ANN with the same structure as an MLP but the neurons are
represented by radial basis functions.
recurrent an ANN in which connections between neurons feed backwards through the
network as well as forwards.
standardization see normalization (above).


training/learning the process of adjusting a network’s weights and biases so that the ANN
model error is progressively reduced.
transfer function see activation function (above).
weight a multiplicative value applied to a neuron’s inputs (each input having a different
weight). It is the weights in a network that are adjusted to ‘train’ it.
weight space the n-dimensional surface in which weights in a network are adjusted by the
backpropagation algorithm to minimize model error.
