IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 8, NO. 5, SEPTEMBER 1997

Kernel Orthonormalization in Radial Basis Function Neural Networks

W. Kaminski and P. Strumillo
Abstract: This paper deals with optimization of the computations involved in training radial basis function (RBF) neural networks. The main contribution of the reported work is a method for calculating the network weights in which the key idea is to transform the RBF kernels into an orthonormal set of functions (by using the standard Gram-Schmidt orthogonalization). This significantly reduces computing time (in comparison to other methods for weight calculation, e.g., the direct matrix inversion method) if the adopted RBF training scheme relies on adding one kernel hidden node at a time to improve network performance. Another property of the method is that, after the RBF network weights are computed (via the orthonormalization procedure), the original network structure can be restored (i.e., one containing RBFs in the hidden layer). An additional strength of the method is the possibility of decomposing the proposed computing task into a number of parallel subtasks, thus gaining further savings in computing time. The proposed weight calculation technique also has low storage requirements. These features make the method very attractive for hardware implementation. The paper presents a detailed derivation of the proposed network weight calculation procedure and demonstrates its validity for RBF network training on a number of data classification and function approximation problems.

Index Terms: Approximation, function orthonormalization, RBF neural networks.
I. INTRODUCTION

RADIAL basis function (RBF) networks, which have recently attracted extensive research interest [1]-[5], can be regarded as an amalgamation of a data modeling technique in a high-dimensional space, called RBFs [6], and a universal approximation scheme popularly known as artificial neural networks (ANNs) [1]. Effectively, a novel and powerful data processing tool has been proposed [7] in which the exact interpolation regime required in the original RBF method has been relaxed and cast into a layered connectionist network structure.
An RBF network approximates an unknown mapping function $f: \mathbb{R}^n \to \mathbb{R}$ by using a two-layer data processing structure. First, the input data undergo a nonlinear transformation via the basis functions in the network hidden layer; then the basis function responses are linearly combined to give the network output. Hence, the overall input-output transfer function of the RBF network can be written as

$$ f(\mathbf{x}) = \sum_{k=1}^{K} w_k\, g_k(\mathbf{x}) \qquad (1) $$

where $g_k(\mathbf{x})$ denotes the response of the $k$th radial basis function kernel and $w_k$ is the corresponding output weight.
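To make (1) concrete, here is a minimal sketch (ours, not from the paper) of the forward pass, assuming Gaussian kernels; the kernel width sigma, the helper name rbf_output, and the toy values are illustrative assumptions only.

```python
import numpy as np

def rbf_output(x, centers, weights, sigma=1.0):
    """Evaluate the RBF network transfer function (1):
    f(x) = sum_k w_k * g_k(x), with illustrative Gaussian kernels
    g_k(x) = exp(-||x - c_k||^2 / (2 * sigma^2))."""
    dists = np.linalg.norm(centers - x, axis=1)   # ||x - c_k|| for every center
    g = np.exp(-dists ** 2 / (2.0 * sigma ** 2))  # hidden-layer responses
    return weights @ g                            # linear output combiner

# Toy usage: three kernels in R^2
centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
weights = np.array([0.5, -1.0, 2.0])
print(rbf_output(np.array([0.2, 0.3]), centers, weights))
```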
Manuscript received March 19, 1996; revised October 10, 1996 and April 2, 1997. This work was carried out as part of Research Project 0399/P4/94/07, sponsored by the Polish State Committee for Scientific Research.
W. Kaminski is with the Faculty of Process and Environmental Engineering,
Technical University of Lodz, 90-924 Lodz, Poland.
P. Strumillo is with the Electrical and Electronics Engineering Faculty,
Technical University of Lodz, 90-924 Lodz, Poland.
Publisher Item Identifier S 1045-9227(97)05246-6.
The optimum network weights, i.e., those minimizing the error criterion (5), are given by the solution of the following set of linear equations:

$$ \sum_{i=1}^{K} w_i\, \langle g_i, g_j \rangle = \langle f, g_j \rangle, \qquad j = 1, 2, \ldots, K. \qquad (6) $$

If the locations of the basis function centers, i.e., the vectors of coordinates $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_K$, are distinct, then the set of equations (6) is linearly independent and has a unique solution [2]. Additionally, this is the single globally optimum solution to (5).

For the sake of a clearer explanation of the key points of the proposed method for the solution of (6), the following notation is introduced for the space of multivariable functions. The inner product of two functions $f_1$ and $f_2$, evaluated on the set of training points $\mathbf{x}_p$, $p = 1, 2, \ldots, P$, is defined as

$$ \langle f_1, f_2 \rangle = \sum_{p=1}^{P} f_1(\mathbf{x}_p)\, f_2(\mathbf{x}_p) \qquad (7) $$

and the corresponding function norm as

$$ \| f \| = \sqrt{\langle f, f \rangle}. \qquad (8) $$

With this notation, the RBF kernels $g_1, g_2, \ldots, g_K$ are transformed into an orthonormal set of functions $u_1, u_2, \ldots, u_K$ by the standard Gram-Schmidt procedure

$$ u_1 = \frac{g_1}{\| g_1 \|} \qquad (9) $$

$$ v_k = g_k - \sum_{j=1}^{k-1} \langle g_k, u_j \rangle\, u_j \qquad (10) $$

$$ u_k = \frac{v_k}{\| v_k \|}, \qquad k = 2, 3, \ldots, K \qquad (11) $$

where each orthonormal function can be expressed as a linear combination of the original kernels

$$ u_k = \sum_{i=1}^{k} \nu_{ki}\, g_i \qquad (12) $$

in which the substitution $\nu_{ki}$ for the accumulated Gram-Schmidt coefficients is introduced to simplify the notation.
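Since the inner product (7) involves function values only at the P training points, each kernel g_k can be represented by the vector of its responses at those points, and (7) reduces to an ordinary dot product. The following sketch (ours; the names and the matrix representation are our assumptions) carries out the Gram-Schmidt steps (9)-(12) in that representation and records the coefficients nu:

```python
import numpy as np

def orthonormalize(G):
    """Gram-Schmidt orthonormalization, cf. (9)-(12).
    G is the P x K matrix whose k-th column holds g_k evaluated at the
    P training points, so the inner product (7) is a plain dot product.
    Returns U (orthonormal columns, U.T @ U = I) and the lower-triangular
    nu such that u_k = sum_{i<=k} nu[k, i] * g_i, cf. (12)."""
    P, K = G.shape
    U = np.zeros((P, K))
    nu = np.zeros((K, K))
    for k in range(K):
        v = G[:, k].copy()
        coeff = np.zeros(K)
        coeff[k] = 1.0                       # v starts as g_k itself
        for j in range(k):
            a = U[:, j] @ G[:, k]            # <g_k, u_j> via (7)
            v -= a * U[:, j]                 # subtract projection, cf. (10)
            coeff[:k] -= a * nu[j, :k]       # track expansion in g_1..g_k
        norm_v = np.linalg.norm(v)           # ||v_k||, cf. (8)
        U[:, k] = v / norm_v                 # normalization, cf. (11)
        nu[k] = coeff / norm_v               # coefficients of (12)
    return U, nu
```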
The network weights in the orthonormal basis are obtained directly from the inner products

$$ w'_k = \langle f, u_k \rangle, \qquad k = 1, 2, \ldots, K \qquad (13) $$

and the network transfer function is expressed in terms of the orthonormal functions as

$$ f(\mathbf{x}) = \sum_{k=1}^{K} w'_k\, u_k(\mathbf{x}). \qquad (14) $$

Graphical interpretation of the described RBF kernel orthonormalization procedure is illustrated in Fig. 2. Now, after the expression for $u_k$ given in (12) is substituted into (14), one gets

$$ f(\mathbf{x}) = \sum_{k=1}^{K} w'_k \sum_{i=1}^{k} \nu_{ki}\, g_i(\mathbf{x}) \qquad (15) $$

and by moving $w'_k$ under the second summation sign the network transfer function may be expressed as

$$ f(\mathbf{x}) = \sum_{k=1}^{K} \sum_{i=1}^{k} w'_k\, \nu_{ki}\, g_i(\mathbf{x}). \qquad (16) $$

Then, by grouping appropriate terms in the last expression, one gets

$$ f(\mathbf{x}) = \sum_{i=1}^{K} \left( \sum_{k=i}^{K} w'_k\, \nu_{ki} \right) g_i(\mathbf{x}) \qquad (17) $$

and by comparing (1) and (17) the following formula is obtained:

$$ w_i = \sum_{k=i}^{K} \nu_{ki}\, w'_k \qquad (18) $$

which simply implies that the overall transfer function of the RBF network after the orthonormalization procedure is performed, i.e., (17), can be expressed in a form identical to that given in (1). Accordingly, the resultant network structure will be identical to the initial structure of the used RBF network, but with the connection weights computed from (18); see also Fig. 1.
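A short sketch (ours) tying (13)-(18) together: the weights w'_k follow from the inner products (13), and (18) converts them back to the weights of the original RBF structure. It reuses the orthonormalize helper sketched above, and the closing check against a direct solution of the normal equations (6) is our own sanity test, not the paper's experiment.

```python
import numpy as np

def train_rbf_weights(G, y):
    """G: P x K kernel response matrix; y: targets f(x_p) at the P points.
    Returns the original-structure weights w and the primed weights w'."""
    U, nu = orthonormalize(G)     # helper from the previous sketch
    w_prime = U.T @ y             # eq. (13): w'_k = <f, u_k>
    w = nu.T @ w_prime            # eq. (18): w_i = sum_{k>=i} nu[k, i] * w'_k
    return w, w_prime

# Sanity check on random data: (18) reproduces the direct solution of (6)
rng = np.random.default_rng(0)
G = rng.normal(size=(50, 6))
y = rng.normal(size=50)
w, _ = train_rbf_weights(G, y)
w_direct = np.linalg.solve(G.T @ G, G.T @ y)   # normal equations (6)
assert np.allclose(w, w_direct)
```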
The presented orthonormalization method has a simple extension to a multivariable network output. In such a case, the calculations of the weights $w'_k$ need to be repeated in an identical manner, as shown in the sequence of (6)-(18), for each element of the output vector. Obviously, in general, the computed weight values for each of the network output linear combiner nodes will be different. Also, see the Appendix, where the suggested recursive calculation procedure for the coefficients $\nu_{ki}$ eliminates the need for repetitive computation of the function inner products in the set of equations (11).

IV. RESULTS

TABLE I. IRIS DATA CLASSIFICATION RESULTS OBTAINED FOR 15 RBF KERNEL NODES WITH THE USE OF THE PROPOSED KERNEL ORTHOGONALIZATION SCHEME
TABLE II. COMPARISON OF COMPUTATIONAL COSTS FOR CALCULATING OUTPUT NETWORK WEIGHTS FOR THE PROPOSED ORTHONORMALIZATION SCHEME AND THE GAUSS-SEIDEL ELIMINATION METHOD

Fig. 3. (a) Two spiral-shaped classes of points used for learning an RBF network. (b) Separation of the spirals obtained for 64 RBF kernel functions trained with the orthonormalization procedure proposed in the paper; there are 41 × 41 testing points in the classified area.

TABLE III. ONE-STEP-AHEAD PREDICTION OF THE CHAOTIC TIME SERIES: QUANTITATIVE COMPARISON FOR DIFFERENT NUMBERS OF NETWORK HIDDEN NODES

Fig. 4. The quadratic map (solid line) and its approximation by an RBF network with five Gaussian kernels (centers marked by the open circles).
D. Summary
An efficient noniterative technique for computing the RBF network weights has been proposed, in which the RBF kernels are transformed into an orthonormal set of functions. This has eliminated the requirement to compute the off-diagonal terms in the linear set of equations for the optimum network weight set. Consequently, the incorporation of new network hidden nodes aimed at improving network performance does not require recomputation of the network weights already calculated. This property of the proposed method allows for a very efficient network training procedure in which network hidden nodes are added one at a time until an adequate error goal is reached (see the Iris classification benchmark). Additionally, the method has low storage requirements, since the weights can be computed one node at a time as the network grows.
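The node-by-node training procedure referred to above can be sketched as follows (our illustration; the stopping rule and all names are assumptions). Note that the loop never revisits a previously computed w'_k: each new kernel only has to be orthonormalized against the nodes already in the network.

```python
import numpy as np

def train_incrementally(G_all, y, error_goal=1e-2):
    """Add hidden nodes one at a time (columns of G_all) until the RMS
    error goal is met; earlier weights w'_k are never recomputed."""
    P, K = G_all.shape
    U = np.zeros((P, 0))                 # orthonormal responses so far
    w_prime = []
    residual = y.astype(float).copy()
    for k in range(K):
        v = G_all[:, k] - U @ (U.T @ G_all[:, k])  # orthogonalize new kernel
        u = v / np.linalg.norm(v)
        wp = u @ y                                  # eq. (13) for the new node
        U = np.column_stack([U, u])
        w_prime.append(wp)
        residual -= wp * u                          # approximation error so far
        if np.sqrt(np.mean(residual ** 2)) < error_goal:
            break
    return U, np.array(w_prime)
```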
APPENDIX

Expanding the Gram-Schmidt recursion (9)-(11) in terms of the original kernels confirms that each orthonormal function takes the form

$$ u_k = \sum_{i=1}^{k} \nu_{ki}\, g_i \qquad (21) $$

where, for $i < k$,

$$ \nu_{ki} = -\frac{1}{\| v_k \|} \sum_{j=i}^{k-1} \langle g_k, u_j \rangle\, \nu_{ji} \qquad (22) $$

and

$$ \nu_{kk} = \frac{1}{\| v_k \|} \qquad (23) $$

in which the required quantities are computed from the kernel inner products alone

$$ \langle g_k, u_j \rangle = \sum_{i=1}^{j} \nu_{ji}\, \langle g_k, g_i \rangle, \qquad \| v_k \|^2 = \langle g_k, g_k \rangle - \sum_{j=1}^{k-1} \langle g_k, u_j \rangle^2. \qquad (24) $$
Equations (22) and (23) now allow for the following recursive calculation of the coefficients $\nu_{ki}$. First, it is noted that $\nu_{11}$ is equal to the inverse value of the norm defined in (8). Then, the coefficient $\nu_{21}$ is calculated from (22) and next the coefficient $\nu_{22}$ is obtained from (23). Further calculations proceed in a similar manner until the values of the coefficients $\nu_{K1}, \ldots, \nu_{K,K-1}$ and $\nu_{KK}$ are finally calculated. Once all values of the coefficients $\nu_{ki}$ are known, the weight values $w'_k$ can be calculated by substituting into (13) the expressions for $u_k$ given in (12). This finally yields the following:

$$ w'_k = \sum_{i=1}^{k} \nu_{ki}\, \langle f, g_i \rangle, \qquad k = 1, 2, \ldots, K. \qquad (25) $$
Recall that the primed weights are different from those used in the original expression for the RBF network transfer function (1); compare also Figs. 1 and 2.
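The recursion (21)-(24) can be sketched in a few lines (ours; recall that the equation forms above are reconstructed from the surrounding text, so this is an illustration of that reconstruction rather than the authors' code). It needs only the K x K matrix of kernel inner products, so the orthonormal functions are never formed explicitly:

```python
import numpy as np

def nu_recursive(Q):
    """Recursive computation of the nu coefficients, cf. (21)-(24).
    Q[i, j] = <g_i, g_j> per (7). Returns lower-triangular nu with
    u_k = sum_{i<=k} nu[k, i] * g_i, as in (12) and (21)."""
    K = Q.shape[0]
    nu = np.zeros((K, K))
    nu[0, 0] = 1.0 / np.sqrt(Q[0, 0])        # nu_11 = 1/||g_1||, cf. (8)
    for k in range(1, K):
        a = nu[:k, :k] @ Q[k, :k]            # a[j] = <g_k, u_j>, cf. (24)
        norm_v = np.sqrt(Q[k, k] - a @ a)    # ||v_k||, cf. (24)
        nu[k, :k] = -(a @ nu[:k, :k]) / norm_v   # off-diagonal terms, eq. (22)
        nu[k, k] = 1.0 / norm_v                  # diagonal term, eq. (23)
    return nu

# For a full-rank response matrix G, nu_recursive(G.T @ G) agrees with the
# coefficients returned by the orthonormalize sketch given earlier.
```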
REFERENCES
[1] D. R. Hush and B. G. Horne, "Progress in supervised neural networks: What's new since Lippmann?," IEEE Signal Processing Mag., vol. 10, pp. 8-39, Jan. 1993.
[2] M. Bianchini, P. Frasconi, and M. Gori, "Learning without local minima in radial basis function networks," IEEE Trans. Neural Networks, vol. 6, pp. 749-755, May 1995.
[3] T. Chen and H. Chen, "Approximation capability to functions of several variables, nonlinear functionals, and operators by radial basis function neural networks," IEEE Trans. Neural Networks, vol. 6, pp. 904-910, July 1995.
[4] D. Gorinievsky, "On the persistency of excitation in radial basis function network identification of nonlinear systems," IEEE Trans. Neural Networks, vol. 6, pp. 1237-1244, Sept. 1995.
[5] E. S. Chng, S. Chen, and B. Mulgrew, "Gradient radial basis function networks for nonstationary nonlinear time series prediction," IEEE Trans. Neural Networks, vol. 7, pp. 190-194, Jan. 1996.
[6] M. J. D. Powell, "Radial basis functions for multivariable interpolation: A review," in Algorithms for Approximation of Functions and Data, J. C. Mason and M. G. Cox, Eds. Oxford, U.K.: Oxford Univ. Press, 1987, pp. 143-167.
[7] D. S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Syst., vol. 2, pp. 321-355, 1988.
[8] T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE, vol. 78, no. 9, pp. 1481-1497, Sept. 1990.
[9] H. Demuth and M. Beale, Neural Network Toolbox User's Guide. The MathWorks, Inc., 1994.
[10] J. E. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units," Neural Computation, vol. 1, pp. 281-294, 1989.
[11] A. Ralston, Mathematical Methods for Digital Computers. New York: Wiley, 1960.