Figure 9: Trace of the covariance matrix for the GBP algorithm.
Figure 10: Architecture to solve the symmetry problem.
Figure 11: Square error for the BP algorithm.
Comments:
1. This analysis is based on two assumptions: first, that the parameter estimate θ is close to θ₀ (that is, the analysis is only local), and second, that the architecture of the NN can approximate the behavior of the training set. Even though these assumptions seem quite restrictive, they make it possible to gain some insight into the convergence problem.
2. Conditions (1) and (2) are the most important from the solution point of view, because if they are not met it is possible to reach a local minimum.
3. Condition (2) means that the derivatives must be different from 0; that is, the outputs of the units cannot all be 1 or 0 at the same time. This means that if the input is bounded, it will be necessary to bound the parameters; for example, one way to accomplish this is to change the interpretation of the output [2]. (See the saturation sketch following this list.)
4. If S is not invertible, the criterion will not have a unique solution. It will have a "valley," and certain linear combinations of the parameters will either not converge or will converge an order of magnitude more slowly [17]. There are many causes for the nonexistence of the inverse of S, such as a model set containing "too many parameters," or experimental conditions such that individual parameters do not affect the predictions [17]. These effects arise during the learning process because, in some regions, only some of the units are active and contribute to the output. (A numerical check of this condition is sketched following this list.)

Figure 12: Square error for the GBP algorithm.
5. Condition (4b) arises when the structure of the neural network can match the structure of the system.
6. Condition (3) shows that the order of presentation of the training data can be important for the convergence of the algorithm.
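To make the saturation point in comment 3 concrete, here is a minimal sketch (added for illustration, not part of the original paper) assuming a standard logistic activation. It shows that the derivative vanishes when a unit's output approaches 0 or 1, and how rescaling binary targets away from the extremes, one common reading of "changing the interpretation of the output" [2], keeps the derivatives bounded away from zero; the values 0.1 and 0.9 are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation of a standard unit."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(y):
    """Derivative of the logistic function, expressed in terms of its output y."""
    return y * (1.0 - y)

# A unit saturated near 0 or 1 contributes an (almost) zero derivative,
# so the weights feeding it effectively stop being updated.
for net in (-10.0, -2.0, 0.0, 2.0, 10.0):
    y = sigmoid(net)
    print(f"net={net:6.1f}  output={y:.4f}  derivative={sigmoid_deriv(y):.6f}")

# One common remedy, in the spirit of "changing the interpretation of the
# output" [2]: map binary targets {0, 1} to values such as {0.1, 0.9}
# (illustrative numbers) so trained outputs stay out of the saturated region.
targets = np.array([0, 1, 1, 0])
rescaled = 0.1 + 0.8 * targets
print("rescaled targets:", rescaled)
```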
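The invertibility condition on S in comment 4 can also be probed numerically. The sketch below assumes, purely for illustration, that S has the usual information-matrix form (the average outer product of the output sensitivities ∂ŷ/∂θ over the training set); near-zero eigenvalues of such a matrix identify linear combinations of parameters that the data do not excite, i.e., the "valley" directions.

```python
import numpy as np

def sensitivity_matrix(psi):
    """Average outer product of sensitivity vectors psi[k] = d yhat / d theta.
    Assumed (for this sketch only) to play the role of the matrix S in the text."""
    p = psi.shape[1]
    S = np.zeros((p, p))
    for row in psi:
        S += np.outer(row, row)
    return S / len(psi)

# Hypothetical sensitivities for 4 parameters over 100 training patterns,
# constructed so that the 4th parameter never affects the output
# ("experimental conditions such that individual parameters do not
#  affect the predictions").
rng = np.random.default_rng(1)
psi = rng.normal(size=(100, 4))
psi[:, 3] = 0.0

S = sensitivity_matrix(psi)
eigvals = np.linalg.eigvalsh(S)
print("eigenvalues of S:", eigvals.round(4))
print("rank of S:", np.linalg.matrix_rank(S), "of", S.shape[0])
# A (near-)zero eigenvalue signals a "valley" of the criterion: the
# corresponding combination of parameters is not determined by the data.
```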
6. Conclusions
The stochastic approximation algorithm presented in this paper can successfully adjust the weights of a multilayer perceptron using many fewer presentations of the training data than are required by the BP method. However, there is an increased complexity in the calculation.
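To illustrate this trade-off, the following sketch contrasts a plain gradient (BP-like) correction with a generic recursive-least-squares-style correction that maintains a covariance matrix P, on a linear toy problem with hypothetical data. This is not the paper's GBP recursion; it is only meant to show why a covariance-based stochastic approximation step costs more per presentation (O(p²) rather than O(p)) while typically needing far fewer presentations. The trace of P is the kind of quantity plotted in figures 9 and 13.

```python
import numpy as np

def gradient_step(theta, phi, y, lr=0.1):
    """Plain gradient (LMS/BP-like) correction: O(p) work per presentation."""
    e = y - phi @ theta
    return theta + lr * e * phi

def rls_step(theta, P, phi, y):
    """Recursive least-squares-style correction with covariance matrix P:
    O(p^2) work per presentation, but typically far fewer presentations."""
    k = P @ phi / (1.0 + phi @ P @ phi)   # gain vector
    e = y - phi @ theta
    theta = theta + k * e
    P = P - np.outer(k, phi @ P)          # covariance update
    return theta, P

# Tiny linear toy problem, only to exercise both updates.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])
theta_g = np.zeros(3)
theta_r, P = np.zeros(3), 1000.0 * np.eye(3)
for _ in range(50):
    phi = rng.normal(size=3)
    y = phi @ true_theta
    theta_g = gradient_step(theta_g, phi, y)
    theta_r, P = rls_step(theta_r, P, phi, y)

print("gradient-style estimate:", theta_g.round(3))
print("RLS-style estimate:     ", theta_r.round(3))
print("trace of covariance P:  ", round(float(np.trace(P)), 4))
```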
A formal proof of the convergence properties of the algorithm in this context has not been given; instead, some insights into the convergence problem, together with links back to appropriate control and systems theory, are given.
Figure 13: Trace of the covariance matrix for the GBP algorithm.
Figure 14: Cartesian manipulator.
Figure 15: Architecture to solve the coordinate transformation problem.
Figure 16: Square error for the BP algorithm.
Figure 17: Square error for the GBP algorithm.
Figure 18: Initial surface.
Figure 19: Final and reference surface.

Moreover, our experience indicates that the GBP algorithm is better than the BP method at performing continuous transformations. This difference arises because in the continuous mapping the transformation is smooth
and it is possible to have a great number of patterns (theoretically infinite), of which only a small number are used to train the NN; it is therefore possible to choose an adequate training set for a smooth representation, giving better behavior in terms of convergence.
A different situation arises in the discrete case, where it is possible to have input patterns that do not differ very much from each other and yet produce very different system outputs. In terms of our interpretation, this means a rapidly varying hypersurface, and therefore a surface that is very hard to match with few components.
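As an illustration of this contrast (added here, not taken from the paper), the sketch below builds a training set by sampling a smooth stand-in mapping (polar to Cartesian coordinates; the paper's own coordinate-transformation experiment of figures 14-16 may differ in detail) and, next to it, a discrete stand-in mapping (3-bit parity) in which inputs differing in a single bit produce opposite outputs, so the underlying hypersurface varies rapidly.

```python
import numpy as np

# Smooth stand-in mapping: polar -> Cartesian coordinates.  The pattern set
# is theoretically infinite; a modest grid of samples already describes the
# hypersurface well because the mapping varies slowly.
r = np.linspace(0.5, 1.5, 11)
theta = np.linspace(0.0, np.pi / 2, 11)
R, TH = np.meshgrid(r, theta)
smooth_inputs = np.column_stack([R.ravel(), TH.ravel()])
smooth_targets = np.column_stack([R.ravel() * np.cos(TH.ravel()),
                                  R.ravel() * np.sin(TH.ravel())])
print("smooth training set:", smooth_inputs.shape[0], "patterns")

# Discrete stand-in mapping: 3-bit parity.  Inputs that differ in a single
# bit produce opposite outputs, so the underlying hypersurface varies
# rapidly and is hard to match with few smooth components.
bits = np.array([[(i >> b) & 1 for b in range(3)] for i in range(8)])
parity = bits.sum(axis=1) % 2
print("parity patterns (inputs | output):")
print(np.column_stack([bits, parity]))
```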
Further work is in progress to establish precisely under what conditions it is possible to obtain convergence to the required global solution.
References
[1] A.E. Albert and L.A. Gardner, Stochastic Approximation and Nonlinear Regression (MIT Press, Cambridge, MA, 1967).

[2] J.L. McClelland and D.E. Rumelhart, Explorations in Parallel Distributed Processing (MIT Press, Cambridge, MA, 1988).

[3] T.J. Sejnowski, "Neural network learning algorithms," in Neural Computers (Springer-Verlag, 1988) 291-300.

[4] R. Hecht-Nielsen, "Neurocomputer applications," in Neural Computers (Springer-Verlag, 1988) 445-453.

[5] R. Meir, T. Grossman, and E. Domany, "Learning by choice of internal representations," Complex Systems, 2 (1988) 555-575.

[6] H. Bourlard and Y. Kamp, "Auto-association by multilayer perceptrons and singular value decomposition," Biological Cybernetics, 59 (1988) 291-294.

[7] A.E. Albert, Regression and the Moore-Penrose Pseudoinverse (Academic Press, New York, 1972).

[8] B. Widrow and R. Winter, "Neural nets for adaptive filtering and adaptive pattern recognition," IEEE Computer, 21 (1988) 25-39.

[9] D.S. Broomhead and D. Lowe, "Multivariable functional interpolation and adaptive networks," Complex Systems, 2 (1988) 321-355.

[10] G.J. Mitchison and R.M. Durbin, "Bounds on the learning capacity of some multilayer networks," Biological Cybernetics, 60 (1989) 345-356.

[11] N.J. Nilsson, Learning Machines (McGraw-Hill, New York, 1974).

[12] M.L. Minsky and S.A. Papert, Perceptrons (MIT Press, Cambridge, MA, 1988).

[13] G. Josin, "Neural-space generalization of a topological transformation," Biological Cybernetics, 59 (1988) 283-290.

[14] W.F. Trench and B. Kolman, Multivariable Calculus with Linear Algebra and Series (Academic Press, New York, 1974).

[15] C. Moler, J. Little, and S. Bangert, MATLAB User's Guide (Mathworks Inc., 1987).

[16] L. Ljung, System Identification: Theory for the User (Prentice Hall, Englewood Cliffs, NJ, 1987).

[17] L. Ljung and T. Söderström, Parameter Identification (MIT Press, London, 1983).