
Stable Neural Control of a Flexible-Joint Manipulator Subjected to Sinusoidal Disturbance

C.J.B. Macnab
Dept. of Electrical and Computer Engineering, University of Calgary, Calgary, Alberta, Canada
Email: cmacnab@ucalgary.ca
Abstract—The proposed method aims at halting weight drift when using multilayer perceptron backpropagation networks in direct adaptive control schemes, without sacrificing performance or requiring unrealistically large control gains. Unchecked weight drift can lead to a chattering control signal and cause bursting. Previously proposed robust weight update methods, including e-modification and dead-zone, will sacrifice significant performance if large control gains cannot be applied. In this work, a set of alternate weights guides the training in order to prevent drift. Experiments with a two-link flexible-joint robot demonstrate the improvement in performance compared to e-modification and dead-zone.
1. INTRODUCTION
Neural-adaptive control systems utilize neural networks that adapt online in an unsupervised learning strategy, eliminating the need for pretraining. Using a Lyapunov-stable direct-adaptive control framework to derive neural-network weight update laws can produce stable neural-network robot controls [1]. However, if a persistent disturbance prevents the error from going to zero, the weights tend to drift upwards in magnitude. The weight drift effect is well known in static neural network learning, where it results in overtraining. Weight drift will eventually cause a chattering control, which may excite the dynamics and cause a sudden growth in error (bursting) [2]. Several standard adaptive Lyapunov-stable control designs guarantee bounded signals and have also been applied to neural-adaptive control, including dead-zone [3], leakage [4], and e-modification [1]. However, these methods require very large feedback gains to guarantee small errors. To make the system robust to a significant persistent disturbance while using realistic gains, one must sacrifice significant performance (e.g. by increasing the size of the dead-zone or increasing the e-modification gain).
This paper further develops and experimentally verifies a novel technique for halting weight drift in a multilayer perceptron backpropagation network [5]. This method does not require a significant sacrifice of performance to achieve robustness to a sinusoidal disturbance near the natural frequency of the arm. Experiments with a commercially available two-link flexible-joint arm show the improvement in performance over other methods.
Fig. 1. A linear-output, two-layer MLP
2. BACKGROUND
A two-layer linear-output multilayer perceptron (Figure 1), with m hidden units and p inputs, provides an output

o = \hat{w}^T \phi(\hat{H}^T q) = \hat{w}^T \phi(\hat{V} q)    (1)

where \hat{w} \in R^m contains the output weights, \hat{H}^T \in R^{m \times p} contains the hidden weights, q \in R^p contains the inputs, and \phi(\hat{H}^T q) \in R^m contains the outputs from the hidden layer. The matrix \hat{H}^T = \hat{V} contains row vectors and column vectors as follows:

\hat{H}^T = \begin{bmatrix} \hat{h}_{11} & \cdots & \hat{h}_{1p} \\ \vdots & \ddots & \vdots \\ \hat{h}_{m1} & \cdots & \hat{h}_{mp} \end{bmatrix}
          = \begin{bmatrix} \hat{h}_1^T \\ \vdots \\ \hat{h}_m^T \end{bmatrix}
          = \begin{bmatrix} \hat{v}_1 & \cdots & \hat{v}_p \end{bmatrix} = \hat{V}    (2)
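For illustration, a minimal NumPy sketch of this linear-output, two-layer MLP is given below; the sigmoid activation and the dimensions m = 3, p = 2 follow Figure 1, while the function names and initialization are assumptions of the sketch rather than part of the paper.

```python
import numpy as np

def sigmoid(x):
    # typical sigmoidal hidden-unit activation phi(.)
    return 1.0 / (1.0 + np.exp(-x))

def mlp_output(w_hat, V_hat, q):
    """Linear-output two-layer MLP, o = w_hat^T phi(V_hat q), as in (1).

    w_hat : (m,)   output weights
    V_hat : (m, p) hidden weights (rows h_k^T, columns v_j)
    q     : (p,)   input vector
    """
    phi = sigmoid(V_hat @ q)   # hidden-layer outputs, shape (m,)
    return w_hat @ phi         # scalar network output o

# example with m = 3 hidden units and p = 2 inputs, as drawn in Figure 1
rng = np.random.default_rng(0)
w_hat = 0.1 * rng.standard_normal(3)
V_hat = 0.1 * rng.standard_normal((3, 2))
print(mlp_output(w_hat, V_hat, np.array([0.5, -0.2])))
```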
Direct neural-adaptive control works for systems of the form

\dot{x}_1 = x_2    (3)
M \dot{x}_2 = g(x_1, x_2) + u    (4)
Fig. 2. Alternate weights for the output layer

where g(x) contains linear and nonlinear functions and M is a positive constant. Given a desired trajectory x_{1,d}(t), x_{2,d}(t), \dot{x}_{2,d}(t) and defining the augmented state error as z = \lambda(x_1 - x_{1,d}) + (x_2 - x_{2,d}), with constant \lambda > 0, results in the error dynamics

M \dot{z} = f(q, t) + u    (5)

where q = [x_1 \; x_2 \; x_{1,d} \; x_{2,d} \; \dot{x}_{2,d}]^T. The unknown weights that would ideally model f(q, t) are denoted w and H:

f(q, t) = o(w, H, q) + d(q, t)    (6)

where d(q, t) is a uniformly bounded modeling error. The weight errors become \tilde{w} = w - \hat{w} and \tilde{H} = H - \hat{H}.
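As a small illustration of these definitions, the sketch below forms the augmented error z and the regressor q of (5) from measured and desired states; the helper name and the numerical values are placeholders, not taken from the paper.

```python
import numpy as np

def error_and_regressor(x1, x2, x1_d, x2_d, x2_d_dot, lam=1.0):
    """Augmented error z = lam*(x1 - x1_d) + (x2 - x2_d), lam > 0, and the
    regressor q = [x1, x2, x1_d, x2_d, x2_d_dot]^T that feeds the network."""
    z = lam * (x1 - x1_d) + (x2 - x2_d)
    q = np.array([x1, x2, x1_d, x2_d, x2_d_dot])
    return z, q

# placeholder values, for illustration only
z, q = error_and_regressor(x1=0.10, x2=0.02, x1_d=0.12, x2_d=0.0,
                           x2_d_dot=0.0, lam=2.0)
print(z, q)
```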
Consider the Lyapunov-like function and its derivative

V = \frac{1}{2} M z^2 + \frac{1}{2\beta}\left[ \tilde{w}^T\tilde{w} + \mathrm{tr}(\tilde{V}^T\tilde{V}) \right]    (7)

\dot{V} = z\,(f(q,t) + u) - \frac{1}{\beta}\left[ \tilde{w}^T\dot{\hat{w}} + \mathrm{tr}(\tilde{V}^T\dot{\hat{V}}) \right]    (8)

where tr(·) denotes the trace of a matrix. The neural network contributes a portion of the control signal, along with state feedback and a nonlinear robust term, in

u = -o - G z - \kappa z|z|    (9)

where G is a positive feedback gain and \kappa is a positive constant. Using e-modification to achieve robustness to the modeling error (and any bounded external disturbance) results in output weight updates as in [1]:

\dot{\hat{w}} = \beta\left[ z\,\phi(\hat{V}q) - \nu|z|\hat{w} \right],    (10)

where \beta > 0 is the adaptation gain and \nu > 0 is the e-modification gain. Two equivalent expressions for the hidden weight updates are

\dot{\hat{h}}_k = \beta\left[ z\,q\,(\hat{w}^T\phi')_k - \nu|z|\hat{h}_k \right], \quad k = 1 \ldots m,    (11)

\dot{\hat{v}}_j = \beta\left[ z\,q_j\,(\hat{w}^T\phi')^T - \nu|z|\hat{v}_j \right], \quad j = 1 \ldots p,    (12)
where q_j is an element of the vector q, (\hat{w}^T\phi')_k is an element of the row vector \hat{w}^T\phi', and \phi' contains the derivatives of \phi with respect to \hat{H}^T q.

Fig. 3. Alternate weights for the hidden layer (the jth neuron illustrated)

The control (9) and weight updates (10)-(12) result in a guarantee of semi-globally uniformly ultimately bounded (SGUUB) signals, established in Appendix A.
Note that another popular method of robustifying the weight update is the dead-zone, in which the weight updates are not applied whenever |z| falls below a threshold chosen larger than d_max/G_min, where d_max is a bound on the disturbances and G_min is the minimum eigenvalue of the gain.
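A sketch of one control step of this background design, combining the control law (9) with the e-modification updates (10)-(12), is given below; the Euler integration step, the particular gain values (loosely following the later experiments), and the optional dead-zone gate are assumptions of the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def emod_controller_step(w_hat, V_hat, z, q, G=2.0, kappa=1.0,
                         beta=0.5, nu=0.9, dt=0.001, deadzone=None):
    """One step of the background controller: control law (9) with the
    e-modification weight updates (10)-(12).  The gains, the Euler step dt
    and the optional dead-zone threshold are illustrative assumptions."""
    phi = sigmoid(V_hat @ q)                 # hidden outputs phi(V_hat q)
    dphi = phi * (1.0 - phi)                 # sigmoid derivatives phi'
    o = w_hat @ phi                          # network output, eq (1)
    u = -o - G * z - kappa * z * abs(z)      # control, eq (9)

    if deadzone is not None and abs(z) < deadzone:
        # dead-zone alternative: freeze the weights for small errors
        return u, w_hat, V_hat

    # eq (10): output-weight update with e-modification leakage
    w_dot = beta * (z * phi - nu * abs(z) * w_hat)
    # eqs (11)/(12) in matrix form: row k is z*(w_hat_k*phi'_k)*q^T - nu*|z|*h_hat_k^T
    V_dot = beta * (z * np.outer(w_hat * dphi, q) - nu * abs(z) * V_hat)

    return u, w_hat + dt * w_dot, V_hat + dt * V_dot

# single illustrative call with m = 5 hidden units (as in the experiments)
rng = np.random.default_rng(1)
w_hat, V_hat = 0.1 * rng.standard_normal(5), 0.1 * rng.standard_normal((5, 5))
u, w_hat, V_hat = emod_controller_step(w_hat, V_hat, z=0.05,
                                       q=np.array([0.1, 0.0, 0.1, 0.0, 0.0]))
print(u)
```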
3. PROPOSED METHOD
In the proposed method alternate weights \hat{p} help supervise the training of the output weights. The hidden-layer alternate weights are

\hat{R}^T = \begin{bmatrix} \hat{r}_{11} & \cdots & \hat{r}_{1p} \\ \vdots & \ddots & \vdots \\ \hat{r}_{m1} & \cdots & \hat{r}_{mp} \end{bmatrix}
          = \begin{bmatrix} \hat{r}_1^T \\ \vdots \\ \hat{r}_m^T \end{bmatrix}
          = \begin{bmatrix} \hat{s}_1 & \cdots & \hat{s}_p \end{bmatrix} = \hat{S}    (13)

The idea is that the alternate weights try to approximate the outputs of the control weights \hat{w} and \hat{H}, on a per-layer basis (Figures 2 and 3). The design of the training rule ensures the alternate weights do not undergo the same weight drift as the control weights.
A. Alternate Weights - Supervised learning
The output alternate weights undergo the training

\dot{\hat{p}} = |z|\left[ a\,(\hat{w}^T\phi - \hat{p}^T\phi)\,\phi - C\hat{p} \right],    (14)

where a is a positive learning gain and C is a positive-definite (p.d.) diagonal leakage gain. In words, the alternate output \hat{p}^T\phi trains to approximate the control output \hat{w}^T\phi. The leakage term -C\hat{p} prevents the weights from drifting to infinity. A proper design of a and C will sacrifice a little approximation accuracy to keep the weights relatively small in magnitude (preventing drift). For the alternate hidden weights,

\dot{\hat{s}}_k = |z|\left[ b_k q_k (\hat{V}q - \hat{S}q) - D_k \hat{s}_k \right], \quad k = 1 \ldots p    (15)
where \hat{s}_j \in R^m are the column vectors of \hat{S}, b_j is positive, and D_j is p.d. diagonal. In words, the alternate weights on the hidden layer produce outputs \hat{S}q which approximate \hat{V}q, and the term -D_j\hat{s}_j provides leakage.

Although the learning rules (14) and (15) could produce a set of alternate weights in an off-line training stage, the next section proposes a method to accomplish this training on-line. Then the terms a, b_j, C, and D_j all become appropriately designed variables.
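The supervised training of the alternate weights, eqs. (14)-(15), can be sketched as follows; the constant scalar gains a, b, C, D and the Euler step are placeholders standing in for the adaptive, matrix-valued designs of the next subsection.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def alternate_weight_step(p_hat, S_hat, w_hat, V_hat, z, q,
                          a=0.1, b=0.1, C=0.01, D=0.01, dt=0.001):
    """Supervised training of the alternate weights, eqs. (14)-(15).
    Constant scalar gains a, b, C, D stand in for the adaptive designs."""
    phi = sigmoid(V_hat @ q)
    # eq (14): alternate output p_hat^T phi chases control output w_hat^T phi,
    # with leakage -C*p_hat preventing drift to infinity
    p_dot = abs(z) * (a * (w_hat @ phi - p_hat @ phi) * phi - C * p_hat)
    # eq (15), all columns at once: column k gets b*q_k*(V_hat q - S_hat q) - D*s_hat_k
    S_dot = abs(z) * (b * np.outer(V_hat @ q - S_hat @ q, q) - D * S_hat)
    return p_hat + dt * p_dot, S_hat + dt * S_dot
```

Here the leakage deliberately trades a little approximation accuracy for small alternate-weight magnitudes, as described above.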
B. Online Training
Utilizing the alternate weights, the adaptation law (10) changes to

\dot{\hat{w}} = \beta\left[ z\,\phi(\hat{V}q) + a|z|(\hat{p}^T\phi - \hat{w}^T\phi)\,\phi + |z|\,C(\hat{p} - \hat{w}) \right]    (16)

The update (16) prevents bursting as long as
1) \|\hat{p}\| < \|\hat{w}\|
2) each positive (diagonal) term in C is large enough.
The adaptation law for the hidden weights is analogous:

\dot{\hat{v}}_j = \beta\left[ (\hat{w}^T\phi')^T z\,q_j + b_j q_j |z|(\hat{S}q - \hat{V}q) + |z|\,D_j(\hat{s}_j - \hat{v}_j) \right]

for j = 1 \ldots p, and the requirements to prevent bursting are analogous.
To meet the first requirement, the alternate weights are initialized to be smaller in magnitude and are kept smaller by choosing

a(\hat{w}, \hat{p}) = \alpha\left( \frac{1}{m}\sum_{k=1}^{m} |\hat{w}_k - \hat{p}_k| \right)    (17)

b_j(\hat{V}_j, \hat{S}_j) = \alpha\left( \frac{1}{m}\sum_{k=1}^{m} |\hat{h}_{kj} - \hat{r}_{kj}| \right)    (18)

where \alpha is a positive constant. In words, the adaptation rate is proportional to the average difference between the alternate and control weight magnitudes.
To meet the second requirement, measurements of the weight drift indicate how large to make C and each D_j, using the weight drift indicators

y_w = \frac{\left[\, |\hat{w}_1 - \hat{p}_1| \;\cdots\; |\hat{w}_m - \hat{p}_m| \,\right]^T}{\frac{1}{m}\sum_{k=1}^{m} |\hat{w}_k - \hat{p}_k|}    (19)

y_{Vj} = \frac{\left[\, |\hat{H}_{1j} - \hat{S}_{1j}| \;\cdots\; |\hat{H}_{mj} - \hat{S}_{mj}| \,\right]^T}{\frac{1}{m}\sum_{k=1}^{m} |\hat{H}_{kj} - \hat{S}_{kj}|}    (20)

That is, the relative magnitude of the difference between the control and alternate weights measures the drift of a particular control weight. Designing C and each D_j to utilize the weight drift indicators:

C(\hat{w}, \hat{p}) = \mathrm{diag}\left( \eta \exp(\sigma y_w) + \epsilon \right)    (21)

D_j(\hat{V}_j, \hat{S}_j) = \mathrm{diag}\left( \eta \exp(\sigma y_{Vj}) + \epsilon \right)    (22)
where \eta and \sigma are positive constants and \epsilon is any positive constant. Measuring the values of the weight drift indicators (19) and (20) at the point where a non-robust experiment goes unstable allows the quantitative design of appropriate exponential curves (21) and (22), ensuring that C and D_j are very small when the weight drift indicators are not near their critical values, but become large otherwise.

Fig. 4. Experimental two-link flexible-joint robot arm
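Putting the on-line design together, the sketch below computes the adaptive gain (17), the drift indicator (19) and the schedule (21) for the output layer, and applies the modified update (16); the hidden layer follows the same pattern column by column. The parameter defaults echo the values reported later in the experimental section, and the Euler step and the small numerical guard are assumptions of the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def online_output_update(w_hat, p_hat, V_hat, z, q,
                         beta=0.5, alpha=0.1, eta=0.0015, sigma=3.0,
                         eps=0.0, dt=0.001):
    """Output-layer portion of the on-line scheme of Section 3-B:
    adaptive gain a (17), drift indicator y_w (19), leakage schedule C (21),
    and the modified control-weight update (16)."""
    phi = sigmoid(V_hat @ q)
    diff = np.abs(w_hat - p_hat)
    avg = np.mean(diff)
    a = alpha * avg                      # eq (17): rate ~ average difference
    y_w = diff / max(avg, 1e-12)         # eq (19), with a divide-by-zero guard
    C = eta * np.exp(sigma * y_w) + eps  # diagonal of eq (21)

    # eq (16): backprop term + pull toward alternate output + scheduled leakage
    w_dot = beta * (z * phi
                    + a * abs(z) * (p_hat @ phi - w_hat @ phi) * phi
                    + abs(z) * C * (p_hat - w_hat))
    return w_hat + dt * w_dot
```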
C. Dead-zone
The dead-zone is a small region of error near zero, |z| below a small threshold, where the control weights freeze. Unlike a traditional dead-zone, which requires knowledge of the maximum disturbance bound, this dead-zone is simply a very small region near the origin to which the method can typically bring the system.
4. EXPERIMENTAL APPARATUS
Trajectory tracking with a two-link flexible-joint robot experiment (Fig. 4) serves to validate the approach. During the experiments, a 1 kg payload sits at the end of the second link. Both natural frequencies of the robot are approximately 1 Hz with the payload.

In the experiment the tip of the second link traces a 4 cm square trajectory in 16 seconds, while being subjected to the disturbance

d(t) = 0.32 \sin(2\pi t)    (23)

The disturbance has just enough amplitude to make the second flexible joint exhibit oscillations visible to the naked eye, and it is very near the natural frequencies.
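A short sketch of the reference and disturbance signals used in the experiment follows; the constant-speed, piecewise-linear traversal of the square is an assumption, since the paper does not state how the square is parameterized.

```python
import numpy as np

def square_reference(t, side=0.04, period=16.0):
    """Tip reference tracing a side x side square once per `period` seconds.
    A constant-speed, piecewise-linear traversal is assumed."""
    s = (t % period) / period * 4.0          # progress along the 4 edges
    edge, frac = int(s), s - int(s)
    corners = np.array([[0, 0], [side, 0], [side, side], [0, side], [0, 0]])
    return corners[edge] + frac * (corners[edge + 1] - corners[edge])

def disturbance(t):
    # sinusoidal disturbance of eq (23), near the ~1 Hz natural frequencies
    return 0.32 * np.sin(2.0 * np.pi * t)

print(square_reference(6.0), disturbance(6.0))
```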
Making the system equivalent to (5), a backstepping procedure produces appropriate virtual controls and controls as in [6], resulting in

I(q)\dot{z} = -Gz + F(q) - \hat{W}^T\phi(\hat{V}^T q),    (24)

where q contains the robot states, z contains the output errors and virtual-control errors, F contains linear and nonlinear terms, and G is a positive-definite, constant, symmetric matrix.
One single-output MLP is used for each (virtual) control signal, each with five hidden units. This ensures the training can take place quickly, within 100 repetitive trials (although it may not be able to learn additional trajectories). The hidden units contain typical sigmoidal functions.

Fig. 5. Training of MLP without any robust modification (RMS error in degrees and maximum weight magnitude versus trials)
A. Results
All experiments use a common adaptation gain \beta = 0.5 and control gains G_i = diag(2, 2) for i = 1, 2, 3. When the weight updates do not have any robust modification, the RMS error briefly converges to 0.2 degrees before the weight drift causes bursting (and apparent instability) on the 50th repetitive trial (Figure 5). When using the e-modification weight updates (10), (11), three different values of \nu fail to provide satisfactory performance (Figure 6). Only a value of \nu = 1.6 (or greater) is able to stop the weight drift, but the resulting error is five times worse than the optimum performance.

Other experiments allowed identification of the critical value of the weight drift indicators (at which bursting occurs) as y_w = y_{Vj} = 3, leading to a choice of \eta = 0.0015 and \sigma = 3 according to the design method in Section 3-B. Parameter values of \epsilon = 0 and \alpha = 0.1 produced satisfactory alternate outputs. The only parameter that needs to be identified through further experiment is the size of the learning dead-zone. Dead-zone values greater than 1 degree were all sufficient to halt the weight drift. A dead-zone of 1 degree results in the best performance, very near 0.4 degrees RMS error (Fig. 7). Note that a traditional dead-zone design requires a dead-zone larger than d_max/G_min = 0.32/2 rad, or about 9 degrees. Thus, the new method performs about nine times better than traditional dead-zone, and from the experiments the proposed method performs four times better than e-modification.
Fig. 6. Training of MLP using e-modification with \nu = 0.2, 0.9, 1.6 (RMS error in degrees and maximum weight magnitude versus trials)

Fig. 7. Training of MLP using the proposed method with learning dead-zones of 1, 2, and 5 degrees (RMS error in degrees and maximum weight magnitude versus trials)

5. CONCLUSIONS

A method that uses an alternate set of weights to guide the training of a multilayer perceptron neural network can achieve stable control of a flexible-joint robot in the presence of a significant sinusoidal disturbance. In this situation, a weight update with no robust modification first converges to an optimum level of performance before going unstable (due to unchecked weight drift). The traditional robust methods of e-modification and dead-zone fail to produce a practical result in this situation, sacrificing so much performance that no significant adaptation occurs. The proposed method, however, can still adapt and reduce the RMS error over a number of repetitive trials, coming within 0.2 degrees of the optimum performance while completely stopping the weight drift.
APPENDIX A - STABLE BACKPROPAGATION
The solution to stable backpropagation using e-modification that follows was introduced in [1]. A neural network can uniformly approximate a nonlinear function f(q) in a local region D \subset R^p if there exists a set of weights w and V such that

f(q) = w^T\phi(Vq) + d(t, q)    (25)

with |d(t, q)| < d_{max} for all q \in D. Define the weight errors \tilde{w}^T = w^T - \hat{w}^T and the hidden weight errors \tilde{V} = V - \hat{V}; equation (25) becomes

f(q) = (\tilde{w}^T + \hat{w}^T)(\hat{\phi} + \tilde{\phi}) + d    (26)

where \tilde{\phi} = \phi(Vq) - \phi(\hat{V}q) and \hat{\phi} = \phi(\hat{V}q). Then, as in [1], use a Taylor series expansion of \phi = \phi(Vq) about \hat{\phi}:

\phi = \phi(\hat{V}q) + \hat{\phi}'(Vq - \hat{V}q) + O(\cdot)^2, \qquad \tilde{\phi} = \hat{\phi}'\tilde{V}q + O(\cdot)^2    (27)

where O(\cdot)^2 represents the higher-order terms and

\hat{\phi}' = \left.\frac{\partial\phi(Vq)}{\partial(Vq)}\right|_{V = \hat{V}}.    (28)
Assume \|\phi\| \le \phi_{max} and \|\hat{\phi}'\| \le \phi'_{max}, with \phi_{max} and \phi'_{max} positive constants. Bound the norm of the higher-order terms, assuming \|x_d\| \le \gamma with \gamma a positive constant, as follows:

\|O(\cdot)^2\| = \|\tilde{\phi} - \hat{\phi}'\tilde{V}q\| \le 2\phi_{max} + \phi'_{max}\|\tilde{V}q\| \le 2\phi_{max} + \gamma\phi'_{max}\|\tilde{V}\| + \phi'_{max}\|\tilde{V}\|\,\|z\|    (29)

where the matrix norm for \tilde{V} is the Frobenius norm. Rewrite the nonlinear function approximation (26) as

f(q) = \hat{w}^T\phi(\hat{V}q) + \tilde{w}^T(\hat{\phi} - \hat{\phi}'\hat{V}q) + \hat{w}^T\hat{\phi}'\tilde{V}q + \delta    (30)

where

\delta = \tilde{w}^T\hat{\phi}'Vq + w^T O(\cdot)^2 + d,    (31)

\|\delta\| \le \|\tilde{w}\|\,\phi'_{max}\|Vq\| + \|w\|\,\|O(\cdot)^2\| + d_{max}
        \le \gamma\phi'_{max}\|\tilde{w}\|\,\|V\| + \phi'_{max}\|\tilde{w}\|\,\|V\|\,\|z\| + \|w\|\,\|O(\cdot)^2\| + d_{max}.    (32)

Combined with (29) the result is

\|\delta\| \le A_1 + A_2\|\tilde{W}\| + A_3\|z\|\,\|\tilde{W}\|,    (33)

where A_1, A_2 and A_3 are positive constants and \tilde{W} = \mathrm{diag}(\tilde{w}, \tilde{V}). Equation (24) requires n neural networks with corresponding weights given by \hat{w}_i and \hat{V}_i for i = 1 \ldots n.
Consider the (adaptive control) Lyapunov-like function

V = \frac{1}{2} z^T I z + \frac{1}{2\beta}\sum_{i=1}^{n}\left[ \tilde{w}_i^T\tilde{w}_i + \mathrm{tr}(\tilde{V}_i^T\tilde{V}_i) \right]    (34)

where tr(·) denotes the trace of a matrix. Then

\dot{V} = \frac{d}{dt}\left( \frac{1}{2} z^T I z \right) - \frac{1}{\beta}\sum_{i=1}^{n}\left[ \tilde{w}_i^T\dot{\hat{w}}_i + \mathrm{tr}(\tilde{V}_i^T\dot{\hat{V}}_i) \right]    (35)

Evaluate the first term:

\frac{d}{dt}\left( \frac{1}{2} z^T I z \right) = z^T\left( I\dot{z} + \frac{1}{2}\dot{I}z \right)    (36)
= z^T\left( Zz - Gz + F - c + \frac{1}{2}\dot{I}z - \tau \right)    (37)
and assume the ith neural network, for i = 1 \ldots n, can model the nonlinearities:

F_i + \frac{1}{2}\left( \dot{I}z \right)_i = w_i^T\phi(V_i q) + d_i    (38)

Using the fact that z^T Z z = 0, write \dot{V} = \sum_{i=1}^{n}\dot{V}_i and evaluate \dot{V}_i by combining (35), (37), and (38) and expanding the vector z = [z_1, z_2, \ldots, z_n]^T:

\dot{V}_i = z_i\left[ w_i^T\phi(V_i q) + d_i - c_i - G_i z_i - \tau_i \right] - \frac{1}{\beta}\left[ \tilde{w}_i^T\dot{\hat{w}}_i + \mathrm{tr}(\tilde{V}_i^T\dot{\hat{V}}_i) \right]    (39)
and using the result from (30):

\dot{V}_i = z_i\left[ \tilde{w}_i^T(\hat{\phi}_i - \hat{\phi}'_i\hat{V}_i q) + \hat{w}_i^T\hat{\phi}'_i\tilde{V}_i q + \hat{w}_i^T\hat{\phi}_i + \delta_i - c_i - G_i z_i - \tau_i \right] - \frac{1}{\beta}\left[ \tilde{w}_i^T\dot{\hat{w}}_i + \mathrm{tr}(\tilde{V}_i^T\dot{\hat{V}}_i) \right]    (40)

Using the facts c_i = \hat{w}_i^T\hat{\phi}_i and

\mathrm{tr}(\tilde{V}_i^T\dot{\hat{V}}_i) = \sum_{j=1}^{p}\tilde{v}_{i,j}^T\dot{\hat{v}}_{i,j} \qquad \text{and} \qquad \hat{w}_i^T\hat{\phi}'_i\tilde{V}_i q = \sum_{j=1}^{p}\tilde{v}_{i,j}^T(\hat{w}_i^T\hat{\phi}'_i)^T q_j

results in

\dot{V}_i = z_i(\delta_i - G_i z_i - \tau_i) + \tilde{w}_i^T\left[ z_i(\hat{\phi}_i - \hat{\phi}'_i\hat{V}_i q) - \frac{1}{\beta}\dot{\hat{w}}_i \right] + \sum_{j=1}^{p}\tilde{v}_{i,j}^T\left[ (\hat{w}_i^T\hat{\phi}'_i)^T z_i q_j - \frac{1}{\beta}\dot{\hat{v}}_{i,j} \right]    (41)
where q_j is the jth element of q. The weight updates, using e-modification, are

\dot{\hat{w}}_i = \beta\left[ z_i(\hat{\phi}_i - \hat{\phi}'_i\hat{V}_i q) - \nu\|z\|\hat{w}_i \right]    (42)

\dot{\hat{v}}_{i,j} = \beta\left[ (\hat{w}_i^T\hat{\phi}'_i)^T z_i q_j - \nu\|z\|\hat{v}_{i,j} \right]    (43)
where \nu is a positive constant which needs to be chosen large enough to prevent weight drift. This is a (stable) form of backpropagation. The resulting Lyapunov derivative is

\dot{V} = -z^T G z + z^T\delta - z^T\tau + \nu\|z\|\sum_{i=1}^{n}\left[ \tilde{w}_i^T\hat{w}_i + \mathrm{tr}(\tilde{V}_i^T\hat{V}_i) \right]    (44)

Choosing a form of nonlinear damping for the robust term,

\tau = \kappa z\|z\| \quad \text{with constant } \kappa > 0,    (45)

results in a bound for the Lyapunov derivative:

\dot{V} \le \|z\|\left[ -g\|z\| - \kappa\|z\|^2 + \sum_{i=1}^{n}\left( \|\delta_i\| + \nu\,\mathrm{tr}(\tilde{W}_i^T\hat{W}_i) \right) \right]    (46)
where g is the minimum eigenvalue of G and \tilde{W} = \mathrm{diag}(\tilde{w}, \tilde{V}). Using \hat{W}_i = W_i - \tilde{W}_i and the bound from (33) results in

\dot{V} \le \|z\|\left( -\begin{bmatrix} \|z\| \\ \|\tilde{W}\| \end{bmatrix}^T \begin{bmatrix} \kappa & -A_3/2 \\ -A_3/2 & \nu \end{bmatrix} \begin{bmatrix} \|z\| \\ \|\tilde{W}\| \end{bmatrix} - \begin{bmatrix} g \\ -(A_2 + \nu\|W\|) \end{bmatrix}^T \begin{bmatrix} \|z\| \\ \|\tilde{W}\| \end{bmatrix} + A_1 \right)    (47)

where each A_k = [A_{k,1} \ldots A_{k,n}] and \tilde{W} = \mathrm{diag}(\tilde{W}_1 \ldots \tilde{W}_n).

Setting (47) equal to zero describes the boundary of a compact set B in the (\|z\|, \|\tilde{W}\|) plane. Outside of this compact set \dot{V} < 0 if the quadratic (elliptic) term is negative definite, which means the parameters must be chosen such that \kappa\nu > A_3^2/4.

Note that knowledge of the maximum bound on W (the ideal weights) is required to calculate A_3. By standard Lyapunov arguments, the smallest Lyapunov surface enclosing B is then a bound on the signals. By Barbalat's Lemma, the surface B is an ultimate bound (as t \to \infty) if all signals are continuous. The system is described as semi-globally uniformly ultimately bounded.
APPENDIX B - STABILITY OF NEW METHOD
The method of alternate weights is Lyapunov stable in that all signals are semi-globally uniformly ultimately bounded. The ability to prevent weight drift better than e-modification is not apparent in the stability proof, but rather must be established in simulation and experiment. In order to save space, the stability proof for the scalar version is presented. The proof starts with the (adaptive control) Lyapunov-like function

V = \frac{1}{2} I z^2 + \frac{1}{2\beta}\left[ \tilde{w}^T\tilde{w} + \tilde{p}^T\tilde{p} + \mathrm{tr}(\tilde{V}^T\tilde{V}) + \mathrm{tr}(\tilde{S}^T\tilde{S}) \right]    (48)

where the alternate-weight errors are \tilde{p} = w - \hat{p} and \tilde{S} = V - \hat{S}.
The derivative is

\dot{V} = z\left[ \tilde{w}^T(\hat{\phi} - \hat{\phi}'\hat{V}q) + \hat{w}^T\hat{\phi}'\tilde{V}q + \hat{w}^T\hat{\phi} + \delta - c - Gz - r \right] - \frac{1}{\beta}\left[ \tilde{w}^T\dot{\hat{w}} + \mathrm{tr}(\tilde{V}^T\dot{\hat{V}}) + \tilde{p}^T\dot{\hat{p}} + \mathrm{tr}(\tilde{S}^T\dot{\hat{S}}) \right]

= z(\delta - Gz - r) + \tilde{w}^T\left[ z(\hat{\phi} - \hat{\phi}'\hat{V}q) - \frac{1}{\beta}\dot{\hat{w}} \right] + \sum_{j=1}^{p}\tilde{v}_j^T\left[ (\hat{w}^T\hat{\phi}')^T z q_j - \frac{1}{\beta}\dot{\hat{v}}_j \right] - \frac{1}{\beta}\tilde{p}^T\dot{\hat{p}} - \frac{1}{\beta}\mathrm{tr}(\tilde{S}^T\dot{\hat{S}})    (49)
Substitution of the weight updates (16), (17), (14), and (15) gives

\dot{V} = -Gz^2 + z\delta - zr - |z|\,\tilde{w}^T\left[ a(\hat{p}^T\phi - \hat{w}^T\phi)\phi + C(\hat{p} - \hat{w}) \right] - |z|\,\tilde{p}^T\left[ a(\hat{w}^T\phi - \hat{p}^T\phi)\phi - C\hat{p} \right]
\qquad - |z|\sum_{j=1}^{p}\tilde{v}_j^T\left[ b_j q_j(\hat{S}q - \hat{V}q) + D_j(\hat{s}_j - \hat{v}_j) \right] - |z|\sum_{j=1}^{p}\tilde{s}_j^T\left[ b_j q_j(\hat{V}q - \hat{S}q) - D_j\hat{s}_j \right]
Next establish the negative semi-definiteness of the a-terms,

-a\left[ \tilde{w}^T(\hat{p}^T\phi - \hat{w}^T\phi)\phi + \tilde{p}^T(\hat{w}^T\phi - \hat{p}^T\phi)\phi \right] = -a\,(\hat{w}^T\phi - \hat{p}^T\phi)^T(\hat{w}^T\phi - \hat{p}^T\phi) \le 0,

and again for the b-terms:

-\sum_{j=1}^{p} b_j\left[ \tilde{v}_j^T q_j(\hat{S}q - \hat{V}q) + \tilde{s}_j^T q_j(\hat{V}q - \hat{S}q) \right] = -b_j\sum_{j=1}^{p}(\tilde{v}_j^T q_j - \tilde{s}_j^T q_j)(\hat{S}q - \hat{V}q) = -b_j(\hat{V}q - \hat{S}q)^T(\hat{V}q - \hat{S}q) \le 0    (50)
Now, using r = \kappa z|z|, bound the derivative:

\dot{V} \le |z|\left( -G|z| - \kappa z^2 + \|\delta\| - C\left[ \tilde{w}^T(\hat{p} - \hat{w}) - \tilde{p}^T\hat{p} \right] - \sum_{j=1}^{p} D_j\left[ \tilde{v}_j^T(\hat{s}_j - \hat{v}_j) - \tilde{s}_j^T\hat{s}_j \right] \right)    (51)
Establish bounds for the C-terms:

-C\left[ \tilde{w}^T(\hat{p} - \hat{w}) - \tilde{p}^T\hat{p} \right] = -C\left[ \tilde{w}^T(\tilde{w} - \tilde{p}) - \tilde{p}^T(w - \tilde{p}) \right]
= C\left[ -\tilde{w}^T\tilde{w} + \tilde{w}^T\tilde{p} + \tilde{p}^T w - \tilde{p}^T\tilde{p} \right]
\le -\frac{C}{2}\left\| [\tilde{w}^T \; \tilde{p}^T]^T \right\|^2 + C\,\|w\|\left\| [\tilde{w}^T \; \tilde{p}^T]^T \right\|    (52)
and again establish bounds for the D-terms:

-\sum_{j=1}^{p} D_j\left[ \tilde{v}_j^T(\hat{s}_j - \hat{v}_j) - \tilde{s}_j^T\hat{s}_j \right] \le \sum_{j=1}^{p}\left( -\frac{D_j}{2}\left\| [\tilde{v}_j^T \; \tilde{s}_j^T]^T \right\|^2 + D_j\,\|v_j\|\left\| [\tilde{v}_j^T \; \tilde{s}_j^T]^T \right\| \right)
\le -\frac{D}{2}\left\| [\tilde{V} \; \tilde{S}]^T \right\|^2 + D\,\|V\|\left\| [\tilde{V} \; \tilde{S}]^T \right\|    (53)
Defining \tilde{W}_a = \mathrm{diag}([\tilde{w}^T \; \tilde{p}^T]^T, [\tilde{V} \; \tilde{S}]^T) results in

\dot{V} \le |z|\left( -G|z| - \kappa z^2 + A_1 + A_2\|\tilde{W}\| + A_3|z|\,\|\tilde{W}\| - \frac{\mu}{2}\|\tilde{W}_a\|^2 + \mu\,\|W\|\,\|\tilde{W}_a\| \right)    (54)

where \mu is a positive constant determined by the leakage gains C and D_j. This has the same basic form as (47), so that semi-global uniform ultimate boundedness of the signals can be established in the same way.
REFERENCES
[1] F. Lewis, S. Jagannathan, and A. Yesildirek, Neural Network Control of Robot Manipulators and Nonlinear Systems. Philadelphia, PA: Taylor and Francis, 1999.
[2] L. Hsu and R. Costa, "Bursting phenomena in continuous-time adaptive systems with a σ-modification," IEEE Trans. Automat. Contr., vol. 32, no. 1, pp. 84-86, 1987.
[3] M. French, C. Szepesvári, and E. Rogers, Performance of Nonlinear Approximate Adaptive Controllers. West Sussex, England: Wiley, 2003.
[4] J. Spooner, M. Maggiore, R. Ordonez, and K. Passino, Stable Adaptive Control and Estimation for Nonlinear Systems: Neural and Fuzzy Approximator Techniques. Wiley-Interscience, 2001.
[5] C. Macnab, "A new robust weight update for multilayer-perceptron adaptive control," Control and Intelligent Systems, vol. 35, no. 3, pp. 279-288, 2007.
[6] C. Macnab, "Local basis functions in adaptive control of elastic systems," in Proc. IEEE Int. Conf. Mechatronics and Automation, Niagara Falls, Canada, 2005, pp. 19-25.