Bettering Operation of Robots by Learning
Suguru Arimoto, Sadao Kawamura, and Fumio Miyazaki
Faculty of Engineering Science, Osaka University, Toyonaka, Osaka,
560 Japan
Received January 26, 1984; accepted March 12, 1984
This article proposes a betterment process for the operation of a mechanical robot in the sense that it betters the next operation of the robot by using the previous operation's data.
The process has an iterative learning structure such that the (k + 1)th input to the joint
actuators consists of the kth input plus an error increment composed of the derivative of the
difference between the kth motion trajectory and the given desired motion trajectory.
The convergence of the process to the desired motion trajectory is assured under some
reasonable conditions. Numerical results by computer simulation are presented to show
the effectiveness of the proposed learning scheme.
I. INTRODUCTION
It is human to make mistakes, but it is also human to learn much from experi-
ence. Athletes have improved their form of body motion by learning through
repeated training, and skilled hands have mastered the operation of machines or
plants by acquiring skill in practice and gaining knowledge from experience.
Upon reflection, can machines or robots learn autonomously (without the help
of human beings) from measurement data of previous operations and improve
their performance in future operations? Is it possible to think of a way to
implement such a learning ability in the automatic operation of dynamic systems?
If there is a way, it must be applicable to affording autonomy and intelligence to
industrial robots.
Motivated by this consideration, we propose a practical approach to the problem of bettering the present operation of mechanical robots by using the data of previous operations.

y = [1/(K(T_m s + 1))] v,  (1)
Arimoto et al.: Bettering Operation of Robots by Learning 125
where y and v denote the angular velocity of the motor and the input voltage,
respectively, and K and T_m denote the torque constant and the time constant of
the motor, respectively. Then, the closed-loop system shown in Figure 1 is
subject to the following differential equation:

T_m ẏ + y = A(v − By)/K,  (2)

where ẏ = dy/dt. This can be represented in the general form

ẏ + ay = bv,  (3)

where we set

a = (1 + AB/K)/T_m,  b = A/(K T_m).
It should be noted that the solution of Eq. (3) is expressed in integral form:

y(t) = e^(−at) y(0) + b ∫₀ᵗ e^(−a(t−τ)) v(τ) dτ.  (4)
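The first-order model above is easy to check numerically. The sketch below (the values of a, b, the input, and the step size are illustrative choices, not taken from the article) integrates Eq. (3) by the forward Euler method and compares the result with the closed-form solution for a constant input voltage.

```python
import numpy as np

def simulate(a, b, v, y0, dt, n):
    """Forward-Euler integration of y' + a*y = b*v(t) on n steps."""
    y = np.empty(n + 1)
    y[0] = y0
    for i in range(n):
        y[i + 1] = y[i] + dt * (-a * y[i] + b * v(i * dt))
    return y

# For a constant input v0 the integral form gives
# y(t) = e^{-a t} y(0) + (b/a) v0 (1 - e^{-a t}).
a, b, v0, y0, dt, n = 2.0, 1.5, 1.0, 0.0, 1e-4, 10000
t = np.arange(n + 1) * dt
y_num = simulate(a, b, lambda s: v0, y0, dt, n)
y_exact = np.exp(-a * t) * y0 + (b / a) * v0 * (1.0 - np.exp(-a * t))
print(np.max(np.abs(y_num - y_exact)))  # small discretization error
```

The numerical and closed-form responses agree to within the O(Δt) error of the Euler scheme.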
Now we suppose that a desired time evolution y_d(t) of the angular velocity is given
for a fixed finite time interval t ∈ [0, T]. Throughout this article, we implicitly
assume that the description of the system dynamics is unknown, that is, exact
values of a and b in Eq. (3) or (4) need not be known. It is also assumed implicitly
that y_d(t) is continuously differentiable and sufficiently smooth so that every
value of y_d(t) can be reproduced approximately with high accuracy by sampled
and digitized data {y_d(kΔt), k = 0, 1, ..., N = T/Δt}, which could be stored in a
VLSI RAM memory. We consider the problem of finding an input voltage v(t)
whose response is coincident with the given y_d(t) over the interval t ∈ [0, T]. To find
such an ideal input, we first choose an arbitrary voltage function v₀(t) and try to
supply the motor with this input voltage. An error between the desired output
y_d(t) and the first response y₀(t) to the supplied input v₀(t) may arise. This error is
expressed as

e₀(t) = y_d(t) − y₀(t).  (5)
126 Journal of Robotic Systerns-1984
The process of this scheme is depicted in Figure 3. We call this iterative process
a betterment process because the error e_k(t) vanishes rapidly as the number of
trials increases and, moreover, its norm ||e_k|| becomes less than the
previous value ||e_{k−1}||. In fact, if y₀(0) = y_d(0) = y_k(0) for any k, then, in view of
Eq. (8), the error is bounded by

|e_k(t)| ≤ K aᵏ Tᵏ / k! → 0 as k → ∞,

where we set K = max_{0≤t≤T} |e₀(t)|.
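This rapid decay of the error is easy to reproduce in simulation. The sketch below (the plant constants, gain, and desired trajectory are illustrative assumptions, not values from the article) applies the update v_{k+1} = v_k + Γ(ẏ_d − ẏ_k) with Γ = 1/b to the first-order plant of Eq. (3) and records the maximum tracking error on each trial.

```python
import numpy as np

a, b, gain = 1.0, 1.0, 1.0          # plant y' = -a y + b v; gain = 1/b
dt, n = 1e-3, 1000                  # time grid on [0, 1]
t = np.arange(n) * dt
yd = np.sin(np.pi * t)              # desired output, yd(0) = 0
yd_dot = np.pi * np.cos(np.pi * t)

v = np.zeros(n)                     # initial input v0(t) = 0
errs = []
for k in range(11):
    # Simulate one trial of the plant with input v_k.
    y = np.zeros(n)
    y_dot = np.zeros(n)
    for i in range(n - 1):
        y_dot[i] = -a * y[i] + b * v[i]
        y[i + 1] = y[i] + dt * y_dot[i]
    y_dot[-1] = -a * y[-1] + b * v[-1]
    errs.append(np.max(np.abs(yd - y)))
    # Betterment update: add the gain times the derivative error.
    v = v + gain * (yd_dot - y_dot)
print(errs[0], errs[10])
```

After ten trials the maximum error drops from order one to roughly the discretization level, in line with the Tᵏ/k! bound above.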
Figure 3. Configuration of the betterment process: the kth operation drives the object dynamics with input v_k and stores v_k and the error e_k in memory; the (k + 1)th operation forms v_{k+1} from them together with the desired output y_d, producing the new output y_{k+1} and error e_{k+1}.
y(t) = g(t) + ∫₀ᵗ H(t, τ) u(τ) dτ,  (12)

where u denotes the input vector, y the output vector, and g(t) a given fixed
vector such that u, y, g ∈ Rʳ and hence H(t, τ) ∈ Rʳˣʳ.
Suppose that a desired trajectory of the system output y_d(t) is given over a fixed
finite interval t ∈ [0, T] and an initial input function u₀(t) is arbitrarily chosen. We
assume that y_d(t), u(t), g(t), and H(t, τ) are continuously differentiable in t and
τ.
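On a uniform time grid, the integral representation above becomes a lower-triangular (causal) linear map from input samples to output samples. The sketch below uses a scalar kernel H(t, τ) = e^(−(t−τ)) and a constant g(t), both chosen purely for illustration, to build that map with NumPy.

```python
import numpy as np

n, dt = 200, 0.01
t = np.arange(n) * dt

def H(ti, tau):
    """Illustrative scalar kernel; the article leaves H(t, tau) general."""
    return np.exp(-(ti - tau))

# Lower-triangular matrix approximating the convolution integral.
T_grid, Tau = np.meshgrid(t, t, indexing="ij")
M = np.where(Tau <= T_grid, H(T_grid, Tau), 0.0) * dt

g = 0.1 * np.ones(n)          # fixed vector g(t)
u = np.sin(t)                 # some input function
y = g + M @ u                 # y(t) = g(t) + integral_0^t H(t,tau) u(tau) dtau
print(y[0])                   # g(0) = 0.1, since u(0) = 0 here
```

Note that y(0) = g(0), which is exactly the initial-value condition y_d(0) = g(0) required by Theorem 1 below.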
Now we construct an iterative betterment process for system Eq. (12) as
follows:

u_{k+1}(t) = u_k(t) + Γ(t) ė_k(t),  e_k(t) = y_d(t) − y_k(t),  (13)

where Γ(t) is an r × r gain matrix.
To see the effect of betterment for the iteration (13), it is necessary to introduce a norm for r-vector-valued functions e(t) defined on [0, T] as follows:

||e||_λ = max_{0≤t≤T} e^(−λt) ||e(t)||_∞,  (14)

where λ is a positive constant. Then, for the betterment process defined by Eq.
(13), the following theorem holds:
THEOREM 1: If y_d(0) = g(0), ||I_r − H(t, t)Γ(t)||_∞ < 1 for all t ∈ [0, T], and a
given initial input u₀(t) is continuous on [0, T], then there exist positive constants
λ and ρ₀ such that

||e_{k+1}||_λ ≤ ρ₀ ||e_k||_λ and 0 ≤ ρ₀ < 1  (15)

for k = 0, 1, 2, ..., where the matrix norm ||G||_∞ for the r × r matrix G = (g_ij) stands
for

||G||_∞ = max_{1≤i≤r} Σ_{j=1}^{r} |g_ij|.
The proof will be given in the Appendix. It follows from this theorem that

||e_k(t)||_λ → 0 as k → ∞,  (16)

and hence

y_k(t) → y_d(t) as k → ∞.  (17)

It is clear by definition of the norm || · ||_λ that these convergences are uniform in
t ∈ [0, T]. Moreover, on account of the fact that
At this stage, we must bear in mind that Eq. (28) itself is a linear time-varying
dynamical system, like Eq. (19a) [except for the forcing term g₁(t) in Eq. (28)],
though all coefficient time-varying functions together with g₁(t) cannot be evaluated exactly. However, if we choose an adequate output vector y for system (28)
and a gain matrix Γ(t) in order to satisfy the assumption of Theorem 1, then it
is possible to apply the betterment process proposed in the previous section. It
should be noted again that in the application of such a betterment process all
time-varying functions of the coefficients in the linear dynamical system [Eqs.
(19a) and (19b)] need not be known in principle; only the output y_k(t) for
each operation trial must be known.
This means that Eq. (28) reduces to a form of Eq. (12) even if a forcing term g₁(t)
exists in Eq. (28).
Now we check the validity of the most important assumption in Theorem 1,
which is described in this case by

||I_n − B₁(t)Γ(t)||_∞ < 1.  (33)

As seen in the derivation of Eq. (27) from Eq. (26), B₁(t) = R⁻¹[θ_d(t)]. Hence,
if we set Γ(t) = R[θ_d(t)], the assumption of Theorem 1 is clearly satisfied with the
condition

||I_n − H(t, t)Γ(t)||_∞ = ||I_n − R⁻¹[θ_d(t)] R[θ_d(t)]||_∞ = ||I_n − I_n||_∞ = 0 < 1.
In general, most entries of the inertia matrix R(θ) are complicated functions of
θ and therefore it is difficult to compute exact values of R[θ_d(t)] in real time.
However, it is true that R(θ) remains positive definite for any θ. This suggests
that in most configurations of the manipulator there is the possibility of selecting
an appropriate constant matrix Γ such that

||I_n − B₁(t)Γ||_∞ = ||I_n − R⁻¹(θ_d) Γ||_∞ < 1.
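This gain condition can be checked numerically for any candidate Γ. The sketch below evaluates ||I_n − R⁻¹(θ_d)Γ||_∞ for the exact choice Γ = R(θ_d) and for a constant diagonal gain; the 2 × 2 positive-definite "inertia" matrix is a made-up example, not a manipulator model from the article.

```python
import numpy as np

def inf_norm(G):
    """Induced infinity-norm of a matrix: maximum absolute row sum."""
    return np.max(np.sum(np.abs(G), axis=1))

# Made-up positive-definite "inertia" matrix at some configuration theta_d.
R = np.array([[2.0, 0.3],
              [0.3, 1.0]])
I = np.eye(2)

# Exact choice Gamma = R(theta_d): the condition value is 0.
print(inf_norm(I - np.linalg.inv(R) @ R))

# Constant diagonal gain: the condition must be checked; here it is below 1.
Gamma = np.diag([1.8, 0.9])
print(inf_norm(I - np.linalg.inv(R) @ Gamma))
```

A designer would evaluate this norm over the configurations visited by θ_d(t) and keep the constant Γ only if the value stays strictly below 1 everywhere.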
Finally, we remark that, from the original problem of controlling the manipulator posed in the previous section, a desired trajectory y_d(t) for the variational
system should be given as y_d(t) = 0, because x(t) represents the small difference
of θ(t) from the original desired motion θ_d(t), that is,

x(t) = θ(t) − θ_d(t).

Hence, if it holds that x_k(0) = y_d(0) = 0 for all k, then all assumptions of Theorem
1 are satisfied and thereby the betterment process converges. Fortunately, the
initial condition x(0) = 0 is usually satisfied if the manipulator is still at the
beginning of every operation trial, that is, θ̇_k(0) = 0, and θ_d(t) is chosen to satisfy
the initial condition θ̇_d(0) = 0.
In conclusion, it is shown that the betterment process of Eq. (30) for the system
Figure 7(a). Plot of y_k(t) = ẍ_k(t) (acceleration) for betterment process with Γ = 1.0.
Figure 8(a). Plot of y_k(t) = ẍ_k(t) (acceleration) for betterment process with Γ = 0.5.
Figure 8(c). Plot of x_k(t) (position, the double integral of y_k) for betterment process with Γ = 0.5.
y = [1 0] x.

Hence, the key assumption of Theorem 1 is not satisfied for any choice of Γ.
VI. CONCLUSION
With the goal of applying “learning” control to robot manipulators, we pro-
posed an iterative betterment process for mechanical systems that improves their
present performance of motion by use of updated data of the previous operation.
It was shown theoretically, together with numerical examples, that the better-
ment process can be applied to the motion control of manipulators if a desired
motion trajectory is given.
In this article, the discussion is restricted to the case of linear dynamical
systems and therefore the equation of motion of the manipulator had to be
represented in the neighborhood of the desired trajectory. However, this
representation was necessary only for the theoretical proof of the applicability of
the betterment process to the control of robots. In practice, this iterative scheme
could be associated directly with the original nonlinear dynamics of robots, that
is, the actual betterment process is constructed as shown in Figure 9.
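As a sketch of this direct nonlinear application, a D-type betterment update can be run against full nonlinear dynamics without any linearization. The damped pendulum model, gain, and desired trajectory below are illustrative assumptions, not the manipulator of the article.

```python
import numpy as np

dt, n = 1e-3, 1000                      # time grid on [0, 1]
t = np.arange(n) * dt
wd = 0.5 * np.pi * np.sin(np.pi * t)    # desired angular velocity, wd(0) = 0
wd_dot = 0.5 * np.pi**2 * np.cos(np.pi * t)

u = np.zeros(n)                         # initial input torque u0(t) = 0
errs = []
for k in range(11):
    # One trial of the nonlinear plant: w' = u - sin(th) - 0.5 w, th' = w.
    th = np.zeros(n)
    w = np.zeros(n)
    w_dot = np.zeros(n)
    for i in range(n - 1):
        w_dot[i] = u[i] - np.sin(th[i]) - 0.5 * w[i]
        w[i + 1] = w[i] + dt * w_dot[i]
        th[i + 1] = th[i] + dt * w[i]
    w_dot[-1] = u[-1] - np.sin(th[-1]) - 0.5 * w[-1]
    errs.append(np.max(np.abs(wd - w)))
    u = u + (wd_dot - w_dot)            # betterment update with Gamma = 1
print(errs[0], errs[-1])
```

The per-trial maximum velocity error decays rapidly even though the iteration never uses a linearized model, only the measured output of each trial.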
Finally, we point out that the betterment process for linear dynamical systems
with constant coefficients has many interesting aspects in relation to linear sys-
tem theory. It should also be pointed out that it is possible to prove the con-
vergence of the iterative betterment process constructed for a class of nonlinear
dynamical systems like robot manipulators. These discussions are given in subsequent papers.9,10
APPENDIX
Proof of Theorem 1
First we note that the norm ||G||_∞ is the natural matrix norm induced by the
vector norm

||f||_∞ = max_{1≤i≤r} |f_i|

for the vector f = (f₁, ..., f_r). Hence, it is evident that
Figure 9. Learning control scheme by betterment process for robot manipulators.
where

ρ = max_{0≤t≤T} ||I_r − H(t, t)Γ(t)||_∞.
Substituting this into Eq. (A3), we obtain Eq. (15). Thus, Theorem 1 has been
proved.
References
1. F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain
Mechanisms, Spartan Books, Washington, DC, 1961.
2. Ya. Z. Tsypkin, Adaptation and Learning in Automatic Systems, Academic, New
York, 1971.
3. S. Arimoto, S. Kawamura, and F. Miyazaki, "Can mechanical robots learn by
themselves?" Proc. 2nd Int. Symp. Robotics Res., Kyoto, Japan, August 1984.
4. R. P. Paul, Robot Manipulators, The MIT Press, Cambridge, 1981.
5. P. Vukobratovic, Scientific Fundamentals of Robotics 1 and 2, Springer-Verlag,
Berlin, 1982.
6. M. Takegaki and S. Arimoto, "A new feedback method for dynamic control of
manipulators," Trans. ASME J. Dyn. Syst. Meas. Control, 103, 119-125 (1982).
7. S. Arimoto and F. Miyazaki, "Stability and robustness of PID feedback control for
robot manipulators of sensory capability," Proc. 1st Int. Symp. Robotics Res., The
MIT Press, Cambridge, MA, 1983.
8. F. Miyazaki, S. Arimoto, M. Takegaki, and Y. Maeda, "Sensory feedback based on
the artificial potential for robot manipulators," Proc. 9th IFAC, Budapest, Hungary,
July 1984.
9. S. Kawamura, F. Miyazaki, and S. Arimoto, "Iterative learning control for robotic
systems," paper presented at IECON '84, Tokyo, Japan, 1984.
10. S. Arimoto, S. Kawamura, and F. Miyazaki, "Bettering operation of dynamic systems
by learning: A new control theory for servomechanism or mechatronics systems,"
paper presented at 23rd IEEE Conf. Decision and Control, Las Vegas, NV, 1984.