
A Rule-Based Neural Controller for Inverted Pendulum System*

Jianbin Hao†, Shaohua Tan and Joos Vandewalle†

† ESAT Lab., Dept. E.E., K.U. Leuven
K. Mercierlaan 94, B-3001 Heverlee, Belgium
Dept. E.E., National University of Singapore
10 Kent Ridge Crescent, Singapore 0511

* This research work was partially carried out at the ESAT laboratory of the Katholieke Universiteit Leuven, in the framework of a Concerted Action Project of the Flemish Community, entitled Applicable Neural Networks. The scientific responsibility is assumed by its authors.

0-7803-0999-5/93/$03.00 © 1993 IEEE

Abstract- This paper tries to demonstrate how a heuristic neural control approach can be used to solve a complex nonlinear control problem. As well as swinging up the pendulum, the controller is also required to bring the cart back to the origin of the track. Through the solution of this specific control problem, we try to illustrate a heuristic neural control approach with task decomposition, control rule extraction and neural net rule implementation as its basic elements. Specializing to the pendulum problem, the global control task is decomposed into sub-tasks, namely, pendulum positioning and cart positioning. Accordingly, three separate neural sub-controllers are designed to cater to the sub-tasks and their coordination. The simulation result is provided to show the actual performance of the controller.

I. INTRODUCTION

Many industrial control problems can simply be phrased as choosing the right inputs to bring the state of a plant to some desired position in an appropriate state space. A typical example is stabilization, in which the desired control action is to bring the state of a plant to zero by designing appropriate control inputs. Another example is the shift of set-point. In this case, control inputs are sought in order to bring the state of a plant from a previous set-point to the intended one. Control problems as such can be extremely involved even for relatively simple nonlinear systems, and no general methodology is available for actually carrying out the design.

There exist, however, certain classes of nonlinear control problems for which the allowable control inputs are restricted to a few fixed functional types. An extreme case is that the control can only assume two different constant values. Control problems of this kind can usually be translated into finite decision problems that can then be solved using techniques such as simple neural nets [1].

The idea of translating a control problem into a decision problem can also be extended beyond the classes mentioned above. In fact, if the state-space of a plant is partitioned into a finite number of regions, a continuous control action defined over the state-space can, under some circumstances, be approximated by a constant in each of the regions. If we further confine the input to have only a finite number of levels (as in the case of finite control horizon [5]), then the control problem can readily be treated as a finite decision problem. Observe that this idea is also the principle behind the so-called fuzzy control approach [7], though the state-space partition in fuzzy control does not generate regions with crisp boundaries.

An obvious advantage of this decision approach to control is that the controller can be designed in a model-free fashion. This is in contrast with the traditional control design methodology, where a model of the plant is essential to the control design. Originated by Widrow [2], the decision approach points to an interesting direction in handling the classes of difficult nonlinear control problems. Other pioneering works along the same line can be found in [3][4], though these works typically follow a stochastic rather than deterministic formulation.

The translated decision problems are, after all, problems of pattern recognition, which can be approached with many different conventional techniques. Noticeable for its relative versatility among many of these techniques is the so-called neural net approach. Neural nets come in many different varieties, and a widely-appreciated family of neural nets is the layered feedforward net with bipolar neurons.¹ This structure is called Madaline by Widrow [1], and is the limit case of the well-known back-propagation net [6]. The universal realizability is the fundamental reason behind the preference for this type of neural net.

¹ Bipolar neurons are neurons with an output of either +1 or -1. A neural net with bipolar neurons will simply be referred to as a bipolar neural net.

An important technique that can be used to design feedforward nets for the determination of decision boundaries is the so-called supervised learning proposed in [1]. The difficulty in applying supervised learning to actual control problems is that it is not always clear what the targets are for the neural nets to learn. Besides, the uncertainty in the choice of pattern representation and in the generalizability of the nets turns out to be an essential stumbling block for the technique to work. Another design technique is called direct design [11]. We shall see later that this technique is ideal for coding rules directly into a feedforward net.

Bipolar neural nets with feedback connections are also useful in solving the pattern recognition problem. A feedback neural net is able to carry out dynamical operations, and is thus suitable for dealing with time-related decision problems. However, due to the complexity of the analysis and the design, the potential of feedback neural nets has not been fully explored in the context of decision control. No systematic learning or direct design techniques are available. One design technique that we shall employ in this paper is to directly code the extracted dynamical control rules into a feedback neural net in much the same way as we design a feedforward net.

The neural decision control approach we shall explore consists of three basic steps: task decomposition, control rule extraction and neural net rule implementation. We start by decomposing a complex nonlinear control task into manageable sub-tasks that are in turn translated into decision problems. Note that decomposition is one of the principles widely used in AI. For each of the decision problems, the heuristic decision rules concerning the individual problem and their coordination are extracted. The final step is to code the rules for individual decision problems and coordination into separate sub-neural nets. These neural nets can be feedforward or feedback bipolar nets, and their proper combination will give rise to the desired global neural controller. The idea of the approach is illustrated in Figure 1.

Figure 1: The scheme of the rule-based controller design.

Apparently, this approach is heuristic in nature and is along the same line as the well-known fuzzy and expert control approaches that are built upon a set of heuristic rules. The distinct features of our approach are, perhaps, the idea of dynamical rules, and the way the rules are realized. In fact, these ideas may lead to a substantial reduction in the number of rules required, and consequently result in a compact feedback neural net controller. We shall highlight the differences in the concluding section of the paper.

To demonstrate the relevance of the approach to real-world problem solving, we shall apply it to a truly nonlinear control problem: changing the set-point of the well-known inverted pendulum system. The control task is to swing up a pendulum mounted on a cart from its stable position (vertically down) to the zero state (upright) and keep it there by applying a sequence of two opposing constant forces of equal magnitude to the mass center of the cart. An additional requirement is that the displacement of the cart itself is confined to within a pre-set limit during the swinging up action, and that the cart will eventually be brought to the origin of the track. Figure 2 shows the setup of the inverted pendulum system. Note that $\pm X_m$ is the pre-set limit of the track.

Figure 2: The cart-pendulum system. The angle and the angular speed are positive in the clockwise direction.

The control problem described above is a nontrivial problem of nonlinear regulation. It is apparently more difficult than the pendulum balancing problem (and its variations) widely adopted as a benchmarking test system for various neural controllers [1][8][9][10]. The latter control problem is linear in nature, and thus does not really serve to illustrate the strength of neural controllers that are supposed to be apt at handling hard nonlinear control problems. In fact, the balancing problem can be solved more elegantly by using techniques developed in linear control theory.

To come up with manageable rules, the global control task is decomposed into sub-tasks of pendulum positioning and cart positioning. Accordingly, three separate neural sub-controllers are designed to cater to the sub-tasks and their coordination. These are the pendulum sub-controller (PSC), the cart sub-controller (CSC) and the switching sub-controller (SSC). Each of the sub-controllers is designed based on the rules and guidelines obtained from the experiences of a human operator. Simulation analysis is also carried out to show the performance of the neural controller.

II. THE PLANT DYNAMICS AND PROBLEM DECOMPOSITION

Though the mathematical model of the inverted pendulum system is not relevant to the design of our controller, we nonetheless write it down here to show the nonlinearity inherent in the system, and to set the notation. With appropriate setting of the parameters, these equations are also used to generate the next state based on the current input and state of the system, which is the basis of our simulation and design.

The inverted pendulum system shown in Figure 2 consists of a cart of mass $m_c$ and a pendulum of mass $m$, centered at the half length $l$. The constant force $F$ is applied in both horizontal directions through the mass center of the cart so as to move it up and down the track. The track is delimited at both ends, as shown in Figure 2, by $\pm X_m$. According to physics, the system is modeled by the following set of differential equations

[Equation (1) is missing from the source text.]

$$\ddot{x} = \frac{F + m l \left[ \dot{\theta}^{2} \sin\theta - \ddot{\theta} \cos\theta \right]}{m_c + m} \qquad (2)$$

where $\theta$ and $x$ are, respectively, the angular and the cart displacements; $g = 9.8\ \mathrm{m/s^2}$ is the acceleration due to gravity. To be specific, the parameters are chosen as $F = 10$ N, $m_c = 1.0$ kg, $m = 0.1$ kg, and $l = 0.5$ m for the purpose of controller design. Note that the friction between the cart and the track, and the friction at the hinge between the pendulum and the cart, are neglected in the above equations to make them look less formidable.

Obviously, even starting from the known Equations (1) and (2), the control does not seem to follow from any existing linear techniques, as the control problem itself does not allow the linearization of the mathematical model. Moreover, there seems to be no nonlinear control technique that solves the problem in a systematic manner. It is obvious that a complex nonlinear model, even one worked out in extreme detail, may in a lot of cases still be unable to help determine the appropriate control.² This is where a model-free control approach (heuristic approach) in a sub-class is preferred.

² In the present paper, the model refers, specifically, to the mathematical equations of the system. Thus, a set of heuristic rules describing the behaviour of a system is not considered as the system model.

To motivate our model-free heuristic design approach, let us first analyze how a human operator would try to accomplish the desired control objective. Naturally, the first step towards the control involves a trial-and-error process of getting a feel of the system. This is a process of rule extraction. The extracted rules are situation-action in nature and completely different from the pure mathematical modelling of the dynamics of the system.

After getting a feel of the system, the next step the human operator is likely to take is to identify that the pendulum swing-up operation and the cart positioning can actually be viewed as two separable sub-control objectives. To meet the final control demand, he only has to first swing up the pendulum regardless of the cart disposition (within the pre-set limits), and then reposition the cart back to the origin of the track while balancing the pendulum. The latter two sub-problems are considerably more manageable than the original one.

In fact, by playing with a computer emulation program of the inverted pendulum system, the authors have gone through exactly the same steps described above to reach the control objective. It follows from our own experiences that trying to simultaneously take care of the swinging-up and the repositioning proves to be extremely difficult. This can perhaps be interpreted in terms of trajectory flow in the state-space composed of the states $\theta, \dot{\theta}, x, \dot{x}$. The feasible trajectory may first wander far away from the origin before it eventually comes back. Though the decomposability of a complex problem seems generically possible, the actual process of decomposition is by no means always as simple as in this example. It may involve analysis, experiences through trial and error, human intuition and generalization.

The actual process of decomposition and rule extraction for a particular system is influenced by the nature of the underlying plant as well as the control requirements. For instance, the decomposition and rules will be different if an additional requirement of time-optimality is imposed.

Although the task of decomposition is often explicitly realized and attempted by a human operator, the rules he comes up with to deal with each sub-problem, and the coordination of the rules for different sub-problems, are less explicit. For example, while playing with the emulation program, we cannot substantiate the instant decisions that eventually contribute to success. This tacitness of the rules is often the most difficult aspect of all rule-based approaches. The degree of difficulty in obtaining rules varies for different control problems. In the following section, we shall work out the appropriate rules for the inverted pendulum system to hint at the general principles we normally follow.

III. RULE EXTRACTION AND NEURAL NET IMPLEMENTATION

We now discuss how the control rules are extracted and implemented by simple feedforward and feedback neural nets.

As discussed before, the two sub-problems, namely, swinging up the pendulum and repositioning the cart to the origin, will be dealt with separately using two sub-controllers, the Pendulum Sub-Controller (PSC) and the Cart Sub-Controller (CSC). The actions of the two sub-controllers are coordinated by the third sub-controller, called the Switching Sub-Controller (SSC). The overall decomposed control scheme is shown in Figure 3. The rule extraction processes and neural net rule implementation for all three sub-controllers are detailed in the subsequent sub-sections.

Figure 3: The overall control scheme.

A. Pendulum Sub-Controller

The sub-controller for swinging up and stabilizing the pendulum is viewed as consisting of three functional parts: pendulum swinging-up, stabilization, and switching from the swinging-up action to stabilization.

The switching condition between the two control actions is a prescribed small angular displacement $\alpha$ ($\alpha > 0$), i.e., whenever $-\alpha < \theta < \alpha$, the switching will take place.

Experience shows that it is impossible to swing up the pendulum to the range of $-\alpha < \theta < \alpha$ by applying a constant force along a single direction (it may become possible if the magnitude of the force is extremely large). We will have to switch the direction of the force at a certain point (say, $\dot{\theta} = \beta$) of the state space. If the specified magnitude of $F$ is too small, more than one such force switching is needed. For our case ($F = 10$), one switching is enough if $\beta$ is suitably chosen.

The behavior of the system under such a control input sequence is as follows. When $+F$ is applied, the pendulum (starting from the initial position of $\theta = \pi$, $\dot{\theta} = 0$) swings up to a larger $\theta$ on the side of $\theta > \pi$ with $\dot{\theta} > 0$, until $\dot{\theta}$ decreases to zero (at which point $\theta - 2\pi < -\alpha$). Then the pendulum begins to fall down with a negative $\dot{\theta}$. When $\dot{\theta}$ becomes equal to $\beta$, the force changes to $-F$, resulting in the pendulum falling down with an increasingly negative $\dot{\theta}$. When the pendulum swings back to the point $\theta = \pi$, there will still be a negative $\dot{\theta}$ (note that without changing the direction of $F$ from $+F$ to $-F$ at $\dot{\theta} = \beta$, $\dot{\theta}$ will be zero at this point). By continuously applying $-F$, the pendulum will swing up to the point where $\theta < \alpha$ on the side of $\theta < \pi$.

Following the preceding analysis, it can be readily seen that the choices of $\alpha$ and $\beta$ are vital to the success of this sub-controller. Improper choices of $\alpha$ and $\beta$ will result in either being unable to swing the pendulum up with a single change of forces, or oscillation between the swing-up and the stabilization actions. Several trial-and-error processes indicate that one of the proper choices is $\alpha = 0.2$ and $\beta = -1$.

The design of the stabilizing part of PSC is relatively simple. Based on our intuition, it is easy to see that we should apply $+F$ when $\theta > 0$ and $\dot{\theta} > 0$, apply $-F$ when $\theta < 0$ and $\dot{\theta} < 0$, and finally, apply $+F$ or $-F$ when $\theta \cdot \dot{\theta} < 0$ depending on the ratio $\kappa = \dot{\theta}/\theta$ ($\kappa$ will be chosen to be $-15$).

To summarize, the rules for the design of PSC are given below:

Rules for PSC:

• $+F$ when $\dot{\theta} > \beta$, otherwise, $-F$ (swinging up);
• $+F$ when $\kappa\theta + \dot{\theta} > 0$, otherwise, $-F$ (stabilization);
• change from swing-up to stabilization when $-\alpha < \theta < \alpha$.

From the geometric theory of neural networks [11], we know that these rules can easily be implemented in a two-layer multilayer perceptron neural network, which is shown in the upper part of Figure 4.

B. Cart Sub-Controller

The position control of the cart is quite simple. The direction of the applied force is determined by $x$ and $\dot{x}$, the position and the speed of the cart. When $x > 0$ and $\dot{x} > 0$, $-F$ should be applied; when $x < 0$ and $\dot{x} < 0$, $+F$ should be applied; when $x \cdot \dot{x} < 0$, the choice of $+F$ or $-F$ depends on the ratio $\rho = \dot{x}/x$. Based on experience, $\rho$ has been chosen to be $-0.5$. So the rule for CSC is the following.

Rule for CSC:

• $+F$ when $\rho x + \dot{x} < 0$, otherwise, $-F$.

The neural network implementation of this sub-controller is shown in the lower part of Figure 4.

C. Switching Sub-Controller

Although the two sub-controllers (PSC and CSC) designed above are able to carry out separate pendulum swing-up and cart positioning control actions, they need to be combined dynamically by the switching sub-controller (SSC) to finally realize the global control objective. The idea of switching between the two sub-controllers is based on the following simple fact: when the pendulum is within a very small range of the zero state ($-\mu < \theta < \mu$, $-\nu < \dot{\theta} < \nu$, where $\mu$, $\nu$ are small positive values), it will need more reverse forces to bring the pendulum back to this range than to push it away. This phenomenon can be roughly explained by the torque applied on the pendulum, i.e., the smaller $|\theta|$ is, the larger this torque will be when $F$ remains the same.

Based on the above phenomenon, we will design the SSC to work in the following way: whenever $-\mu < \theta < \mu$ and $-\nu < \dot{\theta} < \nu$, switch the control to the CSC, which will generate a sequence of identical forces ($-F$ or $+F$ depending on the state of the cart); the actions of CSC will cause the pendulum to leave the small region around the zero state until the point that $\theta > \gamma$ or $\theta < -\gamma$ ($\gamma$ is a small angle but greater than $\mu$), at which the SSC will switch the control to the PSC, which will bring the pendulum back to the zero state and also move the cart toward the origin. This switching process will continue until the cart reaches the origin of the track, after which both the pendulum and the cart will swing back and forth within a small range of the zero state. Let us summarize the rules for SSC as follows.

Rules for SSC:

• switch to CSC when $-\mu < \theta < \mu$ and $-\nu < \dot{\theta} < \nu$;
• switch back to PSC when $\theta > \gamma$ or $\theta < -\gamma$.

Because of the dynamical nature of this switching mechanism, we have used a time-delay MLP (a kind of simple feedback neural net) for the implementation of SSC. Together with PSC and CSC, the SSC is shown in Figure 4 as part of the whole neural controller. In this realization the parameters have been chosen as follows: $\mu = 0.004$, $\nu = 0.2$ and $\gamma = 0.03$.

Figure 4: The whole neural controller with SSC highlighted.

IV. SIMULATION

The simulation of the neural controller for the inverted pendulum system is done on a DECstation using the software package Matlab. The sampling period is 0.01 s and a typical simulation lasts for more than 24000 samples, which is equivalent to 4 minutes in real time. One of the simulation results is shown in Figure 5.

Figure 5: The result of simulation, panels (a) to (d), with horizontal axes in samples ($\times 10^4$). Refer to the text for a complete explanation.
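The rules of Section III and the sampling figures above can be put together in a few lines. The following Python sketch is only an illustration under stated assumptions, not the authors' Matlab program. Equation (1) is not available in this text, so `plant_step` assumes the frictionless form of the cart-pole equations commonly used in the pendulum-balancing literature (such as [8] and [10]), integrated with an explicit Euler step. The switching lines are written as $\dot{\theta} - \kappa\theta$ and $\dot{x} - \rho x$, a sign convention chosen here so that the code reproduces the quadrant rules stated in words. The angle is wrapped to $[-\pi, \pi]$ with 0 upright, the track limit is not enforced, and all function names are hypothetical.

```python
import math

# Plant and rule parameters quoted in the text.
G, M_CART, M_POLE, HALF_LEN = 9.8, 1.0, 0.1, 0.5
F_MAG, DT = 10.0, 0.01              # force magnitude [N], sampling period [s]
ALPHA, BETA, KAPPA = 0.2, -1.0, -15.0
RHO = -0.5
MU, NU, GAMMA = 0.004, 0.2, 0.03


def wrap(theta):
    """Map an angle to [-pi, pi]; 0 is upright, pi is hanging down."""
    return math.atan2(math.sin(theta), math.cos(theta))


def plant_step(state, force):
    """One explicit Euler step. ASSUMPTION: standard frictionless cart-pole model."""
    theta, theta_dot, x, x_dot = state
    total = M_CART + M_POLE
    s, c = math.sin(theta), math.cos(theta)
    theta_acc = (G * s + c * (-force - M_POLE * HALF_LEN * theta_dot ** 2 * s) / total) / (
        HALF_LEN * (4.0 / 3.0 - M_POLE * c * c / total))
    # Cart acceleration, same form as Equation (2).
    x_acc = (force + M_POLE * HALF_LEN * (theta_dot ** 2 * s - theta_acc * c)) / total
    return (theta + DT * theta_dot, theta_dot + DT * theta_acc,
            x + DT * x_dot, x_dot + DT * x_acc)


def psc(theta, theta_dot):
    """Pendulum sub-controller: stabilize inside |theta| < ALPHA, else swing up."""
    if -ALPHA < theta < ALPHA:
        # ASSUMPTION: switching line theta_dot = KAPPA * theta.
        return F_MAG if theta_dot - KAPPA * theta > 0 else -F_MAG
    return F_MAG if theta_dot > BETA else -F_MAG


def csc(x, x_dot):
    """Cart sub-controller. ASSUMPTION: switching line x_dot = RHO * x."""
    return F_MAG if x_dot - RHO * x < 0 else -F_MAG


def ssc(mode, theta, theta_dot):
    """Switching sub-controller; the remembered mode plays the role of feedback."""
    if mode == "PSC" and abs(theta) < MU and abs(theta_dot) < NU:
        return "CSC"
    if mode == "CSC" and abs(theta) > GAMMA:
        return "PSC"
    return mode


def simulate(n_samples=24000):
    """Run the closed loop from the hanging-down rest state."""
    state, mode = (math.pi, 0.0, 0.0, 0.0), "PSC"
    for _ in range(n_samples):
        theta = wrap(state[0])
        mode = ssc(mode, theta, state[1])
        force = psc(theta, state[1]) if mode == "PSC" else csc(state[2], state[3])
        state = plant_step(state, force)
    return state, mode
```

Calling `simulate(24000)` covers the 240 s (4 minutes) mentioned above. Whether the swing-up actually succeeds with these parameter values depends on the assumed model, sign conventions and integration step, so no particular outcome is claimed here.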

In Figure 5(a), the angular displacement of the pendulum is shown over the whole simulation process, while 5(b) shows a zoom-in picture for the samples from 15000 to 16000. Figure 5(c) shows the whole evolution process of the cart position, while 5(d) gives a detailed description of the process for samples from 15000 to 16000. It is easy to see that the stabilization of the cart is achieved after the stabilization of the pendulum, and that the cart keeps moving within a small range of the track origin after a short period of time.

V. DISCUSSIONS AND CONCLUSIONS

We have presented a heuristic neural control approach for solving nonlinear control problems. This approach is decision based, and has task decomposition, rule extraction and neural net rule implementation as its essential elements. Via the specific control problem of the inverted pendulum, we have demonstrated how each of the three steps is actually considered and realized.

It may be of interest to compare this approach to other rule-based control approaches such as fuzzy control and expert control. It turns out that the rule-based neural control has the following important advantages. First, it is able to implement dynamical decisions and rules rather than just static mapping actions. More precisely, in our feedback neural controller, the outcome of a sub-decision process or rule can depend on its own previous outcome. This dynamical decision process, lacking in both fuzzy and expert control approaches, considerably strengthens the decision ability, and sometimes proves to be indispensable for decision control problems. In fact, it is the problem decomposition that necessitates the dynamical decision process. And decomposition is often considered to be a key to complex control problems.

As a fringe benefit, dynamic decision requires fewer rules than a pure static decision. This is easily understood, as in the static case time is completely unfolded, resulting in the need to cover all the time aspects of the decision.

Secondly, once constructed, the rule-based neural controller is a deterministic nonlinear dynamical system (with local feedback only), thus allowing analysis to be carried out for performance evaluation. This is in contrast to both fuzzy and expert controls, where the final linguistic constructs prevent an analytic evaluation.

Finally, because of its parallel nature, the rule-based neural controller is computationally advantageous over fuzzy and expert control. The computational advantage also follows from the reduced number of rules as a result of the dynamical decision process.

Like all other rule-based approaches, the rule-based neural controller is inherently non-analytical and imprecise. The ambiguity follows from the heuristic and subjective nature of the problem decomposition and rule extraction. Moreover, the approach does not intend to solve all complex nonlinear control problems. What it really attempts to solve is the nonlinear control problems that are less well-defined, but nevertheless seem intuitively solvable.

REFERENCES

[1] B. Widrow (1987), The original adaptive neural net broom-balancer, Int. Symp. Circuits and Syst., pp.351-357.

[2] B. Widrow (1962), Reliable, trainable networks for computing and control, Aerospace Engineering, September, pp.78-123.

[3] K. Fu (1970), Learning control systems - Review and outlook, IEEE Trans. Autom. Control, Vol.AC-15, pp.210-221.

[4] G. N. Saridis (1981), Application of pattern recognition methods to control systems, IEEE Trans. Autom. Control, Vol.AC-26, pp.638-645.

[5] D. W. Clarke, C. Mohtadi and P. S. Tuffs (1987), Generalized predictive control - Part I, The basic algorithm, Automatica, Vol.23, pp.137-148.

[6] D. E. Rumelhart, G. E. Hinton, and R. J. Williams (1986), Learning internal representation by error propagation, In Parallel Distributed Processing, Vol.1, Ch. 8, by D. E. Rumelhart and J. McClelland, Cambridge MA: MIT Press.

[7] M. Sugeno (1985), An introductory survey of fuzzy control, Inform. Sci., Vol.36, pp.59-83.

[8] A. G. Barto, R. S. Sutton and C. W. Anderson (1983), Neuronlike adaptive elements that can solve difficult learning control problems, IEEE Trans. System, Man and Cybernetics, Vol.SMC-13, pp.834-846.

[9] V. V. Tolat and B. Widrow (1988), An adaptive broom balancer with visual inputs, Proc. IEEE Intern. Conf. on Neural Net., San Diego, Vol.2, pp.641-647.

[10] C. W. Anderson (1989), Learning to control an inverted pendulum using neural networks, IEEE Control Systems Magazine, Vol.9, pp.31-37.

[11] J. Hao, S. Tan and J. Vandewalle (1990), A geometric approach to the structural synthesis of multilayer perceptron neural networks, Proc. INNC-90 Paris, Vol.2, pp.881-885.
jective nature of the problem decomposition and rule ex-
