
Lecture Notes in

Control and
Information Sciences
Edited by A. V. Balakrishnan and M. Thoma

Yousri M. El-Fattah
Claude Foulard

Learning Systems:
Decision, Simulation, and
Control

Springer-Verlag
Berlin Heidelberg New York 1978
Series Editors
A. V. Balakrishnan • M. Thoma

Advisory Board
A. G. J. MacFarlane • H. Kwakernaak • Ya. Z. Tsypkin

Authors
Dr. Y. M. El-Fattah
Electronics Laboratory
Faculty of Sciences
Rabat, Morocco

Professor C. Foulard
Automatic Control Laboratory
Polytechnic Institute of Grenoble
Grenoble, France

ISBN 3-540-09003-7 Springer-Verlag Berlin Heidelberg New York

ISBN 0-387-09003-7 Springer-Verlag New York Heidelberg Berlin

This work is subject to copyright. All rights are reserved, whether the whole
or part of the material is concerned, specifically those of translation, re-
printing, re-use of illustrations, broadcasting, reproduction by photocopying
machine or similar means, and storage in data banks.
Under § 54 of the German Copyright Law where copies are made for other
than private use, a fee is payable to the publisher, the amount of the fee to
be determined by agreement with the publisher.
© by Springer-Verlag Berlin Heidelberg 1978
Printed in Germany
Printing and binding: Beltz Offsetdruck, Hemsbach/Bergstr.
2061/3020-543210
FOREWORD

This monograph studies topics in using learning systems for decision, simulation, and
control. Chapter I discusses what is meant by learning systems and comments on their
cybernetic modeling. Chapter II, concerning decision, is devoted to the problem of
pattern recognition. Chapter III, concerning simulation, is devoted to the study of a
certain class of problems of collective behavior. Chapter IV, concerning control, is
devoted to a simple model of finite Markov chains. For each of the last three chapters,
numerical examples are worked out entirely using computer simulations. This monograph
developed over a number of years during which the first author profited from a number
of research fellowships in France, Norway, and Belgium. He is grateful to a number of
friends and co-workers who influenced his views and collaborated with him. Particular
thanks are due to W. Brodey, R. Henriksen, S. Aidarous, M. Ribbens-Pavella, and
M. Duflo.

Y.M. El-Fattah
C. Foulard
CONTENTS

ABSTRACT
CHAPTER I . CYBERNETICS OF LEARNING
1.1. System Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2. Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3. Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4. Learning Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5. Learning and Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6. Types of Learning Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.7. Mathematical Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.8. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
CHAPTER I I . DECISION - PATTERN RECOGNITION
2.1. Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2. Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3. Karhunen - Loeve Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4. Intraset Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5. Interset Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6. Optimal Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.7. Statistical Decision Algorithms . . . . . . . . . . . . . . . . . . . . . . 23
2.8. Sequential Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.9. Supervised Bayes Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.10. Non-Supervised Bayes Learning . . . . . . . . . . . . . . . . . . . . . . 35
2.11. Identifiability of Finite Mixtures . . . . . . . . . . . . . . . . . . 36
2.12. Probabilistic Iterative Methods - Supervised Learning 37
2.13. Probabilistic Iterative Methods - Unsupervised Learning 42
2.14. Self Learning with Unknown Number of Pattern Classes 46
2.15. Application - Measurement Strategy for Systems
      Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.16. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
CHAPTER I I I . SIMULATION - MODELS OF COLLECTIVE BEHAVIOR
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2. Automata Model I - Sufficient a priori Information . . . 65
3.3. Automata Model II - Lack of a priori Information . . . . . 70

3.4. Existence and Uniqueness of the Nash Play . . . . . . . . . . . 71


3.5. Convergence Theorem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.6. Environment Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.7. Market Price Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.8. Resource Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
3.9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
Appendix - Projection Operator . . . . . . . . . . . . . . . . . . . . . . . . . 97
CHAPTER IV. CONTROL - FINITE MARKOV CHAINS
4.1. Markov Decision Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
4.2. Conditions of Optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.3. Automaton Control Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.4. Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.5. Acceleration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.6. Numerical Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
EPILOGUE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
ABSTRACT

This monograph presents some fundamental and new approaches to the use of learning
systems in certain classes of decision, simulation, and control problems.

To design a learning system, one should first formulate analytically the goal of
learning that has to be reached at the end of the learning process. As a rule that
goal of learning depends on the environment's input-output characteristics - conveniently
considered to be stochastic - for which there is not sufficient a priori information.
That incomplete definition of the goal of learning is compensated for by the necessary
processing of current information. Basic definitions and concepts related to learning
systems are presented in Chapter I.

As for decision problems, we consider the class of pattern recognition problems in
Chapter II. Learning systems can be trained to apply optimum statistical decision
algorithms in the absence of a priori information about the classified patterns.
The accompanying problem of feature extraction in pattern recognition is also discussed.
As an application, we consider the problem of optimal measurement strategies
in dynamic system identification. Numerical results are given.

In Chapter III we present a novel model of learning automata for simulating a certain
class of problems of collective behavior. Two applications are considered. One
is resource allocation ; the other is price regulation in a free competitive
economy. The model performance is studied using computer simulations. Analytical
results concerning the limiting behavior of the automata are also given.

A certain control problem of stochastic finite systems modelled as Markov chains is
considered in Chapter IV. The control decision model is considered to be a learning
automaton which experiments with control policies while observing the system's state at
the consecutive epochs. Two cases are studied : complete and incomplete a priori
information. In the latter case the automaton's policy is dual in nature, in the
sense that it identifies the chain's transition probabilities while controlling the
system. Conditions are given for the control policy to converge with probability 1
to the optimal policy. Acceleration of the adaptation process is also examined.
Computer simulations are given for a simple example.
CHAPTER I

CYBERNETICS OF LEARNING

"Lacking a birth in disorder,


The enlivening detestation of order,
No liberating discipline can ever see
or be the light of a new day."
David Cooper, "The Grammar of Living."

1.1 SYSTEM CONCEPT.

Various definitions can be given for a system. A definition is clearly dependent
on the context which it intends to serve. For our purposes the system is defined
by behavior, i.e. by the relationship between its input and output.

A model is adequate to describe a system's behavior when it simulates the relationship
between the system's output and input. In other words, the model for any
given sequence of inputs produces the same sequence of outputs as the system. The
input x of the system at any time instant is assumed to belong to the set of possible
alternatives X. The output y, likewise, belongs to the set of possible alternatives
Y. It is usually assumed that X and Y contain a finite number of elements
(vectors).

Different types of systems can be distinguished depending on their behavior or
kind of relationship between their inputs (stimuli) and outputs (responses). A
classification of certain types is given below.

(a) Deterministic Systems. All the relations are presented by mappings (either
one-to-one or many-to-one). In other words, the output variables are functions of the
input variables. No probabilities have to be assigned to elements of the relations.
Deterministic systems can be subdivided into :

i - Combinational (memoryless) systems. The output components are uniquely determined
as certain combinations of the instantaneous values of the input components.

ii - Sequential systems. In this case there exists at least one input which is associated
with more than one output. The different outputs of the system to the same
input belong to different, but accurately defined, sequences of inputs which preceded
the given input.
(b) Probabilistic (Stochastic) Systems. At least one of the input-output relations
is not presented by a mapping (it is presented by a one-to-many relation). Each element
(a,b) of the relation is then associated with a conditional probability P(b/a)
of occurrence of b when a occurs. Probabilistic systems can be subdivided into :

i - Memoryless probabilistic systems. All output components are defined on the basis
of instantaneous values of input components.

ii - Sequential probabilistic systems. At least one output component is not defined
by the instantaneous values of input components.

In a sequential system the output (response) depends not only on the instantaneous
input (stimulus) but also on the preceding inputs (stimuli). This means, however,
that the required stimuli must be remembered by the system in the form of values
of some internal quantities. Let us term them memory quantities, and the aggregate
of their instantaneous values the internal state of the system.

The response of a deterministic system always depends uniquely on its internal
state and on the stimulus. For stochastic systems the response depends only in
probability on both the input (stimulus) and the internal state.

A system may be modeled using different abstractions, for example using graphs,
deterministic or stochastic calculus, computer languages, etc. This raises the question
of the equivalency of models and abstractions. From the viewpoint of behavior,
models are considered equivalent when they produce similar sequences of
outputs for similar sequences of inputs. Equivalency of abstractions may be related
to the equivalence of their information measures (e.g. in Shannon's sense).

1.2 ENVIRONMENT.

Every system has its environment. With physical systems the environment is theoretically
everything that is not included in the given system. However, since we
confine ourselves mostly to a finite number of defined relations between the system
and its environment, it is usually of advantage to restrict oneself to the substantial
environment, i.e. to a limited set of elements in the environment which interest us.
The same applies to abstract systems. The physical system and its environment
act on each other - they interact. The manner in which a system influences its
environment depends, in general, on the properties of the system itself as well as on
the manner in which the environment acts on the system. Conversely, the same applies
to the environment.

There is no "hard" boundary between the system and the environment. The environment
is indeed another system which surrounds it. The interaction process between the
system and the environment can only continue when the environment, defined by its
behavior, and the system likewise form two abstract sets which neither include nor
exclude each other. The intersection of the two sets represents the boundary between
the system and the environment, see Fig. 1. This represents the part of the system
which is relevant to the environment and, conversely, the part of the environment which
is relevant to the system : relevance with regard to the environment's or the system's
purposes or goals or means of their realization in work. So the boundary
represents the interaction context between the system and the environment. This
interaction is maintained both by interdependence of purpose - complementarity - and by
interaction which maintains separation and thus contradiction. Such tendencies both
toward conflict and union are present in any real interacting process. The context
is being metabolized as the system and its environment work at changing each other
according to the dynamics of the interaction process and change into each other.

Fig. 1. System-Environment Interaction.

1.3 CONTROL.

We shall for purposes of simplicity define control sequentially as first making
a decision and then taking an action. Control is defined as aiming at a certain
objective. Without an objective there would be no decision : the word decision would be
just meaningless. The objective can be considered as a function of a longer term
decision and action determined by another system or system level (an aspect of
environment). The decision and action of a system are directly related to the internal
state and output of the system, respectively.

Taking a decision is equivalent to enhancing the order or organization of the
system's state. If the system transfers part of its organization to the environment
then a control process will be taking place. This amounts to information transfer
between the system and the environment. Thus an action may be regarded as information
transfer (energy and material are used in this transfer but this aspect will
not be discussed in the presentation). A necessary condition enabling information
to be received by the environment is that the system action be expressed by signals.
However, every environment is capable of directly receiving some types of signal
only (it is selective in the reception of signals), and it can receive these signals
only at a given resolution level and during receptive intervals.

Information can be expressed by various signals and, conversely, various meanings
can be attributed to a given signal. Signals can be more or less ambiguous. In order
to ensure that signals have meanings as information, there must exist a set of rules
according to which a certain informational meaning is assigned to individual
signals. This meaning determines the action or work that will be performed
by using the information in relation to a purpose. A set of such rules determining
the information value of a set of signals is called a code. Hence the system action,
to be interactive with the environment, should be in terms of the environmental code.

1.4 LEARNING CONDITIONS.

Let us now consider the conditions to be satisfied by a system and its environment
in order that a learning process can take place.

An obvious prerequisite for learning is that the system should have several
courses of action open to it. Since only a single course can be selected at any
given time, it must be decided which of the possible courses is to be taken. This
is equivalent to ordering or organizing the different courses of action according
to their preference in view of a certain goal linked to the environment response.
The more the disorder of those courses of action, the more the need of learning.
Entropy is defined as a measure of that disorder.

Let the set of actions be Y = {y_1, y_2, ..., y_n}. Define p_i as the probability of
deciding for the course of action y_i (i = 1, ..., n). Note that

    0 ≤ p_i ≤ 1 ,    Σ_{i=1}^{n} p_i = 1                                  (1)

The entropy H measuring the decision disorder is given by

    H = -k Σ_{i=1}^{n} p_i ln p_i                                         (2)

where k is some positive constant.

Then a prerequisite for learning is that the initial entropy H_0 be greater than
zero, i.e.

    H_0 > 0                                                               (3)
Besides the necessity for initial disorder of the system's actions, it is also important
for a learning process to take place that the system be sensitive to the environment
response. A receptive code is necessary. Obviously, if the system is insensitive or
indifferent to the environment's response there will be no sense in talking about
learning, for the environment would have no influence whatsoever on the system.

Thus the system's structure must be changing in accordance with the environment's
response, and in such a way that the system's entropy decreases with time,
i.e.

    lim H = 0  as  t → ∞                                                  (4)
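The following is a minimal sketch, not from the monograph, of the quantities in
eqns. (1)-(4) : it computes the decision entropy H for a hypothetical four-action
system and shows it falling as the probability mass concentrates on one action.
The action probabilities are invented for illustration.

    # Sketch: decision entropy H = -k * sum p_i ln p_i, eqn. (2),
    # for an illustrative four-action system.
    import math

    def decision_entropy(p, k=1.0):
        # Probabilities must satisfy eqn. (1): 0 <= p_i <= 1, sum p_i = 1.
        assert all(0.0 <= pi <= 1.0 for pi in p) and abs(sum(p) - 1.0) < 1e-9
        return -k * sum(pi * math.log(pi) for pi in p if pi > 0.0)

    # Before learning: maximal disorder, H0 = ln 4 > 0 (condition (3)).
    print(decision_entropy([0.25, 0.25, 0.25, 0.25]))   # 1.386...
    # After learning: mass concentrated on one action, H near 0 (condition (4)).
    print(decision_entropy([0.97, 0.01, 0.01, 0.01]))   # 0.167...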

1.5 LEARNING AND ENTROPY.

To understand the interrelationship between learning and entropy let us cite the
following example. Suppose we give a multiple choice exam to a student. Even though
he has to choose one of the alternatives, he is, except in a special case, not 100
percent sure that his choice is the correct one. In general, his degree of knowledge
is better represented by the probability (or plausibility) distribution over the n
alternatives. If his degree of learning is low, the probability will be distributed
more or less evenly over the alternatives. As he learns more, however, the probability
will be more and more concentrated on fewer and fewer alternatives. If one
defines the entropy H as in eqn. (2), where p_i is the probability of the i-th
alternative, the process of learning will be expressed by a decrease of entropy. It is
true that an increase of confidence in a wrong alternative will be expressed by a
decrease of entropy too, but we cannot deny that one aspect of the process of correct
learning can be expressed by a decrease of the learning entropy. If one starts with
a wrong belief and if one gradually shifts the weight from the wrong alternative to
the correct alternative, the entropy in the first stage will increase, expressing
the unlearning of the wrong belief, and in the second stage will decrease, expressing
the learning of the correct belief, see Fig. 2. So unlearning becomes necessary for a
subsequently successful phase of learning.

1.6 TYPES OF LEARNING SYSTEMS.

One important property of learning systems is their ability to demonstrate an
improving performance in spite of lacking a priori information or even under conditions
of initial indeterminacy.

Depending on the information input to the system and the system-environment
interaction it is possible to distinguish different types of learning processes :
[Figure : the entropy rises from its initial value H_0 during an unlearning phase, then decreases during the learning phase that follows time t_0.]

Fig. 2. Evolution of the learning system's entropy.

a. Unsupervised Learning.
(Self-Learning or Learning without a Teacher).
This is the case when the system does not receive any outside information except
the ordinary signals from the environment. The system would then be learning by
experimenting with behavior. Such learning systems are usually called self-organizing
systems, see Fig. 3. The study of such systems finds its application, for example, in
problems of simulation of behavior and automatic clustering of input data.

[Figure : the system acting on, and acted on by, its environment, with no outside teacher.]

Fig. 3. Learning by self-organization or experimenting behavior.
b. Supervised Learning.
(Training or Learning by a Teacher).

This is the case when the system receives additional information from the outside
during the learning process. Here the teacher is the source of additional external
information input to the system. Depending on what information the teacher inputs
to the system undergoing training it is possible to distinguish two situations :

i. training by showing : the teacher inputs realizations of the output signal y
corresponding to the given realizations of the input signal x to the system being
trained, see Fig. 4.

ii. training by assessment : the teacher observes the operation of the system being
trained and inputs to it its appraisal z of the quality of its operation (in the
simplest case the teacher gives a reward z = +1 or a punishment z = -1), see Fig. 4.

One may further classify learning by a teacher into two categories : learning
by an ideal (or perfect) teacher, and learning by a real teacher (or teacher who
makes mistakes).

[Figure : teacher, system, and environment ; the teacher supplies y or z to the system.]

Fig. 4. Learning by a Teacher. a) training by showing : the teacher inputs y
to the system ; b) training by assessment : the teacher inputs z to the system.
But there remains an important and yet unexplored type of learning where the
interaction between the system and the teacher becomes active instead of passive.
That is when the teacher, like the system, does not know the right action y for a
situation x ; the teacher and the system then both become self-organizing systems,
learning from each other and experimenting with their behaviors. There will in fact be
no "specific" teacher ; the teacher in this case is just another system. Either system
is the teacher for the other one. This case may be called cooperative learning in
juxtaposition with the traditional (competitive) learning by a teacher. (It is valuable
to note that competitive learning requires cooperation in setting the rules
within which the strategy will be developed ; if the student has no code for observing
this cooperative aspect then teacher and student will have difficulty in unlearning
inferior learning techniques.)

1.7 MATHEMATICAL MODELING.

Learning may be mathematically modelled as a hierarchy of experience problems
operating at different time-scales. The experience problems solved at the longer
time span structure the problems to be solved at the shorter time spans, and vice versa.
Experience problems of slow dynamics, corresponding to a long time span, may be
considered as qualitative learning. On the other side, experience problems of fast
dynamics, corresponding to a short time span, may be considered as quantitative learning.
In this monograph we deal only with the quantitative aspect of learning. (An automaton
capable of quantitative learning can be regarded as the simplex of a self-organizing
system.)

The problem of learning thus considered may be viewed as the problem of estimation
or successive approximation of the unknown quantities of a functional which
is chosen by the designer of the learning system to represent the process under
study. The basic ideas of measuring the accuracy of approximations will be related
to the problem of learning in an unknown environment, i.e., where the function to
be learned (approximated, estimated) is known only by its form over the observation
space. Any further specification of such a functional form can be performed only on
the basis of experiments which offer the values of the approximated function in the
domain of its definition. This implies that any desired solution which needs
knowledge of the approximated function is reached gradually by methods relying on
experimentation and observation.
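As a concrete, if simplified, illustration of such successive approximation, the
sketch below estimates an unknown environment quantity purely from noisy experiments,
using a decreasing-gain correction of the Robbins-Monro type. The target value, noise,
and gain sequence are hypothetical choices, not a scheme prescribed by the text.

    # Sketch: quantitative learning as successive approximation from experiments.
    import random

    def learn_by_experiment(observe, n_steps=5000):
        theta = 0.0                      # initial guess about the unknown quantity
        for n in range(1, n_steps + 1):
            gamma = 1.0 / n              # decreasing gain (step size)
            theta += gamma * (observe() - theta)   # correct the estimate after each experiment
        return theta

    random.seed(0)
    # Unknown environment: each experiment returns 3.0 plus noise; only samples are observable.
    print(learn_by_experiment(lambda: 3.0 + random.gauss(0.0, 1.0)))   # close to 3.0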

1.8 CONCLUSIONS.

The behavior of a system depends on its state of learning or information level,
which is measured by the organization of the system's output (action or response)
corresponding to each input (or stimulus received from the environment). The higher
the information level of the system, the lower the entropy of the system. The optimal
rule of the system's behavior, or the optimal relationship between its input and
output, depends on the system's purpose, or goal. Learning is needed when a system does
not know a priori the optimal rule of behavior, i.e. the initial entropy is greater
than zero. Only by experimenting with behavior or undergoing training by a teacher
would the system then be able to learn the optimal rule of behavior. Learning takes
time. Throughout that time the system processes information and adapts its structure.
If the system is learning successfully then its entropy will decrease after a
sufficiently large interval of time. The higher the learning rate, the sharper the
decrease in the system's entropy. The system might start learning by holding
a wrong belief ; its entropy would then increase instead of decrease over
some interval during which the system would be unlearning. Finally we pointed out different
types of learning systems, which can generally be classified as learning without a
teacher or learning with a teacher. We further classified the latter case into learning
with an ideal or a real teacher. Learning with a teacher can further be classified as
competitive or cooperative.

COMMENTS.

1.1 Elaborate definitions on abstract systems and further details on general system
modelling can be found in Klir and Valach [1], and Klir [2].

1.2 The comments on the system-environment-interaction cybernetic model are influenced
by Brodey [3].

1.3 A good reference is Klir and Valach [1].

1.5 The example is quoted from Watanabe [4].

1.6 The models of unsupervised learning are important in behavioral science. Some
models were introduced in the literature on behavioral psychology [5] and lately
in engineering science [6]. A discussion on supervised learning, or training by
assessment and by showing, is given in Pugachev [7]. Some discussion on learning
from a teacher who makes mistakes is given in Tebbe [8].

REFERENCES.

1. G. J. Klir and M. Valach, Cybernetic Modelling. London : Iliffe Books Limited, 1956.
2. G. J. Klir, An Approach to General Systems Theory. New York : Van Nostrand Reinhold,
   1969.
3. W. Brodey, private discussions.
4. S. Watanabe, "Norbert Wiener and Cybernetical Concept of Time", IEEE Trans. on
   Syst., Man, and Cybern., May 1975, pp. 372-375.
5. R. R. Bush and F. Mosteller, Stochastic Models for Learning. Wiley, 1955.
6. K. S. Narendra and M. A. L. Thathachar, "Learning Automata - A Survey", IEEE Trans.
   Syst., Man, Cybern., vol. SMC-4, N°4, 1974, pp. 323-334.
7. V. S. Pugachev, "Statistical Theory of Automatic Learning Systems", Izv. Akad.
   Nauk SSSR Eng. Cybern., N°6, 1967, pp. 24-40.
8. D. L. Tebbe, "On Learning from a Teacher who makes Mistakes", International Joint
   Conf. on Pattern Recognition Proceedings, Washington DC, Nov. 1973.
CHAPTER II

DECISION - PATTERN RECOGNITION

César : It is little to have conquered, since one
must live in doubt.
Antoine : But can there be found one who does
not dread you ?
J. Grévin : La Mort de César.

2.1. PATTERN RECOGNITION PROBLEM.

The problem of pattern recognition is concerned with the analysis and the decision
rules governing the identification or classification of observed situations,
objects, or inputs in general.

For the purpose of recognition an input pattern x is characterized by a number
of features constituting the elements of a feature vector z. We shall, for purposes
of simplicity, decompose the pattern recognition problem into two problems :

i. - The problem of characterization (feature extraction). Let the input pattern vector x lie
in the pattern space Ω_x. The feature vector z lies in the feature space Ω_z and is
constructed by effecting certain measurements or transformations on the input pattern.

While a pattern vector x might be infinite dimensional, the feature vector z as
a rule is finite dimensional and usually of lower dimension than x. Hence the problem
of characterization consists in finding an appropriate transformation T that maps
the input pattern space into a feature space Ω_z such that z adequately characterizes
the original x for purposes of classification, i.e. it provides enough information
for discriminating the various patterns.

ii. - The problem of abstraction (decision and classification).
The abstraction problem is concerned with the decision rules for labeling or
classifying feature measurements into pattern classes. The classification rule is
such that the features in each class share more or less common properties.

Due to the distorted and noisy nature of feature measurements each pattern class
could be characterized by certain statistical properties. Such properties may be
fully known, partially known, or completely missing a priori. In the case of lacking
a priori information it is required that the classifier undergo training or be
built according to learning theorems.

2.2. FEATURE EXTRACTION.

The transformation T : Ω_x → Ω_z is usually characterized by a set of parameters
called pattern features. A suitable set of pattern features should somehow reflect
certain properties of the pattern classes.

The basic feature extraction problem can be classified into two general categories :

i - intraset feature extraction

ii - interset feature extraction.

Intraset feature extraction is concerned with those attributes which are common
to each pattern class.

Interset feature extraction is concerned with those attributes characterizing
the differences between or among pattern classes.

The intraset and interset features essentially pose conflicting extraction criteria.
For intraset features the interest is to keep the distance (as a measure of
dissimilarity) between the feature vectors belonging to the same class as close as
possible to the distance between the corresponding pattern vectors. Alternatively
stated, the disorganization entropy between the sample vectors of the same class in
the feature space is to be kept as close as possible to its value in the pattern
space. This amounts to maximizing the entropy.

On the other hand, for interset feature extraction the interest is to emphasize
the differences between the patterns. This can be attained if some clustering of the
same pattern samples is attained in the feature space. This amounts to contracting
the distance between the same pattern samples in the feature space, thus enhancing
the organization or minimizing the entropy.

2.3. KARHUNEN - LOEVE EXPANSION.

Assume there are K pattern classes (K ≥ 2) w_1, w_2, ..., w_K. The pattern vector x
is assumed to be N dimensional with probability density function

    f(x) = Σ_{k=1}^{K} π_k f_k(x)                                         (1)

where π_k is the probability that a pattern belongs to class w_k, and f_k(x) is the
conditional density of x for given w_k. We assume without loss of generality that
E(x) = 0, since a random vector with nonzero mean can be transformed into one with
zero mean by translation, which is a linear operation. Then the covariance matrix R
is the N x N matrix

    R = E{x x^T} = Σ_{k=1}^{K} π_k E_k{x x^T}                             (2)

where E_k denotes the expectation over the pattern vectors of class w_k. The Karhunen-
Loeve expansion is an expansion of the random vector x in terms of the eigenvectors
of R. Let λ_j and u_j be the j-th eigenvalue and eigenvector of R, i.e.

    R u_j = λ_j u_j                                                       (3)

Since R is always symmetric and positive semi-definite it is easy to see that

    λ_j ≥ 0                                                               (4)

    u_j^T u_ℓ = 0  if  λ_j ≠ λ_ℓ                                          (5)

If R is further a full-rank matrix then there exists a set of N orthonormal eigenvectors
u_1, u_2, ..., u_N with eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_N ≥ 0.
The expansion in terms of eigenvectors,

    x = Σ_{j=1}^{N} c_j u_j ,    c_j = x^T u_j                            (6)

is called the Karhunen-Loeve expansion. Note that c_j is a random variable due to
the randomness of x. Since we assume E(x) = 0, E(c_j) = 0, and by (2), (3), and the
orthonormality of u_j,

    E(c_j c_ℓ) = λ_j δ_{jℓ}                                               (7)

In other words, the random variables c_j and c_ℓ are uncorrelated if j ≠ ℓ, and E(c_j^2)
equals the eigenvalue λ_j. This property of zero correlation is an important and
unique property of the Karhunen-Loeve expansion.

2.4. INTRASET FEATURE EXTRACTION.

Intraset feature extraction reflects the pattern properties common to the
same class. It may be studied from various points of view. This extraction
problem may be analyzed as an estimation problem, or considered as
a problem of maximizing the population entropy (as noted before).

2.4.1. Estimation Problem.

Assume that the N-dimensional pattern vector x belongs to a multivariate population
whose probability density f(x) is Gaussian with zero mean (i.e. E(x) = 0)
and N x N covariance matrix R.

Consider linear feature extraction, where the M-dimensional feature vector z
(M < N) is given by the linear transformation

    z = T x                                                               (8)

where T is the matrix

    T^T = (v_1, v_2, ..., v_M)                                            (9)

and {v_j} is a set of orthonormal basis vectors in Ω_x. Notice that the feature space Ω_z is
a subspace of Ω_x whose basis is {v_1, ..., v_M}. If we expand x in terms of the v_j we
have

    x = Σ_{j=1}^{N} c_j v_j                                               (10)

with the coefficients

    c_j = x^T v_j                                                         (11)

Note that c_j is a random variable due to the randomness of x. The feature vector z
becomes

    z^T = x^T T^T = (x^T v_1, ..., x^T v_M) = (c_1, ..., c_M)             (12)

The last step is due to the orthonormality of the v_j. Thus the feature vector z consists
of the first M coefficients.

The estimation problem consists in determining the matrix T, see eqn. (9), such
that the error between the pattern vector x in Ω_x and its projection z in the feature
space Ω_z is minimum. Mathematically stated, it is required to determine the
basis vectors v_1, ..., v_M such that the error norm

    E||x - z||^2 = E{(x - z)^T (x - z)} = E{(Σ_{j=M+1}^{N} c_j v_j)^T (Σ_{k=M+1}^{N} c_k v_k)}
                 = E Σ_{j=M+1}^{N} c_j^2 = Σ_{j=M+1}^{N} E(c_j^2)         (13)

be minimum.

If one uses the Karhunen-Loeve expansion for the representation (10) then it follows
from eqn. (7) that the required vectors v_1, ..., v_M are given by the eigenvectors
u_1, ..., u_M, see eqn. (3), corresponding to the M largest eigenvalues of the
covariance matrix R, see eqn. (2).
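A small numerical sketch of this result, with synthetic data standing in for pattern
samples : the rows of T are taken as the eigenvectors of the sample covariance matrix
belonging to the M largest eigenvalues, and the mean residual E||x - z||^2 then equals
the sum of the discarded eigenvalues, as in eqn. (13).

    # Sketch (synthetic data): intraset feature extraction via the
    # Karhunen-Loeve expansion, eqns. (2)-(3), (8)-(9), (13).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 5)) @ np.diag([3.0, 2.0, 1.0, 0.3, 0.1])
    X -= X.mean(axis=0)                  # enforce E(x) = 0 as assumed in the text

    R = (X.T @ X) / len(X)               # sample covariance matrix, eqn. (2)
    lam, U = np.linalg.eigh(R)           # eigenvalues in ascending order
    order = np.argsort(lam)[::-1]        # reorder: lambda_1 >= lambda_2 >= ...
    M = 2
    T = U[:, order[:M]].T                # M x N feature extractor, eqn. (9)

    Z = X @ T.T                          # feature vectors z = T x, eqn. (8)
    X_hat = Z @ T                        # projection back into the pattern space
    residual = np.mean(np.sum((X - X_hat) ** 2, axis=1))
    print(residual, lam[order[M:]].sum())   # the two numbers agree, cf. eqn. (13)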

2.4.2. Entropy maximization.

Let Ω_x be the N-dimensional pattern space, and the feature space Ω_z be an M-dimensional
subspace of Ω_x. The relationship between z and x may be expressed by
z = T x where T, an M x N matrix, is the linear feature extractor. The pattern vector
x is distributed according to a continuous probability density function f(x).
Then the density function for z, f(z), is a marginal density of f(x), and depends
also on T. We define two entropies,

    H(x) = E_x{-ln f_x(x)} = -∫_{Ω_x} f_x(x) ln f_x(x) dx
                                                                          (14)
    H(z) = E_z{-ln f_z(z)} = -∫_{Ω_z} f_z(z) ln f_z(z) dz

We wish to find for feature extraction a matrix T that reduces the dimensionality to
M and at the same time preserves as much information content as possible. This
amounts to finding the M-space Ω_z that preserves the maximum entropy compared with
other M-spaces. (Note that the entropy is a measure of the intraset dispersion.)
Let f(x) be a Gaussian density with zero mean and covariance matrix R. The entropy
then becomes

    H(x) = -E{ln f_x(x)} = E{(N/2) ln 2π + ½ ln |R| + ½ x^T R^{-1} x}     (15)

where |R| is the determinant of R. Noting that

    E{x^T R^{-1} x} = E{tr R^{-1} x x^T} = tr I = N                       (16)

we obtain

    H(x) = ½ ln |R| + (N/2) ln 2π + N/2                                   (17)

Let z = T x and T be an M x N matrix with orthonormal row vectors. Since the
marginal density of a Gaussian distribution is Gaussian, we have

    H(z) = ½ ln |R_z| + (M/2) ln 2π + M/2                                 (18)

where

    R_z = T R T^T                                                         (19)

is the covariance matrix of z. Since the determinant of a matrix is equal to the
product of its eigenvalues, (18) may be written as

    H(z) = ½ Σ_{j=1}^{M} ln θ_j + (M/2) ln 2π + M/2                       (20)

with θ_j being the eigenvalues of the covariance matrix R_z. Hence we obtain the
following result.

Theorem

Let f(x) be a Gaussian density function with zero mean and covariance matrix R.
The optimum M x N linear feature extractor that maximizes H(z) is

    T^T = (u_1, u_2, ..., u_M)                                            (21)

where u_1, u_2, ..., u_M are the eigenvectors associated with the M largest eigenvalues
λ_1, λ_2, ..., λ_M in the Karhunen-Loeve expansion. The maximum entropy is

    H(z) = ½ Σ_{j=1}^{M} ln λ_j + (M/2) ln 2π + M/2                       (22)

2.5. INTERSET FEATURE EXTRACTION.

So far we have discussed feature extraction without considering discrimination
between different classes. Since pattern recognition is concerned with classification
of patterns, an obvious criterion for feature extraction is the error probability.
We would like to find an M-dimensional subspace of Ω_x such that the probability
of classification errors is minimum compared with other M-subspaces. Unfortunately
the error probability is generally very difficult to calculate and it is practically
impossible to use as a criterion for feature extraction.

Interset feature extraction is concerned with generating a set of features which
tend to emphasize the dissimilarities between pattern classes. Kullback [21] has
suggested that divergent information, or divergence, can provide an appropriate measure
of the dissimilarities between two populations.

2.5.1. The Divergence.

Consider two classes of patterns w_1 and w_2 with probability density functions
f_1(x) and f_2(x). From statistical decision theory, see Sec. 2.7, the classification
of a pattern x is based on the log-likelihood ratio,

    ln Λ(x) = ln [f_2(x) / f_1(x)]                                        (23)

If ln Λ(x) is greater than a certain threshold value, x is classified as belonging
to w_2 ; otherwise to w_1.
Therefore, we define

    J_1(x) = E_1{ln [f_1(x)/f_2(x)]} = ∫_{Ω_x} f_1(x) ln [f_1(x)/f_2(x)] dx
                                                                          (24)
    J_2(x) = E_2{ln [f_2(x)/f_1(x)]} = ∫_{Ω_x} f_2(x) ln [f_2(x)/f_1(x)] dx

where E_1{.} and E_2{.} indicate the expectation over the densities f_1(x) and f_2(x)
respectively. J_1(x) may be interpreted as the average information for discrimination
in favor of w_1 against w_2, and J_2(x) may be interpreted in a similar manner. The
divergence is defined as

    J(x) = J_1(x) + J_2(x)                                                (25)

and is therefore a measure of information for discrimination between the two classes.

The measure (25), stated for the two-class case, can be converted to the K-class
case by optimizing the sum of all pairwise measures of quality or by maximizing the
minimum pairwise measure of quality.

When x is replaced by z and the densities f_x are replaced by the transformed
densities f_z, the criterion (25) can also measure the overlap of the class-conditional
probability densities of the transformed samples.

Fig. 1 illustrates two possible one-dimensional distributions of samples resulting
from two transformations applied to the same distribution in Ω_x. The transformation
which results in the least overlap of probability densities will yield the space with
the least expected classification error with respect to the Bayes optimal decision
rule. The measure (25) is optimized when there is no overlap and takes its worst value
when the class densities are identical (maximum overlap).

In defining (24) we have used the convention that f_1(x)/f_2(x) = 0 if f_1(x) = f_2(x)
= 0, and 0 · ∞ = 0. It is interesting to note that when the two classes are separable,
i.e. f_2(x) = 0 if f_1(x) > 0 and vice versa, the patterns may be classified without
error and J(x) = ∞. On the other hand, when f_1(x) = f_2(x) for almost all x, the two
classes are indistinguishable and J(x) = 0.
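As a check on these limiting properties, the sketch below estimates J by Monte Carlo
for two illustrative univariate Gaussian classes ; the densities are invented for the
example. For equal variances the result should approach m^2/σ^2 (cf. the equal
covariance case in Sec. 2.5.2), and for identical densities it should approach zero.

    # Sketch: Monte Carlo estimate of the divergence J = J1 + J2, eqns. (24)-(25).
    import math, random

    def gauss_pdf(x, m, var):
        return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def divergence(m1, v1, m2, v2, n=200000, seed=4):
        random.seed(seed)
        # J1 = E1{ln f1/f2}: average the log-ratio over samples drawn from f1.
        J1 = sum(math.log(gauss_pdf(x, m1, v1) / gauss_pdf(x, m2, v2))
                 for x in (random.gauss(m1, math.sqrt(v1)) for _ in range(n))) / n
        # J2 = E2{ln f2/f1}: likewise over samples drawn from f2.
        J2 = sum(math.log(gauss_pdf(x, m2, v2) / gauss_pdf(x, m1, v1))
                 for x in (random.gauss(m2, math.sqrt(v2)) for _ in range(n))) / n
        return J1 + J2

    print(divergence(0.0, 1.0, 2.0, 1.0))   # about 4.0 = m^2 / sigma^2
    print(divergence(0.0, 1.0, 0.0, 1.0))   # about 0.0: indistinguishable classes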

2.5.2. Feature extraction.


Let us now discuss the application of the divergence criterion to the following
simple examples.

[Figure : one-dimensional class densities in the feature space under two transformations, (a) z = T_1(x) and (b) z = T_2(x).]

Fig. 1. Measuring overlap of the class probability densities :
(a) small overlap, easy separability ; (b) large overlap, poor separability.

Example (a).

Assume that f_1(x) and f_2(x) are both Gaussian, and

    E_1(x) = 0 ,   E_2(x) = m
                                                                          (26)
    E_1(x x^T) = R_1 ,   E_2(x x^T) = R_2

For a linear feature extractor T, the marginal densities f_1(z) and f_2(z) are
Gaussian with

    E_1(z) = 0 ,   E_2(z) = m_z
                                                                          (27)
    E_1(z z^T) = R_z1 ,   E_2(z z^T) = R_z2

where

    m_z = T m ,   R_z1 = T R_1 T^T ,   R_z2 = T R_2 T^T                   (28)

We obtain by straightforward calculation the divergence measure

    J(z) = ½ tr [(R_z1 - R_z2)(R_z2^{-1} - R_z1^{-1})] + ½ tr [(R_z1^{-1} + R_z2^{-1}) m_z m_z^T]    (29)

J(x) is similar to J(z) in (29) with m_z, R_z1 and R_z2 substituted by m, R_1 and R_2.
Let us consider two special cases.

(1) Equal covariance case. In this case

    R_1 = R_2 = R ,   R_z1 = R_z2 = R_z

and obviously

    J(x) = m^T R^{-1} m

If we select a 1 x N linear feature extractor,

    T = m^T R^{-1}                                                        (30)

and substitute (28) and (30) into (29), we obtain

    J(z) = m^T R^{-1} m (m^T R^{-1} R R^{-1} m)^{-1} m^T R^{-1} m = J(x)  (31)

The result suggests that the other directions do not contribute to the discrimination
of the two classes. In other words the optimum classification is based on the
statistic m^T R^{-1} x.

(2) Equal means. In this case the mean of the second class is also zero, m = 0. If
both R_1 and R_2 are positive definite then there exists a real and non-singular N x N
matrix U,

    U^T = (u_1, ..., u_N)   such that

    U R_1 U^T = Λ ,   U R_2 U^T = I                                       (32)

where Λ is a diagonal matrix with real and positive elements λ_1, λ_2, ..., λ_N and I
is the identity matrix. In fact, the row vectors of U are the solutions of the
equation,

    R_1 u_j = λ_j R_2 u_j                                                 (33)

It is noted that (32) implies a weighted orthonormality condition

    u_j^T R_2 u_j = 1 ,   u_j^T R_2 u_ℓ = 0 ,   j ≠ ℓ                     (34)

Since U is non-singular and J(z) is invariant under non-singular transformations,
we may use (32) to calculate J(z) with R_z1 and R_z2 substituted by Λ and I.
Thus

    J(z) = ½ tr [(Λ - I)(I - Λ^{-1})] = ½ Σ_{j=1}^{N} (λ_j + 1/λ_j - 2)   (35)

Eqn. (35) indicates that we should choose a feature extractor

    T^T = (u_1, u_2, ..., u_M)                                            (36)

where u_j is associated with λ_j, the eigenvalues being ordered according to

    λ_1 + 1/λ_1 ≥ λ_2 + 1/λ_2 ≥ ... ≥ λ_N + 1/λ_N                         (37)

The resulting value of J(z) is

    J(z) = ½ Σ_{j=1}^{M} (λ_j + 1/λ_j - 2)                                (38)

Note that the row vectors of T are orthonormal in the sense of (34) instead of
u_j^T u_ℓ = 0, which is a property of the optimal T considered in the previous sections.
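A numerical sketch of the equal-means case, with synthetic covariance matrices :
scipy.linalg.eigh solves the generalized eigenproblem (33) directly and returns
eigenvectors normalized as in (34), so ranking the directions by λ_j + 1/λ_j,
eqn. (37), yields the extractor (36) and the retained divergence (38).

    # Sketch (synthetic covariances): interset feature extraction, eqns. (33)-(38).
    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(1)
    A1 = rng.standard_normal((4, 4)); R1 = A1 @ A1.T + 4.0 * np.eye(4)
    A2 = rng.standard_normal((4, 4)); R2 = A2 @ A2.T + 1.0 * np.eye(4)

    lam, U = eigh(R1, R2)                # columns solve R1 u_j = lam_j R2 u_j, eqn. (33)
    score = lam + 1.0 / lam              # per-direction discriminability, eqn. (37)
    order = np.argsort(score)[::-1]
    M = 2
    T = U[:, order[:M]].T                # M x N feature extractor, eqn. (36)

    J_retained = 0.5 * np.sum(score[order[:M]] - 2.0)   # eqn. (38)
    print(T.shape, J_retained)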

2.6. OPTIMAL CLASSIFICATION.

Algorithms of optimal classification can be obtained if there is enough a priori
statistical data about each pattern class. The analytical tool to obtain such
classification rules is statistical decision theory.

Let the pattern classes be w_1, w_2, ..., w_K. For each pattern class w_j, j = 1, ..., K,
assume that the conditional multivariate (M-dimensional) probability density of the
feature vector z (M-dimensional), f_j(z), as well as the probabilities π_j of
occurrence of w_j (j = 1, ..., K), are known.

The problem of classification consists in partitioning the feature space Ω_z into
K subspaces Γ_1, Γ_2, ..., Γ_K, see Fig. 2, such that if z ∈ Γ_i we classify the pattern
to the class w_i.

[Figure : the feature space Ω_z divided into the regions Γ_1, ..., Γ_K.]

Fig. 2. Partitioning of the feature space.

In order to describe the concept of "best" partition, we introduce the loss
function

    F(w_i, y) ,   i = 1, ..., K                                           (39)

where y is the classification decision, such that if z ∈ Γ_j then y = y_j, j = 1, ..., K.

Hence F(w_i, y_j) denotes the loss to be incurred when a feature from the i-th pattern
class is classified into the j-th class.

The conditional loss for z ∈ w_i is

    r(w_i, y) = ∫_{Ω_z} F(w_i, y) f_i(z) dz                               (40)

For a given set of a priori probabilities π = (π_1, ..., π_K)^T, the average loss
is

    R(π, y) = Σ_{i=1}^{K} π_i r(w_i, y)                                   (41)

Substituting (40) into (41) we get

    R(π, y) = ∫_{Ω_z} Σ_{i=1}^{K} F(w_i, y) f_i(z) π_i dz                 (42)

The problem is to find y ∈ {y_1, ..., y_K} as a function of z, such that the
average loss is minimized.
In the case of binary classification, i.e. K = 2, the average loss function given
by eqn. (42) can be rewritten as follows,

    R(π, y) = ∫_{Ω_z} [F(w_1, y) f_1(z) π_1 + F(w_2, y) f_2(z) π_2] dz    (43)

By definition of the decision variable y it follows that

    ∫_{Ω_z} F(w_i, y) f_i(z) π_i dz = Σ_{j=1}^{2} ∫_{Γ_j} F(w_i, y_j) f_i(z) π_i dz ,   (i, j = 1, 2)    (44)

Let us define the conditional error probability of the first kind,

    α = ∫_{Γ_2} f_1(z) dz                                                 (45)

corresponding to classifying an observation from w_1 into w_2. Similarly, define the
conditional error probability of the second kind

    β = ∫_{Γ_1} f_2(z) dz                                                 (46)

If we consider the loss function

    F(w_1, y_1) = v_11     F(w_2, y_1) = v_21
                                                                          (47)
    F(w_1, y_2) = v_12     F(w_2, y_2) = v_22

with

    v_11 < v_12 ,   v_22 < v_21

then the average loss (43) can be expressed as

    R(π, y) = (v_11 (1 - α) + v_12 α) π_1 + (v_21 β + v_22 (1 - β)) π_2   (48)

2.7. STATISTICAL DECISION ALGORITHMS.

2.7.1. Bayes Approach.

In this case the average loss is assumed to have the general form (42). Minimization
of that function with respect to y is equivalent to minimizing

    F(w_1, y) f_1(z) π_1 + F(w_2, y) f_2(z) π_2                           (49)

for any observation or feature vector z. Since the decision y takes only two values
y_1 and y_2, say +1 and -1 respectively, minimization of (49) can be obtained
by simply comparing the corresponding values,

    v_11 f_1(z) π_1 + v_21 f_2(z) π_2     for y = y_1
                                                                          (50)
    v_12 f_1(z) π_1 + v_22 f_2(z) π_2     for y = y_2

Hence to minimize (49) we conclude the decision rule,

    y = +1  if  v_11 f_1(z) π_1 + v_21 f_2(z) π_2 < v_12 f_1(z) π_1 + v_22 f_2(z) π_2
                                                                          (51)
    y = -1  if  v_11 f_1(z) π_1 + v_21 f_2(z) π_2 > v_12 f_1(z) π_1 + v_22 f_2(z) π_2

The decision rule (51), called the Bayes rule, can be rewritten in the form,

    y = +1  if  λ(z) > h
                                                                          (52)
    y = -1  if  λ(z) < h

where λ(z) denotes the likelihood ratio

    λ(z) = f_1(z) / f_2(z)                                                (53)

and h is the threshold

    h = [(v_21 - v_22) / (v_12 - v_11)] (π_2 / π_1)                       (54)

The decision rule (52) can be implemented as shown in Fig. 3.

[Figure : the observation z passes through a likelihood-ratio computer and a threshold element, with outputs +1 and -1.]

Fig. 3. Binary statistical classification.

The particular choice of the v's

    v_11 = v_22 = 0 ,   v_12 = v_21 = 1                                   (55)

leads to the classifier minimizing the criterion

    R = α π_1 + β π_2                                                     (56)

which amounts to the total probability of error. The decision rule can then be
expressed by (52) with the threshold value

    h = π_2 / π_1                                                         (57)

This rule is called the Siegert-Kotelnikov rule.

Another particular choice of the v's is

    v_11 = v_22 = 0 ,   v_12 = 1 ,   v_21 = μ                             (58)

This corresponds to the mixed decision rule where



    R = α π_1 + μ β π_2 ,   μ > 0                                         (59)

and

    h = μ π_2 / π_1                                                       (60)

The constant μ in (59) is indeed reminiscent of a Lagrange multiplier.

Considering μ π_2/π_1 as a Lagrange multiplier interprets criterion (59) as minimizing
the conditional error probability of the first kind α subject to a constant
conditional error probability of the second kind β, i.e.

    β = ∫_{Γ_1} f_2(z) dz = A = const.                                    (61)

This actually corresponds to the Neyman-Pearson rule, for which the threshold h is
given by (60) where μ is determined by solving the implicit relation,

    ∫_{λ(z) > h} f_2(z) dz = A ,   h = μ π_2 / π_1                        (62)
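A compact sketch of the rules of this section for two hypothetical univariate Gaussian
classes : the likelihood ratio (53) is compared against the threshold (54), and the
choice v_12 = v_21 = 1, v_11 = v_22 = 0 reduces it to the Siegert-Kotelnikov rule (57).

    # Sketch (illustrative class densities): the Bayes decision rule, eqns. (52)-(54).
    import math

    def gauss_pdf(z, m, var):
        return math.exp(-(z - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def bayes_decide(z, pi1, pi2, v11=0.0, v12=1.0, v21=1.0, v22=0.0):
        lam = gauss_pdf(z, 0.0, 1.0) / gauss_pdf(z, 2.0, 1.0)   # lambda(z), eqn. (53)
        h = (v21 - v22) / (v12 - v11) * (pi2 / pi1)             # threshold, eqn. (54)
        return +1 if lam > h else -1                            # eqn. (52)

    # With the default v's this minimizes the total error alpha*pi1 + beta*pi2, eqn. (56).
    print(bayes_decide(0.3, pi1=0.5, pi2=0.5))   # +1 : nearer class w1 (mean 0)
    print(bayes_decide(1.8, pi1=0.5, pi2=0.5))   # -1 : nearer class w2 (mean 2)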

2.7.2. Min - Max Approach.

The Bayes approach necessitates knowledge of the a priori probabilities π_1 and
π_2. If that knowledge is lacking, or such probabilities change under different
environmental conditions, then one possible approach is to optimize the worst case. The
min-max approach can be interpreted as a "game with nature" where "nature" chooses
the a priori probabilities that maximize the average risk. Let us consider

    v_11 = v_22 = 0                                                       (63)

The optimal Bayes rule is determined by minimizing the average loss,

    R(π, y) = v_12 π̂_1 α + v_21 (1 - π̂_1) β                              (64)

where π̂_1 is an estimate of the a priori probability π_1. Minimizing (64), in its
equivalent form (49), with respect to y yields the same rule as (52) with the
threshold

    h = v_21 (1 - π̂_1) / (v_12 π̂_1)                                      (65)

The conditional error probabilities of the first and second kind, α and β respectively
(eqs. (45), (46)), will be functions of the estimated a priori probability
π̂_1. That is, α = α(π̂_1), β = β(π̂_1). If the actual value of π_1 is π_1° then the
adoption of the decision rule (52) with the threshold (65) will yield the following
deviation between the actual average loss and its estimated optimal value,

    ΔR(π̂_1, π_1°) = (v_12 α(π̂_1) - v_21 β(π̂_1)) (π_1° - π̂_1)

The min-max approach consists of choosing π̂_1 so as to minimize the maximum deviation.
Hence π̂_1 is determined by the condition,

    v_12 α(π̂_1) - v_21 β(π̂_1) = 0                                        (66)

Equations (65) and (66) completely specify the min-max rule. Computational difficulties,
however, can arise when solving equation (66).

2.7.3. Maximum A Posteriori Probability Rule.

In the case of complete a priori information an intuitively appealing decision
rule is the maximum a posteriori rule.

According to Bayes' formula, the a posteriori probabilities that an observed
situation z belongs to class w_1 or w_2 are equal respectively to

    Pr(w_1 / z) = f_1(z) π_1 / f(z)                                       (67)

and

    Pr(w_2 / z) = f_2(z) π_2 / f(z)                                       (68)

where

    f(z) = f_1(z) π_1 + f_2(z) π_2                                        (69)

is the mixture probability density.

The observation z is classified into Γ_1 or Γ_2 depending on whether the a posteriori
probability with respect to w_1 is greater than that with respect to w_2, or
vice versa. From eqs. (67) and (68) there follows immediately the
decision rule (52) with the threshold

    h = π_2 / π_1                                                         (70)

In the case of equal a priori probabilities, i.e.

    π_1 = π_2 = 1/2                                                       (71)

the decision rule is called the maximum likelihood rule.

2.8. SEQUENTIAL METHODS.

The algorithms presented so far are based on a fixed size M of feature measurements,
or dimension of the feature vector.

If the cost of taking feature measurements is to be considered, or if the features
extracted from input patterns are sequential in nature, then sequential classification
methods are to be used.

In sequential methods feature measurements are processed sequentially in successive
steps. At each step a decision is made either to extract further features or to
terminate the sequential process (i.e. make the classification decision). The
continuation or termination of the sequential process depends on a trade-off between the
error (misrecognition) and the number of features to be measured. The process is
terminated when a sufficient or desirable accuracy of classification has been achieved.

2.8.1. Wald's Sequential Probability Ratio Test (SPRT).

Suppose that a random variable z has the conditional probability density functions
f_i(z) for the pattern classes w_i, i = 1, 2.

The problem is to test the hypothesis H_1 : z ∈ w_1 against the hypothesis
H_2 : z ∈ w_2. The test constructed decides in favor of either w_1 or w_2 on the basis
of observations z_1, z_2, ... The components z_1, z_2, ... of the vector z are assumed
to be independent and identically distributed random variables. Suppose that if w_1
is true we wish to decide for w_1 with probability at least (1 - α), while if w_2 is
true we wish to decide for w_2 with probability at least (1 - β).

Let us introduce the variables

    ζ_i = ln [f_1(z_i) / f_2(z_i)] ,   i = 1, 2, ...                      (72)

so that the likelihood ratio λ_n, corresponding to n observations, can be written
thus

    λ_n = Π_{i=1}^{n} f_1(z_i) / f_2(z_i) = f_1(z) / f_2(z)               (73)

Wald's SPRT procedure is as follows : continue taking observations as long as

    B < λ_n < A                                                           (74)

Stop taking observations and decide to accept the hypothesis H_1 as soon as

    λ_n ≥ A                                                               (75)

and stop taking observations and decide to accept the hypothesis H_2 as soon as

    λ_n ≤ B                                                               (76)

The constants A and B are called the upper and lower stopping boundaries respectively.
They can be chosen to obtain approximately the prescribed probabilities of error α
and β.

Suppose that at the n-th stage of the measuring process it is found that

    λ_n = A                                                               (77)

leading to the terminal decision of accepting H_1. From (73) and (77),

    f_1(z) = A f_2(z)                                                     (78)

which is equivalent to

    ∫_{Γ_1} f_1(z) dz = A ∫_{Γ_1} f_2(z) dz                               (79)

By the definitions of α and β, (79) reduces to

    (1 - α) = A β                                                         (80)

Similarly, when

    λ_n = B                                                               (81)

then

    α = B (1 - β)                                                         (82)

Solving (80) and (82), we obtain

    A = (1 - α) / β                                                       (83)

    B = α / (1 - β)                                                       (84)

It is noted that the choice of stopping boundaries A and B results in error probabilities
α and β if continuous observations are made and the exact equalities (77) and
(81) can be respected.

From (77) and (81), again by neglecting the excess over the boundaries, we have

    L_n = ln λ_n = ln A   with probability β         when H_2 is true
    L_n = ln B            with probability (1 - β)   when H_2 is true
    L_n = ln A            with probability (1 - α)   when H_1 is true
    L_n = ln B            with probability α         when H_1 is true

Let E_i(L_n) be the conditional expectation of L_n when H_i is true ; then it follows
directly that

    E_1(L_n) = (1 - α) ln A + α ln B                                      (85)

    E_2(L_n) = β ln A + (1 - β) ln B                                      (86)

Define

    η_i = 1   if no decision is made up to the (i - 1)-th stage
        = 0   if a decision is made at an earlier stage

Then η_i is clearly a function of z_1, z_2, ..., z_{i-1} only and is independent of z_i,
and hence independent of ζ_i = ζ_i(z_i).

    L_n = Σ_{i=1}^{∞} ζ_i η_i = ζ_1 η_1 + ζ_2 η_2 + ... + ζ_n η_n         (87)

Taking expectations,

    E(L_n) = E(Σ_{i=1}^{∞} ζ_i η_i)
           = E(ζ) Σ_{i=1}^{∞} E(η_i)
           = E(ζ) Σ_{i=1}^{∞} Pr(n ≥ i)                                   (88)
           = E(ζ) E(n)

Therefore, from (85), the average number of observations when H_1 is true can be
expressed as

    E_1^(w)(n) = [(1 - α) ln A + α ln B] / E_1(ζ)                         (89)

Similarly, from (86),

    E_2^(w)(n) = [β ln A + (1 - β) ln B] / E_2(ζ)                         (90)

The superscript (w) is used to designate Wald's test.
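The sketch below runs Wald's SPRT for two illustrative Gaussian hypotheses, N(0,1)
under H_1 against N(1,1) under H_2, for which the per-observation log-ratio ζ_i of
eqn. (72) reduces to 0.5 - z_i ; the densities and error levels are example choices,
not the monograph's.

    # Sketch: Wald's sequential probability ratio test, eqns. (72)-(76), (83)-(84).
    import math, random

    def sprt(sample, alpha=0.05, beta=0.05):
        ln_A = math.log((1 - alpha) / beta)     # upper boundary ln A, eqn. (83)
        ln_B = math.log(alpha / (1 - beta))     # lower boundary ln B, eqn. (84)
        L, n = 0.0, 0                           # L_n = ln lambda_n
        while ln_B < L < ln_A:                  # continue sampling, eqn. (74)
            z = sample(); n += 1
            L += 0.5 - z                        # zeta_i = ln f1(z_i)/f2(z_i) for N(0,1) vs N(1,1)
        return ("H1" if L >= ln_A else "H2"), n # terminal decisions, eqns. (75)-(76)

    random.seed(2)
    print(sprt(lambda: random.gauss(0.0, 1.0)))   # data from H1 -> typically ("H1", n)
    print(sprt(lambda: random.gauss(1.0, 1.0)))   # data from H2 -> typically ("H2", n)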

2.8.2. Finite Automata.

As mentioned in Section 2.8.1, the equalities (83) and (84) hold only if continuous
observations are made so that the exact equalities (77) and (81) can be
obtained.

A discrete form of Wald's test naturally needs a longer termination time but is
simpler to realize. A device for its realization may be considered as a finite
automaton with linear tactic, which may be described as follows.

The automaton has (s + 1) states numbered from 0 to s (Fig. 4) and is characterized by the following system of transitions: if the automaton is in state j and as a result of an experiment λ > a is obtained, where

λ = ln [ f1(z) / f2(z) ]    (91)

and "a" is some threshold, then the automaton passes to the state (j + 1); if λ < b, then the automaton passes to the state (j - 1); if b ≤ λ ≤ a, then the automaton remains in state j. Motion begins from the state with index i. The states 0 and s are terminal: attainment by the automaton of state 0 leads to output of a decision in favor of hypothesis H2 : z ∈ w2; attainment of the state with index s, to a decision in favor of hypothesis H1 : z ∈ w1.

We shall consider the symmetric case, where the thresholds a and b are chosen so that P(λ > a | H1) = P(λ < b | H2) = p, P(λ > a | H2) = P(λ < b | H1) = q, and r = 1 - p - q = P(b ≤ λ ≤ a).

Let us define

Lj ≜ probability of the automaton attaining the state s if it begins its motion from the state j    (92)

(Figure: transition diagram of the (s+1)-state automaton; from each interior state the automaton moves up with probability p, down with probability q, and stays put with probability r; state 0 corresponds to "H2 is true", state s to "H1 is true".)

Fig. 4. Automaton with linear tactic.

If the hypothesis H1 is true, then for Lj we obtain the following finite-difference equation:

Lj = p L_{j+1} + r Lj + q L_{j-1}    (93)

with the boundary conditions Ls = 1, L0 = 0.

The solution of eqn. (93) is given by

Lj = (1 - χ^{-j}) / (1 - χ^{-s})    (94)

where

χ = p/q > 1    (95)

Since the conditional error probability of first kind α is merely 1 - Li, we have

α = (χ^{s-i} - 1) / (χ^s - 1)    (96)

If the hypothesis H2 is true, then we obtain for Lj, with the same boundary conditions, the equation

Lj = q L_{j+1} + r Lj + p L_{j-1}    (97)

whose solution has the form

Lj = (χ^j - 1) / (χ^s - 1)    (98)

Since the conditional error probability of second kind β is merely Li, we have

β = (χ^i - 1) / (χ^s - 1)    (99)

Hence, if the error probabilities α and β are fixed, the parameters s and i of the automaton will be given by

i = ln [ (1 - β)/α ] / ln χ ,   s - i = ln [ (1 - α)/β ] / ln χ    (100)

Let Tj denote the mean number of trials between the start of the experiment and its end, if the automaton begins its motion from the state j.

If the hypothesis H1 is true, then for Tj we obtain the finite difference equation

Tj = p T_{j+1} + r Tj + q T_{j-1} + 1    (101)

with boundary conditions T0 = Ts = 0. The solution of this equation has the form

Tj = (1 / (p - q)) [ s (1 - χ^{-j}) / (1 - χ^{-s}) - j ]    (102)

Since, by hypothesis, the automaton begins its motion from the state i, then, taking into account eqs. (96), (100), we obtain

T_i^(1) = [ (1 - α) ln ((1 - α)/β) + α ln (α/(1 - β)) ] / (p - q) ln χ    (103)

If the hypothesis H2 is true, then for Tj the equation takes the form

Tj = q T_{j+1} + r Tj + p T_{j-1} + 1    (104)

with the same boundary conditions. Hence

T_i^(2) = [ β ln (β/(1 - α)) + (1 - β) ln ((1 - β)/α) ] / (p - q) ln χ    (105)
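For comparison, a minimal simulation of the automaton with linear tactic might look as follows. The code is ours, not from the text; the design() helper computes s and i from (100), and the rounding to integer state indices is our assumption.

```python
import numpy as np

def design(alpha, beta, p, q):
    """Automaton parameters from eq. (100), symmetric case with chi = p/q > 1."""
    chi = p / q
    i = np.log((1 - beta) / alpha) / np.log(chi)
    s = i + np.log((1 - alpha) / beta) / np.log(chi)
    return int(round(s)), int(round(i))

def linear_tactic_test(lambda_stream, a, b, s, i):
    """Discrete SPRT realized as the (s+1)-state automaton of sec. 2.8.2."""
    j = i                                   # motion begins from state i
    for n, lam in enumerate(lambda_stream, start=1):
        if lam > a:
            j += 1                          # toward the H1 boundary (state s)
        elif lam < b:
            j -= 1                          # toward the H2 boundary (state 0)
        if j == s:
            return "H1", n                  # terminal state s: accept H1
        if j == 0:
            return "H2", n                  # terminal state 0: accept H2
    return "no decision", n
```

With p and q estimated for the chosen thresholds a and b, the mean termination times observed in such a simulation can be checked against (103) and (105).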

2.9. SUPERVISED BAYES LEARNING.

The probabilities π1, ..., πK that an observation belongs to class wi, i = 1, ..., K, are assumed known. The conditional probability densities fi(z | θi) ≜ f(z | θi, wi) of an observation z, assuming it to come from wi, are assumed to have known functional forms, depending on parameter vectors θi, some or all of which are unknown. The problem is as follows. A sequence of generally vector-valued observations z1, ..., zn, ... is received, one at a time, and each is classified by a teacher as coming from one of a known number K of exclusive classes w1, ..., wK.

The problem is to learn the unknown parameters θi, i = 1, ..., K, so that after adequate training one can apply the statistical decision algorithms of sec. 2.7 for classification of new unclassified observations.

The Bayesian algorithm for learning about θi involves the specification of an a priori density pi0(θi) for θi, and the subsequent recursive computation of the a posteriori density.

p(θ | z1, ..., zn) = f(zn | θ) p(θ | z1, ..., z_{n-1}) / ∫ f(zn | θ) p(θ | z1, ..., z_{n-1}) dθ    (113)

Substituting f(zn | θ) from (111) we get

p(θ | z1, ..., zn) = [ Σ_{i=1}^{K} πi fi(zn | θi) ] p(θ | z1, ..., z_{n-1}) / ∫ [ Σ_{i=1}^{K} πi fi(zn | θi) ] p(θ | z1, ..., z_{n-1}) dθ    (114)

The relationship between θ and θi is defined by (110) or (112). It is obvious that, due to the mixture form inherent in (114), there exist no reproducing densities for unsupervised Bayes learning. This indicates clearly the complexity of unsupervised learning compared with supervised learning.
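For a parameter restricted to a finite grid, one step of the recursion (113) reduces to an elementwise multiplication followed by renormalization. The sketch below is ours; in the supervised case f is the class-conditional density of the labelled class, while in the unsupervised case it is the mixture appearing in (114).

```python
import numpy as np

def bayes_step(prior, f, z, theta_grid):
    """One recursive Bayes update, eq. (113), on a discretized parameter grid."""
    likelihood = np.array([f(z, theta) for theta in theta_grid])
    posterior = likelihood * prior
    return posterior / posterior.sum()   # normalization = denominator of (113)
```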

2.11. IDENTIFIABILITY OF FINITE MIXTURES.

In the mixture assumption for unsupervised learning, the densities fi(z | θi) usually belong to the same family of functions F. For example, consider the family of one-dimensional Gaussian densities with mean value m and non-zero variance r; F = {g(z; m, r), r > 0}. A Gaussian mixture may be written as

f(z | θ) = Σ_{i=1}^{K} fi(z | θi) πi ,   fi(z | θi) ∈ F    (115)

where

fi(z | θi) = g(z; mi, ri) ,   θi = (mi, ri) ,   θ^T = (θ1, ..., θK, π1, ..., πK)

A major theoretical question is whether (115) is a unique representation of f(z | θ). In other words, we ask whether there exist θi', πi', and K' such that K and K' are finite and

f(z | θ) = Σ_{i=1}^{K} fi(z | θi) πi = Σ_{i=1}^{K'} fi(z | θi') πi'    (116)

A trivial cause for the lack of uniqueness is that by permutation the individual terms in (115) may be labelled in K! ways. This difficulty may be resolved by establishing an ordering ≼ in F and arranging the terms in (115) in such a way that f1(z | θ1) ≼ f2(z | θ2) ≼ ... For the family of Gaussian densities, we may define an ordering g(z; mj, rj) ≼ g(z; mk, rk) if rj > rk, or if rj = rk and mj < mk. Note that, defined in this way, any subset of the Gaussian family has a unique ordering. Consider an arbitrary family F. We assume that an ordering has been defined and the densities fi(z | θi) in a mixture are arranged in this order.

Under this assumption, the class of all finite mixtures of F is said to be identifiable if (116) implies θi = θi', πi = πi', and K = K'.

The concept of identifiability was introduced by Teicher^{7,8}. Its importance to non-supervised learning is fairly obvious, since the problem is defined in terms of finite mixtures, and identifiability simply means that a unique solution to the problem is possible.

2.12. PROBABILISTIC ITERATIVE METHODS - SUPERVISED LEARNING.

To formulate the problem of training an automatic recognition system mathematically, it is necessary to specify a class of possible decision functions for the system and a certain goal of learning. The goal of learning may be to attain, after training, the best approximation (in a certain sense) from the class of possible decision functions to a specific optimal rule of classification. In the sequel we consider the problem of classification between two classes, i.e. K = 2.

As a class of possible decision functions let us consider the functions

ŷ = Σ_{i=1}^{N} ci φi(z)    (117)

where ci are unknown parameters and φi, i = 1, ..., N, are a set of orthonormal functions, i.e.

∫_{Rz} φi(z) φj(z) dz = δij ,  i, j = 1, ..., N    (118)

Here δij denotes the Kronecker delta.

Using the vector notations

c^T = (c1, ..., cN) ,   φ^T(z) = (φ1(z), ..., φN(z))    (119)

eqns. (117), (118) can be rewritten as

y = c^T φ(z)    (120)

and

∫_{Rz} φ(z) φ^T(z) dz = I    (121)

In the two-class (w1 and w2) pattern recognition problem, the output y takes on either the value +1 or -1, such that y = +1 ↔ z ∈ Γ1, y = -1 ↔ z ∈ Γ2. This means that y = +1 corresponds to classifying z in w1 and y = -1 to classifying z in w2.
Let us now consider the learning of different decision rules.

2.12.1. Learning Bayes Rule.

It follows from eqn. (51) that the optimal discriminant function is given by

g(z) = (ν12 - ν11) π1 f1(z) + (ν22 - ν21) π2 f2(z)    (122)

such that

Γ1 = {z : g(z) > 0} ,  Γ2 = {z : g(z) < 0}    (123)

The goal of learning can thus be stated as minimizing some convex function of the error between the optimal discriminant function (122) and its approximation (120). Let us consider the quadratic error function

J(c) = ∫_{Rz} ( g(z) - c^T φ(z) )^2 dz    (124)

The condition of the minimum of (124) has the form

∇ J(c) = -2 ∫_{Rz} ( g(z) - c^T φ(z) ) φ(z) dz = 0    (125)

Taking the orthonormality condition (121) into consideration, eqn. (125) can be rewritten thus:

c - ∫_{Rz} g(z) φ(z) dz = 0    (126)

Substituting for g(z) from (122), we get the regression equation

Ez { ξ(z) } = 0    (127)

where

ξ(z) = c - (ν12 - ν11) φ(z) ,  z ∈ Γ1
ξ(z) = c - (ν22 - ν21) φ(z) ,  z ∈ Γ2    (128)

Applying probabilistic iterative methods, the following algorithm is obtained for the current estimate of the solution of the regression eqn. (127):

c(n) = c(n-1) - n^{-1} [ c(n-1) - (ν12 - ν11) φ(z(n)) ] ,  if y(n) = +1
c(n) = c(n-1) - n^{-1} [ c(n-1) - (ν22 - ν21) φ(z(n)) ] ,  if y(n) = -1    (129)

The block diagram of the learning system that realizes this algorithm is shown in Fig. 5.

2.12.2. Learning Siegert-Kotelnikov (also Maximum a Posteriori) Rule.

The learning algorithm can be obtained as a particular case of (129) when

ν11 = ν22 = 0 ,  ν12 = ν21 = 1    (130)

This yields the learning algorithm

c(n) = c(n-1) - n^{-1} [ c(n-1) - φ(z(n)) ] ,  if y(n) = +1
c(n) = c(n-1) - n^{-1} [ c(n-1) + φ(z(n)) ] ,  if y(n) = -1    (131)

or, what amounts to the same,

c(n) = c(n-1) - n^{-1} [ c(n-1) - y(n) φ(z(n)) ]    (132)

The block diagram of the learning system that realizes this algorithm is shown in Fig. 6.
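Algorithm (132) is a plain stochastic-approximation recursion and is easy to realize in software. The following sketch is ours; the feature map phi and the Gaussian training data are illustrative assumptions.

```python
import numpy as np

def learn_siegert_kotelnikov(samples, labels, phi, N):
    """Stochastic-approximation training of c in y = c^T phi(z), eq. (132).

    samples: observation vectors z(n); labels: teacher labels y(n) in {+1, -1};
    phi: feature map returning an N-vector of (assumed orthonormal) functions.
    """
    c = np.zeros(N)
    for n, (z, y) in enumerate(zip(samples, labels), start=1):
        c -= (1.0 / n) * (c - y * phi(z))   # eq. (132)
    return c

# Hypothetical one-dimensional example with a quadratic feature map:
phi = lambda z: np.array([1.0, z, z * z])
rng = np.random.default_rng(1)
z1 = rng.normal(+1, 1, 500)   # class w1, labeled +1
z2 = rng.normal(-1, 1, 500)   # class w2, labeled -1
zs = np.concatenate([z1, z2]); ys = np.concatenate([np.ones(500), -np.ones(500)])
order = rng.permutation(1000)
c = learn_siegert_kotelnikov(zs[order], ys[order], phi, 3)
print("sign of decision function at z = 0.5:", np.sign(c @ phi(0.5)))
```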

2.12.3. Learning Mixed Decision Rule.

In this case

ν11 = ν22 = 0 ;  ν12 = 1 ,  ν21 = μ    (133)

This yields the learning algorithm

c(n) = c(n-1) - n^{-1} [ c(n-1) - φ(z(n)) ] ,  if y(n) = +1
c(n) = c(n-1) - n^{-1} [ c(n-1) + μ φ(z(n)) ] ,  if y(n) = -1    (134)

The corresponding block diagram is shown in Fig. 7.

Fig. 5. Supervised learning of Bayes rule.

Fig. 6. Supervised learning of Siegert-Kotelnikov rule.

Fig. 7. Supervised learning of mixed decision rule.

Fig. 8. Supervised learning of μ for Neyman-Pearson rule.

2.12.4. Learning Neyman-Pearson Rule.

In this case the adaptive mechanism for computing c(n) is the same as (134), but it remains to compute the Lagrange multiplier μ. The multiplier is to be adjusted such that after training the constraint

∫_{Γ1} f2(z) dz = A = const.    (135)

is respected. Eqn. (135) may be rewritten as

∫_{Γ1} (1 - A) f2(z) dz - ∫_{Γ2} A f2(z) dz = 0    (136)

which amounts to the regression equation

Ez { θ(z) | w2 } = 0    (137)

where

θ(z) = (1 - A) / π2 ,  if z ∈ w2 and c^T φ(z) > 0
θ(z) = -A / π2 ,       if z ∈ w2 and c^T φ(z) < 0    (138)

It is clear from (138) that the function θ(z) depends implicitly on μ through its explicit dependence on c. A discrete learning algorithm to update μ such that the regression eqn. (137) is satisfied may be written as

μ(n) = μ(n-1) - n^{-1} (1 - A) / π2 ,  if y(n) = -1 and c^T φ(z) > 0
μ(n) = μ(n-1) + n^{-1} A / π2 ,        if y(n) = -1 and c^T φ(z) < 0    (139)

The block diagram of the learning system that realizes algorithm (139) is shown in Fig. 8.
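A joint realization of (134) and (139) might be sketched as follows. This is our reading of sec. 2.12.4; in particular, the ordering of the two recursions within one step is an assumption.

```python
import numpy as np

def learn_neyman_pearson(samples, labels, phi, N, A, pi2, mu0=1.0):
    """Joint update of c (eq. (134)) and the Lagrange multiplier mu (eq. (139)).

    A is the prescribed constraint level of (135), pi2 the prior of class w2.
    """
    c, mu = np.zeros(N), mu0
    for n, (z, y) in enumerate(zip(samples, labels), start=1):
        if y == +1:
            c -= (1.0 / n) * (c - phi(z))          # eq. (134), y = +1
        else:
            c -= (1.0 / n) * (c + mu * phi(z))     # eq. (134), y = -1
            if c @ phi(z) > 0:                     # w2 sample falling in Gamma_1
                mu -= (1.0 / n) * (1 - A) / pi2    # eq. (139), first branch
            else:
                mu += (1.0 / n) * A / pi2          # eq. (139), second branch
    return c, mu
```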

2.13. PROBABILISTIC ITERATIVE METHODS - UNSUPERVISED LEARNING.

Now consider the case when the teacher does not give the correct classification y of the observed situations. This corresponds to learning without supervision, or to self-learning.

Let the goal of the self-learning system be to learn the Siegert-Kotelnikov maximum a posteriori probability decision rule, where the discriminant function is

g(z) = π2 f2(z) - π1 f1(z)    (140)

Let us assume now that the products of a priori probabilities and conditional density functions, π1 f1(z) and π2 f2(z), can be approximated by finite series:

π2 f2(z) = a^T φ(z) ,   π1 f1(z) = b^T ψ(z)    (141)

Here a^T = (a1, ..., a_{N1}) and b^T = (b1, ..., b_{N2}) are unknown vectors, and

φ^T(z) = (φ1(z), ..., φ_{N1}(z)) ,   ψ^T(z) = (ψ1(z), ..., ψ_{N2}(z))

are known vector functions. For simplicity, their component functions are assumed to form an orthonormal system.

The decision rule (140) can then be written in the form

ĝ(z, a, b) = a^T φ(z) - b^T ψ(z)    (142)

and the decision rule is determined by finding the vectors a and b. These vectors can be found in the following manner. Noticing that due to (141) the probability density function

f(z) = π1 f1(z) + π2 f2(z)    (143)

is approximately equal to

f(z) = a^T φ(z) + b^T ψ(z)    (144)

it is simple to understand that the problem of determining the vectors a and b is reduced to the restoration (estimation) of the mixture probability density function. Let us introduce the functional

J(a, b) = ∫_{Rz} ( f(z) - a^T φ(z) - b^T ψ(z) )^2 dz    (145)

By differentiating this functional with respect to a and b, and considering the orthonormality of the component functions φ(z) and ψ(z), we find the conditions of the minimum in the form

∇a J(a, b) = E{φ(z)} - a - G b = 0
∇b J(a, b) = E{ψ(z)} - G^T a - b = 0    (146)

where the matrix

G = ∫_{Rz} φ(z) ψ^T(z) dz    (147)

By solving eqs. (146) with respect to a and b, we obtain

a = E{ U ( φ(z) - G ψ(z) ) }    (148)

b = E{ U ( ψ(z) - G^T φ(z) ) }    (149)

where U = (I - G G^T)^{-1}.

The simplest optimal stochastic approximation algorithms for solving the regression equations (148) and (149) are

a(n) = a(n-1) - n^{-1} ( a(n-1) - U ( φ(z(n)) - G ψ(z(n)) ) )
b(n) = b(n-1) - n^{-1} ( b(n-1) - U ( ψ(z(n)) - G^T φ(z(n)) ) )    (150)

The learned decision rule will have the form

ĝ(z(n), a(n-1), b(n-1)) = a^T(n-1) φ(z(n)) - b^T(n-1) ψ(z(n))    (151)

The block diagram of the self-learning system that uses these algorithms is shown in Fig. 9.
(Figure: self-learning system realizing the recursions (150); the sample z(n) is expanded through φ(·) and ψ(·), and the running estimates a(n-1), b(n-1) form the decision rule (151).)

Fig. 9. Non-supervised learning - parametric expansion.
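A software realization of (150) is immediate once the bases are fixed, since G in (147) is then a known constant matrix. The sketch below is ours; note that in the b-recursion we use the matrix (I - G^T G)^{-1}, which is what solving (146) for b gives when N1 ≠ N2 (the text writes U in both recursions).

```python
import numpy as np

def self_learn(samples, phi, psi, G):
    """Unsupervised estimation of a and b by the recursions (150)."""
    N1, N2 = G.shape
    Ua = np.linalg.inv(np.eye(N1) - G @ G.T)    # U of the text
    Ub = np.linalg.inv(np.eye(N2) - G.T @ G)    # analogue for b (dimension N2)
    a, b = np.zeros(N1), np.zeros(N2)
    for n, z in enumerate(samples, start=1):
        a -= (1.0 / n) * (a - Ua @ (phi(z) - G @ psi(z)))
        b -= (1.0 / n) * (b - Ub @ (psi(z) - G.T @ phi(z)))
    return a, b   # learned rule (151): g(z) = a @ phi(z) - b @ psi(z)
```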



2.14. SELF-LEARNING WITH UNKNOWN NUMBER OF PATTERN CLASSES.

In the algorithms of self-learning given above, it was assumed that the number of regions K into which the observed situations have to be clustered is given in advance (for simplicity and clarity, it was assumed to equal 2). Although this does not look like a significant limitation, since for K > 2 we can repeatedly use the binary case (frequently called "dichotomy"), it is still desirable to remove the necessity of specifying a fixed number of regions. In other words, it is desired not only to relate observed situations to the proper regions but also to determine the correct number of these regions.

Sufficiently complete information about the regions of the situations z is contained in the mixture probability density function

f(z) = Σ_{k=1}^{K} πk fk(z)    (152)

We can assume that the peaks of the estimated mixture probability density function correspond to the "centers" of the regions, and the lines passing along the valleys of its relief are the boundaries of the regions; the number of existing peaks in f(z) defines the number of regions, see Fig. 10.

(Figure: a three-peaked density surface over the (z1, z2) plane.)

Fig. 10. Density function corresponding to three classes.

In order to restore (estimate) the mixture probability density function f(z) we shall approximate it by

f(z) = a^T φ(z)    (153)

where φ(z) is a vector function with orthonormal components.

We now form the functional

J(a) = ∫_{Rz} ( f(z) - a^T φ(z) )^2 dz    (154)

for which the necessary condition for optimality leads to the regression equation

a = E{φ(z)}    (155)

A probabilistic iterative algorithm for solving (155) may be written as

a(n) = a(n-1) - n^{-1} ( a(n-1) - φ(z(n)) )    (156)

The algorithm (156) is an optimal one^9. According to (153),

fn(z(n)) = a^T(n) φ(z(n))    (157)

The system realizing algorithms (156) and (157) is presented in Fig. 11.

(Figure: recursive realization of (156)-(157); the current sample z(n) enters φ(·), and the running average a(n-1) is updated to give the estimate fn(z(n)).)

Fig. 11. Learning the mixed density function.

Therefore, we can form an estimate of the mixture probability density function.

A slightly different approach to the restoration (estimation) of f(z) is also possible. It may be practical to define the mixture probability density function using the estimator proposed by Rosenblatt^{20},

fn(z) = [1 / (2 n hn)] Σ_{m=1}^{n} I_{(-1,1)} ( (z - z(m)) / hn )    (158)

where I_A(·) is the characteristic function

I_A(ξ) = 1 if ξ ∈ A ,  I_A(ξ) = 0 otherwise    (159)

Rosenblatt^{20} demonstrated the convergence (in mean square) of fn(z) towards f(z) on the condition that h is a function of n such that hn → 0 as n → ∞, with hn converging to zero more slowly than 1/n.

We note that choosing the I-function in (158) yields a contribution of the "needle type" following each observation.

It may be preferred to replace the I-function of eqn. (158) by a certain bell-shaped function η(z, z(m)) (Fig. 12) that gives the largest weight to the observed situation z(m), while for the other situations the weights are different from zero. This yields smooth estimates of the density function. Then instead of (158) we obtain

fn(z) = n^{-1} Σ_{m=1}^{n} η(z, z(m))    (160)

or, in the recursive form,

fn(z) = f_{n-1}(z) - n^{-1} ( f_{n-1}(z) - η(z, z(n)) )    (161)

This algorithm of learning, like the algorithm of learning (156) and (157), can be used in the estimation of the mixture probability density function, and thus also in finding the number of regions or classes and their corresponding situations.
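In software the recursion (161) is conveniently run on a fixed grid of z values. The sketch below is ours; the Gaussian bell for η and its width are illustrative choices.

```python
import numpy as np

def recursive_density(samples, grid, width=0.5):
    """Recursive mixture-density estimate, eq. (161), on a fixed grid."""
    eta = lambda z, zn: np.exp(-0.5 * ((z - zn) / width) ** 2) \
                        / (width * np.sqrt(2 * np.pi))
    f = np.zeros_like(grid)
    for n, zn in enumerate(samples, start=1):
        f -= (1.0 / n) * (f - eta(grid, zn))   # eq. (161)
    return f

# Counting the peaks of f over the grid suggests the number of classes (sec. 2.14):
rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(2, 0.5, 300)])
f = recursive_density(rng.permutation(data), np.linspace(-5, 5, 201))
```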

The algorithm of self-learning (161) can be generalized if we replace the fixed function η(z, z(n)) by a function ηn(z, z(n)) that varies at each step, for instance

ηn(z, z(n)) = (h(n))^{-1} η( (z - z(n)) / h(n) )    (162)

(Figure: a bell-shaped surface η(z, z(m)) centered at the observation z(m).)

Fig. 12. Bell-shaped function.

where h(n) is a certain decreasing sequence of positive numbers. Eqn. (162) has the meaning that the distributions get "sharpened" around the centers as n increases, so that their effects become secondary; they merely contribute "needle" changes (corresponding to the δ-function) as n → ∞.

It should be noticed that the algorithms of learning (156) and (157) are special cases of the algorithm of learning (161). Actually, by setting

η(z, z(n)) = φ^T(z) φ(z(n))    (163)

in (161), and by introducing fn(z) from (157), we obtain the algorithm of learning (156) after a division by φ(z).

We have described above the way toward the restoration (estimation) of the mixture probability density functions. For multidimensional vectors of the situation z, this restoration is very difficult when smoothness has to be maintained. It is even more difficult to extract the desired regions.

2.15. APPLICATION - MEASUREMENT STRATEGY FOR SYSTEMS IDENTIFICATION.

In some practical situations it is required to identify a dynamic system under the following limitations:

i - a fixed measurement interval;
ii - a constrained set of admissible measurement structures, where the measurement system has a variable structure, namely the number and spatial configuration of the sensors can be altered^{14}. It is assumed that the set of admissible measurement structures is finite, and that the system is identifiable within that set.

The limitation on the measurement interval suggests searching for the optimal measurement structure by trading off the identification accuracy against the measurement cost. That optimal structure may be reached by properly altering the measurement structure at each time step depending on the current level of uncertainties (or error covariances). The algorithm for such alteration is called the system measurement strategy.

The determination of such a strategy amounts to solving a feedback optimization problem. It is generally difficult to find a closed-form solution for such a problem. On the other hand, a numerical solution is hindered by the so-called "curse of dimensionality". Even for linear dynamic systems, the open-loop numerical solution - corresponding to a particular initial uncertainty - is such a formidable task that its on-line implementation is practically impossible^{15}.

In order to deal with this difficulty, we propose here the application of pattern recognition techniques for the determination of these optimal measurement strategies. The form of the present solution admits on-line application, and may be equally applied to non-linear systems.

2.15.1. Problem formulation.

Let the system dynamics be described by the following discrete-time recurrence equation:

x(n+1) = fn(x(n)) + w(n)    (164)

where fn(·) is a known non-linear vector function; x(n) and w(n) denote, respectively, the state and disturbance vectors at the time step n = 0, 1, ..., N-1. The vectors fn, x, and w are all p-dimensional.

The disturbance sequence {w(n)} is assumed to be constituted of independent stationary random variables of zero mean and known covariance,

E{w(n) w^T(n)} = Vw(n)    (165)

The system initial state is known only within certain a priori statistics, assumed to be Gaussian with the following mean and covariance:

E{x(0)} = x̂(0) ,  E{ (x(0) - x̂(0)) (x(0) - x̂(0))^T } = Vx(0)    (166)

It is supposed that the system measurements at time n are represented by the r-dimensional (r ≤ p) vector given by

z(n) = gn(x(n), c(n)) + v(n)    (167)

where v is the measurement noise. The sequence {v(n)} is assumed to be independent of {w(n)} and also constituted of independent stationary random variables of zero mean and known covariance

E{v(n) v^T(n)} = Vv(n)    (168)

The vector c(n) specifies the measurement structure, which characterizes the relationship between the system state-parameter vector and the measurement vector at time step n. Such a measurement structure has to be a member of the set of admissible measurement structures

C = {c1, c2, ..., cM}    (169)

i.e., at time step n, c(n) can take any value ci, i = 1, ..., M.

The cost of identification errors is a function of the covariance matrix Vx(N) at the terminal time step N. Let us denote that function by Φ{Vx(N)}. The measurement cost depends solely on the measurement structures c(n), n = 0, 1, ..., N-1, and can be expressed by the summation

Σ_{n=0}^{N-1} ζn[c(n)]    (170)

Hence, the problem is to specify the measurement structure at each time step, i.e. to determine the optimal strategy c*(n), n = 0, 1, ..., N-1, such as to minimize the overall cost

Q(0, N) = Φ{Vx(N)} + λ Σ_{n=0}^{N-1} ζn[c(n)]    (171)

where λ ≥ 0 is a weighting factor compromising the identification accuracy (first term) and the measurement cost (second term).

2.15.2. Extended Kalman filter^{16}.

Let us suppose the optimal measurement structure c*(m) has been determined for all m, m = 0, 1, ..., n-1. Now introduce the matrices

F(n) = ∂fn(x̂(n)) / ∂x̂(n) ,   G(n+1) = ∂g_{n+1}(x̂((n+1)/n), c(n+1)) / ∂x̂((n+1)/n)    (172)

the a priori variance matrix

Vx((n+1)/n) = F(n) Vx(n) F^T(n) + Vw(n)    (173)

and the gain matrix

K(n+1) = Vx((n+1)/n) G^T(n+1) [ G(n+1) Vx((n+1)/n) G^T(n+1) + Vv(n+1) ]^{-1}    (174)

x̂((n+1)/n) in (172) denotes the one-stage prediction of the state, determined by

x̂((n+1)/n) = fn(x̂(n))    (175)

The filter equations can then be written as

x̂(n+1) = x̂((n+1)/n) + K(n+1) [ z(n+1) - g_{n+1}(x̂((n+1)/n), c(n+1)) ]    (176)

The covariance matrix for that estimate is given by the algorithm

Vx(n+1) = [I - K(n+1) G(n+1)] Vx((n+1)/n) [I - K(n+1) G(n+1)]^T + K(n+1) Vv(n+1) K^T(n+1)    (177)

Due to the symmetry of the matrix Vx(n+1) it can be represented by a vector σ(n+1):

σ^T(n+1) = (σ1(n+1), σ2(n+1), ..., σ_{p'}(n+1)) ,  p' = p(p+1)/2    (178)

σi(n+1) = ( Vx(n+1) )_{j,k}    (179)

where

j = k = i                           for i = 1, 2, ..., p
j = k - 1 = i - p                   for i = p+1, ..., 2p-1
j = k - 2 = i - (2p - 1)            for i = 2p, ..., 3p-3
  ...
j = k - (p-1) = i - [p(p+1)/2 - 1]  for i = p(p+1)/2    (180)

Thus the upper-right diagonals of the Vx matrix are arranged componentwise to form the σ vector. Henceforth, the vector σ will be referred to as the variance vector.
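The packing (178)-(180) simply lists the diagonals of Vx from the main diagonal outward. A minimal sketch (ours):

```python
import numpy as np

def variance_vector(V):
    """Pack the upper diagonals of a symmetric p x p matrix into the
    variance vector sigma of eqs. (178)-(180): main diagonal first,
    then the first super-diagonal, and so on up to the corner element."""
    p = V.shape[0]
    return np.concatenate([np.diag(V, k) for k in range(p)])

V = np.array([[4.0, 1.0], [1.0, 3.0]])
print(variance_vector(V))   # [4. 3. 1.]  (p' = p(p+1)/2 = 3 components)
```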

Let us note that the variance vector σ(n) at time n is a function of σ(0) and the measurement structures c(m) for all m = 0, 1, ..., n-1, through the sequential representation

σ(n) = h_{n-1}{ σ(n-1), c(n-1) } ,  n = 1, 2, ..., N ;  σ(0) given    (181)

Here h_{n-1} denotes the algorithm given by (172), (173), (174) and (177).

In terms of the variance vector σ(n) the overall cost (171) can be written as

Q(0, N) = Φ(σ(N)) + λ Σ_{n=0}^{N-1} ζn[c(n)] ,  λ ≥ 0    (182)

2.15.3. Dynamic programming solution.

Consider the time step N-1. It follows from (181) that

Φ[σ(N)] = Φ[ h_{N-1}{ σ(N-1), c(N-1) } ]    (183)

Assume that c*(0), ..., c*(N-2) are determined; then the optimization of (182) will be reduced to minimizing

Q(N-1, N) = Φ[ h_{N-1}(σ(N-1), c(N-1)) ] + λ ζ_{N-1}[c(N-1)]    (184)

over the set of all admissible measurement structures C. Fixing numerical values for σ(N-1) will then make (184) a function of c(N-1), which can be minimized over the set C to yield a value c*(N-1) corresponding to the numerical value assigned to σ(N-1). That procedure is repeated for a large number of different possible realizations of σ(N-1), which can be produced by suitable random generation. The numerical results are then tabulated according to patterns or classes of elements. In that respect one may define M patterns A1, ..., Ai, ..., AM, the i-th of which can be represented as

Ai = { σj : c*(σj) = ci } ,  i = 1, ..., M    (185)

Here c*(σj) denotes the optimal structure corresponding to a value σj, and ci is an element of the set C. Notice that there are M classes of σ (at any time step n), which equals the cardinal number of the set C.

Let us denote by Ψn the decision vector function needed to recognize the appropriate pattern of a sample σ(n) at time n. By means of that decision function it is possible to assign the optimal structure c* corresponding to the variance vector at time n. Let us represent that decision procedure formally by the equation

c*(n) = Ψn(Sn, σ(n)) ,  n = N-1, ..., 1, 0    (186)

where Sn is the parameter vector of the decision functions. The algorithm corresponding to equation (186) will be discussed in the following section.

Notice that the time step is considered to be n rather than N-1 in (186) in order to save writing when the procedure is again mentioned in the following arguments.

Substituting (186) (for n = N-1) into (184) we get the optimal one-stage cost

Q*(N-1, N) = Φ[ h_{N-1}(σ(N-1), Ψ_{N-1}(S_{N-1}, σ(N-1))) ] + λ ζ_{N-1}[ Ψ_{N-1}(S_{N-1}, σ(N-1)) ] = γ_{N-1}[σ(N-1)]    (187)

It is to be emphasized that γ_{N-1} denotes a numerical procedure rather than a function. An analytical expression of γ_{N-1} is not needed.

Let us now consider the two-stage decision process (N-2, N), for which the cost can be written as

Q(N-2, N) = γ_{N-1}[σ(N-1)] + λ ζ_{N-2}(c(N-2))
          = γ_{N-1}[ h_{N-2}(σ(N-2), c(N-2)) ] + λ ζ_{N-2}(c(N-2))    (188)

By analogy with (184), it is obvious that the optimization of Q(N-2, N) can proceed in principle the same as before. The numerical results (relating σ(N-2) and c*(N-2)) thus obtained can then be classified into the M patterns (185). The appropriate decision procedure (186) is then sought, and so on.

2.15.4. Pattern recognition.

Let us consider the classification of the M patterns (185) by successive dichotomy^4, i.e. by successively splitting the patterns (or classes) into groups.

If we begin, for example, with four classes, we could solve the following two problems in the order given:

(1) Find a decision boundary separating classes 1 and 2 from classes 3 and 4.
(2) Using samples of classes 1 and 2 alone, find a decision boundary separating class 1 from class 2. Using samples of classes 3 and 4, find a decision boundary separating class 3 from class 4.

Notice that the decision procedure needs to be carried out a maximum of M-1 times for an M-class problem. This can easily be seen from Fig. 13, where the decision procedure is depicted as a tree structure, each of whose nodes corresponds to a decision function. The parallel structure (Fig. 13.a) has the advantage of quick decisions, since the maximum number of decision functions in a decision procedure is less than that of the series structure (Fig. 13.b).

The determination of a component of the Ψ vector function separating two classes can be established by the probabilistic iterative approach^9.

Let us designate the decision function by

ŷ = pi(σ, S) = S^T ψ(σ)    (189)

where pi(σ, S) is a function that is known up to the parameter vector S^T = (s1, ..., sv). The signs of the decision function define the regions§

X1 = {σ : pi(σ, S) < 0} ,  X2 = {σ : pi(σ, S) > 0}    (190)

On the other hand, a teacher gives us the correct classification of each observed sample σ:

y = -1 if σ is classified into X°1
y = +1 if σ is classified into X°2    (191)

The superscript ° denotes the correct class. Obviously the decision will be correct

§ Notice that according to Fig. 13.a or b the classes X°1 or X°2 may consist of the union of several patterns Ai, ..., Aj, according to the decision tree and the decision node.

if

y · pi(σ, S) > 0    (192)

and incorrect if the opposite is true, i.e.

y · pi(σ, S) < 0    (193)

As a penalty function for misclassification, we select a certain convex function of the difference between y and ŷ, that is

D( y - pi(σ, S) )    (194)

Then the average risk of misclassification can be written as

R = ∫ D( y - pi(σ, S) ) p(σ) dσ = E{ D( y - pi(σ, S) ) }    (195)

where p(σ) is the unknown mixture probability density

p(σ) = Σ_{k=1}^{M} Pk pk(σ)    (196)

We denote by P1, ..., PM the probabilities of occurrence of σ in the classes X1, ..., XM, and by p1(σ), ..., pM(σ) the conditional probability densities of σ in the classes X1, ..., XM.

The necessary condition for the minimum of the average risk is

∇S R = ∫ ∇S D( y - pi(σ, S) ) p(σ) dσ = E{ ∇S D( y - pi(σ, S) ) } = 0    (197)

Now, on the basis of the results of^9, we can easily obtain the learning algorithm for the pattern recognition system:

S(k) = S(k-1) + Γ(k) D'( y(k) - S^T(k-1) ψ{σ(k)} ) ψ{σ(k)}§    (198)

§ D'(x) = dD(x)/dx

where Γ(k) is a step-length matrix to be defined.

If the function D(·) is taken to be the quadratic function

D( y - pi(σ, S) ) = ½ ( y - pi(σ, S) )^2    (199)

then the algorithm for optimal learning^9 will be

S(k) = S(k-1) + Γ(k) [ y(k) - S^T(k-1) ψ{σ(k)} ] ψ{σ(k)}

Γ(k) = Γ(k-1) - [ Γ(k-1) ψ{σ(k)} ( Γ(k-1) ψ{σ(k)} )^T ] / [ 1 + ψ^T{σ(k)} Γ(k-1) ψ{σ(k)} ]    (200)

with S(0) arbitrary and Γ(0) > 0.
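Algorithm (200) is a recursive least-squares type update and can be sketched directly (our code; psi and the training data are whatever the dichotomy step supplies):

```python
import numpy as np

def train_decision_rule(sigmas, labels, psi, v):
    """Optimal learning algorithm, eq. (200), for the parameters S (dimension v)."""
    S = np.zeros(v)
    Gamma = np.eye(v)                              # Gamma(0) > 0
    for sigma, y in zip(sigmas, labels):
        f = psi(sigma)                             # feature vector psi{sigma(k)}
        Gf = Gamma @ f
        Gamma = Gamma - np.outer(Gf, Gf) / (1.0 + f @ Gf)   # Gamma(k)
        S = S + Gamma @ f * (y - S @ f)            # S(k)
    return S
```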

2.15.5. On-line identification.


The ~ functions and the successive dichotopy procedure discussed in the previous
section specifies the pattern recognition block, represented formaly by equation
(186). The variable - structure identification scheme is depicted in Fig.14 and
the flow-chart of the algorithm is shown in Fig.15.
The process dynamics are described by equation (164) where w(n) is the random
disturbance. The state x(n) is measured by the variable-structure measuring system
(]67) where the measurement z(n) is contaminated by noise ~(n). The block "estimator"
denotes the extended Kalman f i l t e r . The optimal measurement structure is specified
by classifying the pattern of the estimation variance vector ~(n). The "pattern
recognition" block performs that classification and assigns accordingly the appro-
priate value of £m(n) such that c~(n) = ci i f ~(n) cAi .

This completes the synthesis of the on-line identification system.

Illustrative example.

Consider the slab-type nuclear reactor represented by the following four-point model^{17}, based on space discretization, where xn is a four-dimensional vector representing the state of the system at the four mesh points. The state transition matrix F is given by

F = | 0.904    0.058    0.002    0.00039 |
    | 0.058    0.906    0.058    0.00185 |
    | 0.002    0.058    0.906    0.058   |
    | 0.00039  0.00185  0.058    0.904   |

(Figure: two decision trees over classes 1-4; each node is a decision function of the form S^T ψ(σ) ≷ 0.)

a. Parallel structure.  b. Series structure.

Fig. 13. Successive dichotomies in a four-class problem: two possible trees.
(Figure: the process (164) drives the variable-structure measuring system (167); the measurements z(n) feed the estimator (extended Kalman filter), whose variance vector σ(n) enters the pattern recognition block; the latter returns the measurement structure c*(n) to the measuring system.)

Fig. 14. Variable-structure identification scheme.

(Flow chart: specify ci, i = 1, ..., M; read Vw, Vv; choose ŷ = S^T ψ(σ). Classification: given σ(n), try the different ci, i = 1, ..., M, to calculate Vx(n+1) and compute Q(n, N); generate samples of σ; find the optimal structure c* which minimizes Q(n, N); repeat this for different σ(n); cluster the samples of σ(n) into M classes according to their pertaining to the same optimal structure. Decision rule: using the training algorithm (200), establish the tree decision rule Ψ(σ, S); set n = n-1 and repeat until n = 0.)

Fig. 15. Flow chart of the identification scheme.

The measurement is given by

zn = cn^T xn + vn

where cn is assumed to have one of the following two measurement structures (M = 2):

c1 = (1 0 0 0)^T ,  c2 = (0 0 1 0)^T

It is required to determine the measurement strategy that gives a good estimate for the state. Here (see equation (171))

Φ{Vx(N)} = tr Vx(N)

The σ vector will be taken as the diagonal elements of Vx, thus comprising four components. Each component σi, i = 1, ..., 4, is generated independently using a uniform random distribution over the range of values between 0 and 1. For each specific value of σ, the optimum structure is determined for the last-stage decision process. Then the decision rule is calculated using the learning algorithm (200). The form considered for the decision rule is

ŷ = S^T σ = s1 σ1 + s2 σ2 + s3 σ3 + s4 σ4

In computing the vector S, fifty samples of the vector σ are used. The vector S is also calculated for the two-stage decision process (N-2, N), using again fifty generations of σ and the decision rule calculated from the single-stage decision process (N-1, N), as explained before.
The learning scheme converged to the following values:

S_{N-1} = ( 1.5630, -0.3637, -1.1492, 0.0065 )^T

S_{N-2} = ( 0.4294, -0.8950, 0.9488, -0.8022 )^T

The above decision rules are tested on the same samples of σ; the misclassification ratio of the optimum structure is 4% for the single-stage and 6% for the two-stage process, which is considered quite satisfactory.

2.16. CONCLUSION.

Pattern recognition consists of two interrelated problems: feature extraction and classification.

Feature extraction is concerned with extracting, by means of a certain transformation, the important features of the pattern vector with a view to classification. The dimension of the feature vector must be as small as possible. Feature extraction consists essentially in extracting what is regarded as valuable information conveyed by a pattern vector. This generally implies that some information conveyed by the pattern vector has to be discarded. There must be some criterion as to what information is to be sought and what is to be discarded. If the criterion is to keep the information conveyed by the feature vectors as close as possible to the information conveyed by the respective pattern vectors, then the transformation must maximize the entropy in the feature space. That criterion amounts to extracting the features that characterize the attributes common to each pattern class. This is the criterion for intraset feature extraction.

On the other hand, in order to extract the attributes that emphasize the differences between or among pattern classes, it is necessary to perform the utmost organization in the feature space (i.e. minimize the entropy) in order to cluster the different populations and hence facilitate separability. This is the criterion for interset feature extraction.

The problem of optimal classification is then stated analytically. In the case of complete a priori information, that problem can be solved by employing the statistical decision algorithms.

If the cost of taking feature measurements is to be included, or if the features extracted from the input patterns are sequential in nature, then sequential classification methods are to be used.

In that respect Wald's test has been presented. That test becomes practical when considered in its discrete version. To that end finite automata can prove to be a useful tool. A certain form of such automata (the automaton with linear tactic) has been given in some detail.

The Bayes and probabilistic iterative techniques have been presented with a view to solving the pattern-recognition problem. Using those techniques, different learning algorithms with or without supervision can be obtained. Learning without supervision demands more complex algorithms and takes a longer time than learning with supervision under the same conditions. This agrees with the fact that ignorance must be paid for.

Finally, an application of pattern recognition techniques to the identification of dynamic systems has been presented.

COMMENTS
2.1. The division of the pattern recognition problem into extraction and classification problems is rather artificial. Essentially there is no "hard" boundary between the two problems, for "perception" aids "decision" inasmuch as "decision" structures what to perceive. An adaptation scheme should be envisaged to adapt the extraction and classification algorithms simultaneously with a view to better recognition.
2.2. - 2.5. The terminology of "intraset" and "interset" features is due to Tou and Heydorn^1. They seem to be erroneous in indicating that the intraset extraction criterion corresponds to minimizing the entropy, which leads them to an ambiguous result. Also see Tou^2, Young and Calvert^3.
2.8. For Wald's SPRT, see Fu^5. The automata model is taken from Radyuk and Terpugov^6.
2.10. An interesting discussion of the complexity of the non-supervised Bayes algorithm is also given in Young and Calvert^3, p. 83. Approximations of unsupervised Bayes learning are the subject of many researches; see e.g.^{18,19}.
2.11. The concept of identifiability of finite mixtures is originally presented in Teicher^{7,8}.
2.12. - 2.14. The basic reference is Tsypkin^9. Accounts of the theory of the iterative probabilistic methods can be found in^{10-12}.
2.15. That application is due to El-Fattah and Aidarous^{13}.

REFERENCES
1. J.T. Tou and R.P. Heydorn, "Some Approaches to Optimum Feature Extraction", in Computer and Information Sciences - II (J.T. Tou, ed.), New York: Academic, 1967.
2. J.T. Tou, "Feature Selection for Pattern Recognition Systems", in Methodologies of Pattern Recognition (S. Watanabe, ed.), New York: Academic, 1972.
3. T.Y. Young and T.W. Calvert, Classification, Estimation, and Pattern Recognition. New York: Elsevier, 1974.
4. W.S. Meisel, Computer-Oriented Approaches to Pattern Recognition. New York: Academic, 1972.
5. K.S. Fu, Sequential Methods in Pattern Recognition and Machine Learning. New York: Academic, 1968.
6. L.E. Radyuk and A.F. Terpugov, "Effectiveness of Applying Automata with Linear Tactic in Signal Detection Systems", Automation and Remote Control, No. 4, 1971, pp. 609-617.
7. H. Teicher, "Identifiability of Mixtures", Ann. Math. Stat. 32, 1961, pp. 244-248.
8. H. Teicher, "Identifiability of Finite Mixtures", Ann. Math. Stat. 34, 1963, pp. 1265-1269.
9. Ya.Z. Tsypkin, Foundation of the Theory of Learning Systems. New York: Academic, 1973.
10. G. Albert, Stochastic Approximation and Non-linear Regression. MIT Press, 1967.
11. N.V. Loginov, "Methods of Stochastic Approximation", Automation and Remote Control, 27, No. 4, 1966, pp. 706-728.
12. D.J. Sakrison, "Stochastic Approximation: A Recursive Method for Solving Regression Problems", in Advan. Communication Systems, 2, 1966.
13. Y.M. El-Fattah and S.E. Aidarous, "A Pattern Recognition Approach for Optimal Measurement Strategies in Dynamic Systems Identification", IFAC Symp. on Identification, Tbilisi (USSR), 1976.
14. S.E. Aidarous, M.R. Gevers, and M.J. Installe, Int. J. Control, 1975, Vol. 22, pp. 197-213.
15. M. Athans, Automatica, 1972, Vol. 8, pp. 397-412.
16. A.P. Sage, Estimation and Identification. Proc. 5th IFAC Congress, 1972, Paris (France).
17. M.A. Hassan, M.A.R. Ghonaimy, and M.A. Abd El-Shaheed, "A Computer Algorithm for Optimal Discrete-Time State Estimation of Linear Distributed Systems", Proc. IFAC Symp. on Control of Distributed Systems, 1971, Banff (Canada).
18. E.A. Patrick, J.P. Costello, and F.C. Monds, "Decision Directed Estimation of a Two Class Decision Boundary", IEEE Trans. on Computers, Vol. C-19, No. 3, pp. 197-205, 1970.
19. U.E. Makov and A.F.M. Smith, "Quasi-Bayes Procedures for Unsupervised Learning", Proc. of the 1976 IEEE Conf. on Decision and Control, Paper WP 4.
20. M. Rosenblatt, "Remarks on Some Nonparametric Estimates of a Density Function", Ann. Math. Statist., 27, pp. 832-837, 1956.
21. S. Kullback, Information Theory and Statistics. New York: J. Wiley and Sons, 1958.
CHAPTER III

SIMULATION - MODELS OF COLLECTIVE BEHAVIOR

"Our own interest is yet another marvellous instrument for agreeably putting out our eyes."

B. Pascal : Les Pensées.

3.1. INTRODUCTION.

Tsetlin^1 has proposed different norms of behavior of a finite automaton working in a random environment. In that work the environment is assumed to either penalize or reward each action of the automaton according to certain unknown probabilities. The behavior of an automaton is called expedient if the average penalty is less than the value corresponding to choosing all actions with equal probabilities. The behavior is called optimal or ε-optimal according to whether the average penalty is equal or arbitrarily close, respectively, to the minimum value. Krylov and Tsetlin^2 introduced the concept of games between automata and studied in particular Two-Automaton Zero-Sum games.

Stochastic automata with variable structure were introduced by Varshavskii and Vorontsova^3 to represent learning automata attempting a certain norm of behavior in an unknown random environment. Since that work a respectable number of works has appeared, studying different aspects of learning automata and applying them to simulating very simple norms of behavior (like that introduced by Tsetlin) and also simple automata games (such as Two-Automaton Zero-Sum games). For a survey of the subject we refer to Narendra and Thathachar^4.

The contribution of this chapter is to direct attention to the use of learning automata to simulate an important class of problems of collective behavior whose deterministic version has been the subject of recent investigations, mainly by Malishevskii and Tenisberg^{5-8}. In that class of problems there exists a type of relation in the collective where the behavior of the participants possesses a definite mutual opposition. Such a situation can arise, for example, in economic systems, as in the case of price regulation in a competitive market^9, or in management systems, as in the problem of resource allocation^{10}.

In the model introduced in this chapter a collective of interacting stochastic automata is considered. Each automaton has a behavioral tactic directed towards the realization of its own goal, taken to be the minimum of the expected value of a certain penalty function. That function depends explicitly on the automaton strategy and the environment response. The automata interactions arise from the dependence of the environment response on the whole set of strategies used by the collective of automata. That dependence is stochastic and unknown to all the automata. Furthermore, any automaton knows neither the penalty functions nor even the number of the other automata. The only knowledge available to each automaton is the realization of its penalty function following the use of a certain strategy.

This model is useful for the analysis of some problems of collective behavior in large systems. Its use enables, in particular, organizing the local behaviors of the different subsystems (constituting a large system) in order to ensure a certain desired collective behavior. By organization of local behaviors is generally meant the formation of appropriate criteria (through introduction of penalties or premiums, etc.), the provision of new degrees of freedom, and the introduction of new links and control levels between the subsystems^{11}. In brief, it means creating an external environment for each subsystem such that the collective behavior of the subsystems - though each pursues solely its own private interest - is desirable in a definite sense.

3.2. AUTOMATA MODEL I - SUFFICIENT A PRIORI INFORMATION.

As a model of collective behavior we consider the following game between N learning automata A1, A2, ..., AN; see Fig. 1. A learning automaton is considered as a stochastic automaton with variable structure, as depicted in Fig. 2. The automata operate on a discrete time scale t = 1, 2, ... The input u^i to a stochastic automaton can only take one of the values -1, 0, +1. The output(*) y^i of the automaton Ai will be assumed to take one of the mi values y^i_1, y^i_2, ..., y^i_{mi}, which will be called its strategies. We will say that the automaton Ai uses the j-th strategy at time t if y^i(t) = y^i_j.

A play y(t) carried out at time t will be the name given to the vector y(t) = (y^1(t), y^2(t), ..., y^N(t))^T whose components are the strategies used by the automata A1, A2, ..., AN at time t. The outcome s(t+1) of a play y(t) is the vector (s^1(t+1), s^2(t+1), ..., s^N(t+1))^T whose components are the referee or environment responses to the set of automata at time t+1. The environment is completely characterized by the probabilities p^i(s^i(t+1) | y(t)) of the outcomes s^i(t+1), i = 1, ..., N, for every play y(t). As only stationary environments will be considered, the aforementioned probabilities can simply be written as p^i(s^i | y).

The probability distribution of the output of the i-th automaton is specified by the vector

(*) A unique deterministic mapping between the automaton states and outputs is assumed.

p^i = (p^i_1, ..., p^i_{mi})^T ,  (i = 1, ..., N)    (1)

0 < p^i_j < 1 ,   Σ_{j=1}^{mi} p^i_j = 1

where p^i_j is the probability that the automaton uses its pure strategy y^i_j. The probability vector p^i specifies the mixed strategy of the i-th automaton.

(Figure: N learning automata A1, ..., AN in a loop with the probabilistic environment; each automaton Ai sends its strategy y^i to the environment and receives the response s^i.)

Fig. 1. Game between N learning automata.

Variable-structure stochastic automaton is the name given when the probability vector p^i is modified at each step according to some reinforcement scheme. This may be effected through changing the elements of the automaton transition probability matrices corresponding to the automaton input u^i.

The objective of each automaton in the game is to seek the mixed strategy p^i that minimizes its average penalty, taken as the expected absolute value of a function F^i that depends on the automaton's strategy y^i and the environment response s^i:

(Figure: a stochastic automaton in closed loop with a performance-evaluation block and an adaptive device that adjusts the automaton's transition probabilities.)

Fig. 2. Learning automaton.

Q^i({p^k}) = E_{s^i, y^i} { |F^i(θ^i(y^i, s^i))| }
           = Σ_{j1, j2, ..., jN} p^1_{j1} p^2_{j2} ⋯ p^N_{jN} ∫_{-∞}^{∞} |F^i(θ^i(y^i_{ji}, s^i))| dp^i(s^i | {y^k_{jk}})    (2)

where θ^i is a penalty index which depends on y^i, s^i, and F^i is a function "retaining sign", i.e.

sign F^i(θ^i) = sign θ^i    (3)

The interaction between the automata is obvious from the interdependence of their goals; notice that Q^i is a function of p^T = (p^{1T}, p^{2T}, ..., p^{NT}).

Let us rewrite the average penalty (2) in the form

Q^i(p) = E_{y^i} { |δ^i(y)| }    (4)

where

δ^i(y) = E_{s^i} { F^i(θ^i(y^i, s^i)) } = ∫ F^i(θ^i(y^i, s^i)) dp^i(s^i | y)    (5)

It is assumed that for each i-th automaton, for arbitrary fixed values of the foreign strategies y^1, ..., y^{i-1}, y^{i+1}, ..., y^N, there exists one value of the own strategy y^i which is the "best". Let us denote its value by y^{i*}(y) = y^{i*}(y^1, ..., y^{i-1}, y^{i+1}, ..., y^N). Such y^i is given by minimizing |δ^i(y)| with respect to y^i.

We shall call δ^i(y) an indicator function for the i-th automaton if it gives an indication of the "distance" from the best situation, i.e.

δ^i(y) > 0 when y^{i*}(y) > y^i ,   δ^i(y) < 0 when y^{i*}(y) < y^i    (6)

Furthermore, δ^i(y) is assumed to decrease (strictly) monotonically in its own variable y^i and not to decrease in the foreign variables {y^j}, j ≠ i. That condition will be called the condition of contramonotonicity.
Let us arrange the set Y^i = {y^i_1, y^i_2, ..., y^i_{mi}} of the automaton Ai's strategies such that y^i_j > y^i_k for all j > k. For any strategy y^i_j (1 ≤ j ≤ mi) we call y^i_{j+1} the next supremal strategy and y^i_{j-1} the next infimal strategy.

From the contramonotonicity condition, the indicator function δ^i(y) of the i-th automaton decreases (strictly) with respect to its own variable y^i. This suggests the following operating principle for minimizing the magnitude of δ^i. If y^i(t) = y^i_j resulted in δ^i > 0, use the next supremal strategy, i.e. y^i(t+1) = y^i_{j+1}; on the other hand, if δ^i < 0, use the next infimal strategy, i.e. y^i(t+1) = y^i_{j-1}. Otherwise, on any one of the conditions

i) δ^i = 0 ,  ii) y^i(t) = y^i_{mi} and δ^i > 0 ,  iii) y^i(t) = y^i_1 and δ^i < 0,

remain in the status quo.
Upon modifying the indicator functions to be of the form

δ̃^i(y) = 0 ,  if y^i = y^i_1 and δ^i(y) < 0, or y^i = y^i_{mi} and δ^i(y) > 0;
δ̃^i(y) = δ^i(y) ,  otherwise,    (7)

the above principles of behavior can be written thus:

if y^i(t) = y^i_j then y^i(t+1) = y^i_{j+u^i} ,  where u^i = sign ( δ̃^i(y(t)) )    (8)

In the limit, when the set Y^i tends to the continuous interval [y^i_1, y^i_{mi}] and the time increment between successive steps tends to zero, the above rule of behavior will approximate the following continuous-time model of collective behavior:

sign (dy^i/dt) = sign ( δ̃^i(y) ) ,  (i = 1, ..., N)    (9)

It was demonstrated by Malishevskii^8 that all nondegenerate trajectories of the process (9) converge to the equilibrium point y* ∈ Y = Y^1 × Y^2 × ... × Y^N, if it exists, such that

δ^i(y*) = 0 for all i    (10)

To realize the above rule of deterministic behavior using stochastic automata, a reinforcement scheme should be devised such that

Pr[ y^i(t+1) = y^i_{j+u^i} ] > Pr[ y^i(t+1) = y^i_k ]  for all k ≠ j + u^i    (11)

An example of such a reinforcement scheme may be written thus: if y^i(t) = y^i_{ki} then

p^i_{ki+u^i}(t+1) = p^i_{ki+u^i}(t) + γ^i(t+1) |δ̃^i(y(t))|

p^i_j(t+1) = p^i_j(t) - [ γ^i(t+1) / (mi - 1) ] |δ̃^i(y(t))| ,   j = 1, ..., mi ,  j ≠ ki + u^i    (12)

where γ^i(t+1) should satisfy the conditions

Σ_{t=1}^{∞} γ^i(t) = ∞ ,  Σ_{t=1}^{∞} (γ^i(t))^2 < ∞ ,  γ^i(t) > 0    (13)

and be subject to the upper bound

γ^i(t+1) < min ( [1 - p^i_{ki+u^i}(t)] / |δ̃^i| ;  (mi - 1) min_{j ≠ ki+u^i} p^i_j(t) / |δ̃^i| ) ,  δ̃^i ≠ 0    (14)

to guarantee that the probability vector p^i satisfies the condition 0 < p^i_j < 1 for all j.

3.3. AUTOMATA MODEL II - LACK OF A PRIORI INFORMATION.

In the reinforcement scheme (12) complete a priori information is assumed, precisely the joint probability of outcome and play of the automata, p^i(s^i, y), which fully characterizes the external environment.

In the case when such a priori information is unavailable, the indicator functions δ^i(y), see eqn. (5), cannot be specified explicitly. In such a case the scheme (12) is to be replaced by a learning algorithm which provides estimates of the probability vectors p^i(t) using the random observations s^i(t). Similarly to the Kiefer-Wolfowitz stochastic-approximation method^{12}, we drop the expected-value symbol (with respect to s^i) from the expression of δ̃^i to get the following reinforcement scheme: provided that

y^i(t) = y^i_{ki}

then

p^i_{ki+u^i}(t+1) = p^i_{ki+u^i}(t) + γ^i(t+1) |F^i(θ^i(y^i_{ki}, s^i(t+1)))|

p^i_j(t+1) = p^i_j(t) - [ γ^i(t+1) / (mi - 1) ] |F^i(θ^i(y^i_{ki}, s^i(t+1)))| ,   j = 1, ..., mi ,  j ≠ ki + u^i    (15)

where

u^i = sign ( θ̃^i(y^i_{ki}, s^i(t+1)) )    (16)

The sequence γ^i(t+1) should also satisfy the conditions (13), besides the upper bound condition

γ^i(t+1) < min ( [1 - p^i_{ki+u^i}(t)] / |F^i(θ^i)| ;  (mi - 1) min_{j ≠ ki+u^i} p^i_j(t) / |F^i(θ^i)| ) ,  θ^i ≠ 0    (17)

to guarantee that the p^i_j(t) are always between zero and 1.

Similarly to (7), the modified penalty index θ̃^i is defined as

θ̃^i(y^i, s^i) = 0 ,  if y^i = y^i_1 and θ^i < 0, or y^i = y^i_{mi} and θ^i > 0;
θ̃^i(y^i, s^i) = θ^i(y^i, s^i) ,  otherwise    (18)

The idea underlying the functioning of a learning automaton in the present model can be stated as follows. At any time step, if the automaton action has elicited an environment response for which the penalty index θ^i is greater than zero, then at the next time step the probability of the next supremal action is increased. On the other hand, if the penalty index is less than zero, then the probability of the next infimal action is increased. If, in the case of a positive penalty index, y^i_j = y^i_{mi}, or, in the case of a negative penalty index, y^i_j = y^i_1, then the probability of y^i_j itself is increased. Finally, if the penalty index is zero, the automaton remains in the status quo.
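One step of the scheme (15)-(18) for a single automaton might be sketched as follows. This is our code, not from the text: F^i(θ) = θ is taken as the sign-retaining function, γ(t) = γ0/t satisfies (13), and the boundary behavior follows the verbal description above (the extreme strategy's own probability is reinforced).

```python
import numpy as np

def update(p, k, theta, t, m, gamma0=0.5):
    """One step of reinforcement scheme (15)-(18) for one automaton.

    p: mixed strategy over the m strategies; k: 0-based index of the
    strategy used at time t; theta: observed penalty index theta(y_k, s(t+1)).
    """
    # modified index, eq. (18): zero at the extreme strategies
    theta_mod = 0.0 if ((k == m - 1 and theta > 0) or
                        (k == 0 and theta < 0)) else theta
    u = int(np.sign(theta_mod))            # eq. (16)
    target = k + u                         # next supremal/infimal strategy, or k
    gamma, mag = gamma0 / t, abs(theta)    # gamma(t) = gamma0/t satisfies (13)
    if mag == 0.0:
        return p                           # zero penalty index: status quo
    others = np.delete(p, target)          # cap gamma by the bound (17)
    gamma = min(gamma, 0.99 * (1 - p[target]) / mag,
                0.99 * (m - 1) * others.min() / mag)
    q = p.copy()
    q[target] += gamma * mag               # eq. (15), favored strategy
    mask = np.arange(m) != target
    q[mask] -= gamma * mag / (m - 1)       # eq. (15), all the others
    return q
```

The update preserves Σ p^i_j = 1 by construction, and the cap keeps every component strictly inside (0, 1).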

3.4. EXISTENCE AND UNIQUENESS OF THE NASH PLAY.

Lemma 1.

For contramonotonic indicator functions δ^i(y), the best reply moves in the direction of the foreign increments:

Δy^{i*}({y^j}) / Δy^j ≥ 0  for all Δy^j ≠ 0 of a common sign ,  (i = 1, ..., N)    (19)

Proof.

i) Let Δy^j ≥ 0, j = 1, ..., N, j ≠ i, with Δy^j > 0 for at least one j. Let y^{i*}({y^j}) denote the best strategy of the i-th automaton for arbitrary fixed values of the foreign strategies {y^j} = {y^j_k}. Let y^{i*}({y^j_k}) = y^i_v, 1 < v < mi, and suppose δ^i(y^i_v, {y^j_k}) ≥ 0.

As a consequence of the contramonotonicity assumption,

δ^i(y^i_ℓ, {y^j_k + Δy^j}) ≥ δ^i(y^i_ℓ, {y^j_k}) > 0  for all ℓ ≤ v - 1.

Accordingly the optimum y^{i*} for the new {y^j} = {y^j_k + Δy^j} cannot be found in the subset of strategies (y^i_1, ..., y^i_{v-1}) and can only be found in the subset (y^i_v, ..., y^i_{mi}). Hence Δy^{i*} ≥ 0. On the other hand, if δ^i(y^i_v, {y^j_k}) < 0 then v = 1, or else δ^i(y^i_{v-1}, {y^j_k}) > 0. If v = 1 then (19) is automatically satisfied. If v ≠ 1 then

δ^i(y^i_ℓ, {y^j_k + Δy^j}) ≥ δ^i(y^i_ℓ, {y^j_k}) ≥ δ^i(y^i_{v-1}, {y^j_k}) > 0

for all ℓ ≤ v - 1, whence Δy^{i*} ≥ 0, and (19) holds.

ii) Let Δy^j ≤ 0, j = 1, ..., N, j ≠ i, with Δy^j < 0 for at least one j. Consider y^{i*}({y^j_k}) = y^i_v, 1 < v < mi, and δ^i(y^i_v, {y^j_k}) ≤ 0. As a consequence of the contramonotonicity assumption,

δ^i(y^i_ℓ, {y^j_k + Δy^j}) ≤ δ^i(y^i_ℓ, {y^j_k}) < 0  for all ℓ ≥ v + 1.

Hence the optimum y^{i*} for {y^j_k + Δy^j} can only be found in the subset (y^i_1, ..., y^i_v). Accordingly Δy^{i*} ≤ 0, and again (19) is verified. On the other hand, if δ^i(y^i_v, {y^j_k}) > 0, then either δ^i(y^i_{v+1}, {y^j_k}) < 0 or else v = mi. If v = mi then (19) is automatically satisfied. If v ≠ mi then

δ^i(y^i_ℓ, {y^j_k + Δy^j}) ≤ δ^i(y^i_ℓ, {y^j_k}) ≤ δ^i(y^i_{v+1}, {y^j_k}) < 0

for all ℓ ≥ v + 1. Hence Δy^{i*} ≤ 0, and (19) holds.

Lemma 1 has the following game-theoretic interpretation. When the strategy y^i of a player Ai has the character of the magnitude of its force, contramonotonicity means that each automaton, in trying to minimize its penalty, tries to respond by an increase in its own force to an increase of the force of the other players.

Theorem 1 (Existence).

For contramonotonic indicator functions there exists a Nash play ŷ such that

ŷ^i = y^{i*}(ŷ) ,  (i = 1, 2, ..., N).

Proof.

Let ℓ^i_0 = 1 (i = 1, ..., N), and suppose that {y^i_{ℓ0}} is not a Nash play. Consider the set of automata strategies given by

y^i_{ℓ1} = y^{i*}({y^j_{ℓ0}}) ,  (i = 1, ..., N).

Suppose that {y^i_{ℓ1}} is not a Nash play. By virtue of Lemma 1,

y^i_{ℓ1} ≤ y^{i*}({y^j}) ≤ y^i_{mi}  whenever  y^j_{ℓ1} ≤ y^j ≤ y^j_{mj} for all j ≠ i, (i = 1, ..., N),

i.e. the mappings y^{i*}(·) map the subset of plays

Y^(1) = { y : y^i_{ℓ1} ≤ y^i ≤ y^i_{mi} (i = 1, ..., N) }

into itself. Thus a Nash play must exist in Y^(1). Let us then consider the strategies

y^i_{ℓ2} = y^{i*}({y^j_{ℓ1}}) ,  (i = 1, ..., N).

If {y^i_{ℓ2}} again is not a Nash play, then a Nash play must be in the subset

Y^(2) = { y : y^i_{ℓ2} ≤ y^i ≤ y^i_{mi} (i = 1, ..., N) }

and the mappings y^{i*}(·) map Y^(2) into itself. By successive application of the above procedure it follows that if {y^i_{ℓs}} is not a Nash play, then the candidate for that play must be in the subset of plays

Y^(s) = { y : y^i_{ℓs} ≤ y^i ≤ y^i_{mi} (i = 1, ..., N) }

By Lemma 1 it is clear that

y^i_{ℓ1} ≤ y^i_{ℓ2} ≤ ... ≤ y^i_{ℓs}  for all i.

Since the y^{i*}(·) map the subset Y^(s) into itself, it is obvious that in the limit - unless {y^i_{ℓs}} is a Nash play for some s - the subset of candidate Nash plays will degenerate to the "boundary" point {y^i_{mi}}. The proof is complete.


B~ :

Two situations where a Nash play is obvious are,


i) ai(y) is always positive for all i and Z. The Nash play will be the boundary
point {yimi}.
i i ) 6i(y) is always negative for all i and ~. The Nash play will be the boundary
point { y i l } .

Henceforth we shall be considering collections of functions-indicators 6i(y) . . . .


. . . . . ~N(y), satisfying the condition formulated below. In the adopted formulations,
and y + ~y are arbitrary points of Y = yl X y2 X. . . . X yN, while
A~i(z) = 61(Z + Ay) - 6i(z).

CONDITION I :
Let ay # O. We so partition the set of subscripts I = { l , 2 , . . , N } into three subsets

I> , I= , I< ; such that i~ I>*-~AyI>O, i~ I=*-~ Ayi=O, i~ I<*-~ ayl< " O.
The following inequality is then assumed to hold :

A~i(~) - z a6i(~) + z a~i(~) < 0


i ~ I i ~ I i ~ I

(A~# O) for a n y ~ Y (20)


74

We r e a d i l y see t h a t the above condition presumes, in p a r t i c u l a r , monotonic decmease


of functions a i ( y_) with y l ( i = 1 . . . . . . N). That condition can be considered as a
v a r i a n t of the concept o f a monotonically decreasing f u n c t i o n , generalized to the
case of vector function ~(y) , i . e . , the c o l l e c t i o n of N functions o f N v a r i a b l e s
(61(yl . . . . . . yN) . . . . . . 6N(yl . . . . . . yN)).

One can also see the appearance of the boundedness o f the inter-automata i n f l u -
ences i f one provides the f o l l o w i n g i n t e r p r e t a t i o n : the i n t e r a c t i o n among several
g o a l - o r i e n t e d automata c o n s t i t u t e competition among the automata users f o r some
resource which is necessary to them. Let y i be the magnitude of the e f f o r t of the
i - th automaton to acquire the resource, while 6 i ( y ) is the magnitude of the d e f i -
c i t of the resource ( i f ~i < 0 then J 6 i j i s the magnitude of the excess) in terms of
the i - th automaton in the play Z. Then the monotonicity of 6 i ( y ) with respect to
y i , assumed by the condition I , means t h a t there is the p o s s i b i l i t y of s e l f - r e g u -
l a t i o n by each automaton i n d i v i d u a l l y , since there is guaranteed a decrease o f the
resource a v a i l a b l e as i t s own e f f o r t s increase (and conversely).

In the given i n t e r p r e t a t i o n , l e t group I> consist o f the automata increasing


on (in any case not decreasing) t h e i r e f f o r t s to acquire the resource while group
I< , on the c o n t r a r y , consist of the automata decreasing (not increasing) t h e i r
efforts. Then, according to condition I , the t o t a l d e f i c i t ~6 i f o r the f i r s t group
is decreased in comparison with the second group. This can be treated as a presup-
p o s i t i o n f o r the p o s s i b i l i t y of the group s e l f - r e g u l a t i o n of such automata.

In b r i e f condition I r e f l e c t s the r e t e n t i o n of the c a p a b i l i t y of s e l f - r e g u l a t i o n


in a system of automata whose mutual influences are bounded.

CONDITION 2 :
For any i , and f o r any a r b i t r a r y f i x e d foreign s t r a t e g i e s , 6 i ( y ) as a function
o f the own strategy y i does not assume two consecutive values which are equal in
magnitude, i . e .

j6i(yi yi-I y~, y i + l ., yN) j > j a i ( y l yi-I i i+l


. . . . . . . . . '. . . . Yj+I' y ..... YN) I
j = I, 2...... mi - I (21)
(i : 1 ...... N)
Condition 2 is necessary to guarantee t h a t f o r any a r b i t r a r y f i x e d foreign s t r a t e -
gies there e x i s t s only one best own s t r a t e g y ,

yim(z ) , (i = l ..... N)

We now i n v e s t i g a t e the uniqueness of the Nash play. Let us introduce the a u x i l i a r y


function
N
¢(Z) = i ~ I J~i(z) J (22)
75

6~.~.
Let condition I hold. Let ~i(y) Ayi ~ 0 (i = 1 . . . . . N) and 6~C # 0. Then
~ ( Z ) > 0.
~ .

We write the evident r e l a t i o n s h i p

AI~i(z)] : lai(z+ AZ)I - l a i ( y ) I ) - l a i ( z + Ay) - a i ( z ) l = - IAai(z)I


(23)

We use (23) to estimate those terms in A@(y) = ~AI6i(y)I corresponding to Ayi = O.


" 1
For those i for which AyI # 0 we shall provide another estimate. F i r s t , taking into
account that Isign Ayil = l , we have

l a i ( z + Ay) I ~ - a i ( z + Az)sign Ay i (24)

Second, in view of the conditions of the lemma, when Ay i # 0 and 6 i ( y ) # 0 we have


sign 6 i ( y ) = _ sign ay i so t h a t , when ay i # 0

lai(z)l = ai(y) sign ~ i ( z ) = - ~ i ( z ) sign Ay i (25)

From (24) and (25) we find

AI6i(z) I > - A6i(z) sign Ay i , i f Ay i # 0 (26)

From (23) and (26) we find

N S.
A¢(y) = ~
- i=l
Al6i(y)I :
-
i:Ay1#0 AIai(y)l + i:A~i=o a l a i ( y ) >

- (i:AySi #0 Aai(y)
--
sign ay i + i : A Z y i : 0 IAai(y) l) (27)

so that by virtue of condition I , A@(~) > 0.

Theorem 2 (Uniqueness).
For y to be a Nash play i t is necessary and sufficient that y be a minimum
point of the function @(Z)- ~ is unique.
Proof.

Sufficiency. I f yp is a Nash play then there exists three p o s s i b i l i t i e s :


i ) ~i(y__~)~ = 0 for a l l i ,
i i ) ai(y__~)-- < 0 for some i ,
76

iii) 8i(ym) > 0 for some i.


In the f i r s t case i t is t r i v i a l to see that 6i(y~)Ayi : 0 for all i.
In the second case, with due regard to the monotonicity of 6 i ( z ) , i t is obvious that
the next infimal strategy to yim w i l l correspond to a positive ~i. Let yim = 4 .
Then Ay i ~i(y_~) < O, Ay i > O, Ayi a i ( y l . . . . y i - I , y i _ l , y i + l . . . . yN )< O, yi<O.
In the third case, by similar reasoning, i t is obvious that the next supremal stra-
tegy to y i will correspond to a negative 6i. Then Ayi 6i(ym) < O, Ayi < O, and
Ayi 6 i ( y l . . . . y i - I , y i + l , y i + l . . . . yN )< O, yi> O.

I t follows then that in every case Ay i ~i(z~ ) ~ O for all i , and consequently
A@(y) > O. Hence y ~ is the minimum point of the function @on Y and is unique as
w e l l , by virtue of condition 2.
Necessity.
Let y~ = {Y~i} be the minimum point of ~. Assume that y_~ is not a Nash play.
That means with due regard to the monotonicity property that

i i i i
sign 6 (Yvi-l) = sign 6 (Y~i) i f ai(y__..) is negative or
i i i i
sign 6 (Y,ji+l) = sign 6 (Yvi) if 6i(y._~) is positive.

Let us then consider the point Z obtained by replacing the components of Yvi of
i i
y_~ by either Y~-I or yv+ 1 depending on whether 6i is negative or positive, respec-
tively.
Then we get the inequality ~i(y) (yi~ _ yi) ~ 0 for all i. Hence @(y_~) > @(Z) which
is contrary to the assumption that y m is the minimal point of @. Hence~m must be
the Nash play. The proof is complete.

3.5. CONVERGENCE THEOREM.


Instead of the reinforcement scheme (15) we present a more generalized version
which is called projectional algorithm (in contradistinction with the projection-
less version (15)). The advantage of the projectional algorithm is to get r i d of
the constraint (17) which has to be checked at each time instant. The algorithm is
given thus ,
i f y i ( t ) = Yki,
i then

Plki+u i (t+l) = ~S {Plki+ui(t) + y i ( t + l ) I F i ( o i ( y ~ i ' s i ( t + l ) ) i }


i
E (t+l)
(28)
pit+ll : ,i! t+l) IFicei(,i, si(t+llll
j e1(t+l) m - l
j # ki + ui ( i = l ....... N)
77

where ui is defined as in (16). ~SE denotes the projection operator into the sim-
• i " mi i = l}
plex SC = { l : pj > I , j~l Pj

Theorem 3.
The automata play y(t) converges in probability to the Nash play, i.e.
~(t) P > ~, i f the conditions,

yi(t) > 0 ~ y i ( t ) = ~, yi2(


t:1 t~1 t) < (29)

~i(t) > 0 Z y i ( t ) ~i(t) < ~, El(t) ~ 0


t=l t +~

(i =l ....... N)

hold for the reinforcement scheme (28).


For the proof of the above theorem we shall make use of the following theorem
on the convergence of almost super-martingales which has been proved by Robbins and
S i e g m u n d 13 .

Theorem 4.

Let (~, F, P) be, a probability space and FIEF2 ~ . . . . be a sequence of sub


~- fields o f F . Let Ut, Bt, Et and ~t' t = l , 2 . . . . . . be non negative F t - measu-
rable random variables such that

E(Ut+I/Ft) ~ (I + Bt) Ut + ~t - ~t' t = I, 2 . . . .

Then on the set {~ Bt < ~' ~ ~t < ~} Ut converges a.s. to a random variable and
~ < ~ a.s.
t

P~f_~f_.Th_~g[~m_3. mi
N
Let V(p(t)) = i~l j~l (Pl (t) - pji~,2
; (30)

In view of the reinforcement scheme (28), V can be rewritten as


N pi . i " i ~ 2
V(p(t))= i=lZ [RSc(t) ( kl+u I (t-l)+ yi(t) IFi(ei(y i , s l ) ) l ) - Pki+u i ]

i(t) i~ 2
Z ..~i+ i [~SE(t)(pji(t'l)- m l l Y I F i ( e i ( y ~ i , s i ) ) l ) - Pj ]
i=l J~K U
Using the contraction property of the projection operator, we get,
78

v(E(t)) N
mi •

): (p1((t_l)- p]*)2 + 2N yi2(t) IFi(~i(y
k1 i,si))
.
[2 +
i=l j=l " i=l
+ NS Z y_~)
i=l j#ki+u i (ml-l) 2
ik
IFi(~)i(y ,si))
i
12+
N ,
Z 2 yi(t)( Plki+ui(t i.~F 1.)iFi(ei(yli,sl)) [
l) pkl+u
i=1 k

S .Z.i+ 2
i=l J)eK U ml-I

= V(p(t-l)) + ~ mi Yi 2(t) IFi (Qi(y~i ,si))12 +


-- i=l mi- 1
+ Nz 2 mi yi(t) ( pi. i ( t - l ) - i ~• i) [ F1(O1(Yki
" " " ,si ))I
i=l mi- 1 kl+u Pkl+u
N 2 mi "*)
i=IZ mi- 1 yi(t) ~j=IZ (pji(t-l)- P].j IFi(ei(y i 'si))l
=0
Averaging both sides with respect to the random variable si and the randomindices
ki, ui for fixed ~(t-l), we obtain :

E V(~(t))Ip(t-l) < V(p(t-l)) +


N mi • •
+ z yi2(t)mi z E{IFi(ei(y~i,sl)),2/{kJ}}pil(t-l)..p~N(t-I )
i=l -l klk 2.... kN
N " ml~l . . . . .
+ i=IZ yi(t) ml-m
12
I [ ki=l(P~i+l(t-l)- Piki+l~)E{[F1(e1(yl,s1))]+} pi.(t_l)k
I +

+ (P~i (t-l)- piei ) E{[Fi(gi(yi,si)~+} pl i(t-l) +


m m
i
+ ki=2~(p~i_l(t_l)_ Pi~ki_l)E{[F1(e1(yl,sl))]-} pi.(t_l)k
1 +

(31)

where
+ =Ix, if x>0
Ix] (3z)
LO, if x-< 0

and [x] = I-x] + .


79

The third term in (31) is indeed equivalent to,

N y i ( t ) 2 mi Aai(y_)+ N y i ( t ) 2 mi aai(y)_
m "

yi(t)m12m_11aal(y)}
,

E{ i=] mI- l f:] m~.--'~-- _ i~l


~y1>O ~y1:0 ayI <0 (33)

which is negative for any ay # 0 and all Z~ Y, by virtue of condition I (cf. Sec.
3.4).
pi*
Note that E{AZ} = 0 i f and only i f = p i
for all i. Therefore, we can ( t _ l )
assign positive constants Bl, B2,..,B N such that the third term in (31) is
expressed as,
N
y i ( t ) Biii pi(t l) _ pi, )l (34)
i=l

Let us introduce the upper bounds,

kI ,k2, • ,kN
• °

max E{IFi(ei(yli,si))l~/{kJ}} : At< , , i=l,..,N. (35)


kl,k2,..,k N k

Hence inequality (31) defi,es the almost supermartingale,


E {V(p_(t+l)/p_(t)}<V(p__(t))+ N
z Ai yig(t) - Ns y i ( t ) B i ~1'(nit ) (36)
i=l i:l

Applying the aforementioned theorem of Robbins and Siegmund, we conclude that


V(~ (t)) converges a.s. to a random variable and

yi(t)llpi(t)_ pi I1 < ~ ' (37)


t i = ] .... , N

on the condition

S yi2(t) < ~ i: I ..... N (3B)


t
80

Noting that

Ilpi(t) - ~iml] < Ilpi(t) - ~(t)ll * II~(t)-pi~l

where ~ ( t ) is the point on SE(t) which is closest to pim, we write

Bill_p(t) __pi~ 11< BilI]~i(t) - ~ ( t ) I I + Bi c i ( t ) (39)

According to (37) and (39) the sequence {El(t)} must guarantee the convergence of
the series ¥ i ( t ) c i ( t ) , i.e.
om
r. y i ( t ) c i ( t ) <~ (40)
t=l

With due regard to (37), (39), (40) as well as the fact that { y i ( t ) } is divergent,

yi (t) :
t=l

We get p/(t) > Since ~_~ > _pim as t f ~ as a consequence of


the fact that

i (t) > 0 (41)


t ~

Then { p i ( t ) } converge in probability to {~'~},


~ i : 1. . . . . . . N. The proof is com-
plete.
8~

3.6. ENVIRONMENTMODEL.

As said before the environment is completely characterized by the probabilities


{~i( s i , Z)} of the outcomes s i ( t + l ) for every play z ( t ) . These probabilities f u l l y
specify the automata interactions.

In the following we present two different models of the environment, namely the
"pairwise comparison" and the "proportional u t i l i t y " .

3.6.1. Pairwise comparison.


Let the environment be constituted of ~ elements j = l . . . . . . ~. The j - th
element finds out the strategies y i and yk of two randomly chosen (with equal proba-
b i l i t i e s ) automaton ; the i - th and the k - th ( i , k = l . . . . . . N). The j - th
element then responds in a probabilistic manner to only one of the chosen pair of
automata : say with probability pJ(y~, yk) to the i - th and with probability
pJ(yk, y i ) = l - pJ(yl, yk) to the k - th.

We shall assume p j ( y i yk) = ~ ( p j ( y i yk) = I/2 + p ( p j ( y i , yk)) where


pj(yl, yk) is a certain u t i l i t y index for the j - th element, and u(x) is a mono-
tonically increasing odd function ~(+~) = -g(-~) = I/2.

The considered form for the probability of a response from an environment element
to an automaton agrees with the natural assumption that the probability increases
as the u t i l i t y of the element increases and vice versa. The dependence of the ele-
ments' u t i l i t y on the automata strategies y i , yk(i ' k = l . . . . . . N) sets the compe-
t i t i o n and consequently stimulates certain objectives for the automata. For this
model of pairwise comparison the competition comes from the fact that for any auto-
maton, say the i - th, another automaton e.g. the k - th, while seeking i t s own
goal, may minimize the u t i l i t y of an environment element response to the i - th
automaton. I t is therefore conspicuous that the u t i l i t y pJ has to be a function of
the difference between yi and yk. The sign of that difference determines on which
side the strategy yi is to be manipulated by the i - th automaton.

Since the probability of choosing two automata out of N ones equals 2/N(N-I), i t
is clear that the total probability of agreement between a j - th element and an
i - th automaton is equal to

N
""
pjl(y_) :IT~ 2 k=Zl ~/(pj(yi, yk)) (42)
k#i

Notice that
82

• ° N ,,

0 < pjl < I, Z pj1 = 1 (43)


i=l

Let the response function of the j - th element to the i - th automaton is given by

Si : i ~ i y i ) • yi ~
(44)
, otherwise

where m is some positive piecewise continuous function, and ~ is a certain threshold


value.
Let p~({yk}) denote the probability of no response, i.e. 0 < si < m(yi). Then
the probability of one response i.e. (~(yi) ~ si < 2~(yi)) by one element out of
is equal to

pi (~(yi) ~ si < 2~(yi)i{yk}) = p~({yk}) + max pJl i ({yk}) . sg (~_yi) (45)


l~Jl<~

where
I x~O

i
,

sg (x) = (46)
0 , x<O

The function sg (x) is introduced to respect the boundedness condition (44).

Also, the probability of two responses (i.e. 2~(yi) ~< si < 3m(yi)) by two elements
out of v is equal to :

i
p i ( ~ ( y i ) ,< si < 3~(yi)/{yk}) = po({yk}}+~l [p~11({yk}) +

max p321( yk ) sg(L.yl) (47)


1<j 2~
J2 ¢ Jl
Analogously, the probability of n responses (i.e. nm(yi) ~<si < (n+l)m(yi)) by n
elements out of ~ is equal to

pi(nm(yi ) < si < (n+l)~(yi)/ yk ) : p~({yk})+


(48)
l N~I pj£i yk j i
[ ( ) + max p n ({yk}) sg(~_yi)]
Z=l l.< Jn "< v
Jn# Jl'J2 . . . . Jn-l
83

The probability of no response p~({yk})~ is such as to f u l f i l the normalization condi-


tion

~i p i ( s i / z ) = l (49)
S

3.6.2. Proportional u t i l i t y .
in this model each element of the environment responds to the automata with pro-
b a b i l i t i e s proportionable to the u t i l i t i e s of t h e i r strategies. The probability of
a response from an element increases as the u t i l i t y of an automaton strategy increa-
ses and becomes maximum for maximum u t i l i t y . Hence the probability that the j - th
element responds to the i - th automaton can be expressed thus,

•. N
p31(y) = ~ ( p j ( y i ) ) / k~l ~(pj(yk)), (5o)

j = I, ........ v ; i = I, ...... , N

where 9(.) is a positive non-decreasing function, as described beforehand, and


p j ( y i ) is the u t i l i t y of the i - th automaton strategy yi for the j - th element.

Notice that the probabilities p j i ( y ) satisfy condition (43).

I f the u t i l i t y of an element decreases as yi increases, then the probability


(50) w i l l have the necessary property of contramonotonicity, i . e . i t decreases mono-
t o n i c a l l y with respect to the output y i of the i - th automaton, and increases mono-
t o n i c a l l y with respect to the set of other automata outputs yk(k ~ i ) .

Eqs. (44) - (48) again complete the mathematical description of this environment
model.

3.7. MARKETPRICE FORMATION.

Consider N sellers in a market trading in one specific commodity. Each i - th


seller (i = l . . . . . . N) is assumed to be supplied by a constant qi units of that
commodity per time increment (the interval between any two successive time steps).
The strategy of any i - th seller yi represents the price he specifies for his com-
modity. Let the i - th s e l l e r receives a demand si in monetary units for buying his
commodity at the specified price y i . The penalty index f o r the i - th s e l l e r is
simply the mis-match between the demand and supply in monetary units, i . e .

ei : si _ q i y i , (i : 1 . . . . . . N) (51)

The consequence of that mis-match may be interpreted d i f f e r e n t l y by the sellers ;


each according to his psychological type. That interpretation is embodied in the
84

weighting function F i ( . ) of an i - th s e l l e r which may be considered in the f o l l o -


wing form :

Fi(e i) = ai(exp (biO i) - I) + dio i,

(i = 1 ....... N) (52)

The constants ai , bi , and di simulate the psychological type of the i - th seller


as follows :

Cautions type : ai , bi < 0 , di = 0


Objective type : ai , bi = 0 , di > 0 (53)
Hazardous type : ai , bi > 0 , di = 0

The nonlinearity of the weighting function Fi for cautious or hazardous seller indi-
cates the lack of objectivity of such psychological types. Thus a hazardous type
overestimates the importance of the excess of buyers demand (oi>0) and underestimates
the importance of the shortage of buyers demand (ei<0). A cautious type overestimates
the importance of the shortage and underestimates the importance of the excess.
The objective of each seller automaton is to find a price strategy which ensures on
the average the ]east harmful situation (according to i t s psychology) created by the
mis-match between commodity supply and demand in monetary units. Hence, each seller
attempts to minimize the function (4) where the indicator function ~i is given by
eqn. (5).

The automata scheme (15) is considered to simulate the behavior of the sellers.

The buyers representing the s e l l e r ' s environment may be simulated by the "pair-
wise comparison" model of section 6.1. In this case the u t i l i t y of the j - th buyer
making his purshase from the i - th s e l l e r is given by

p j ( y i , yk) = ~(yk _ y i ) (54)

where a is some positive parameter.

The ~ function in eqn. (42) may be taken thus,

i , x>A
~(x) = (x+A)12~, -A < x < a (55)
, X < -A

Here (-A, A) represents the "active zone of the function". The function m(.) in eqn.
85

(44) may be considered as,

m(yi) = yi
(56)

and ~ the amount of money available to each buyer.

3.7.1. Experiment l (Pairwise Comparison).


Several simulation experiments were carried out on a digital computer. The fol-
lowing numerical values are considered in the simulations,
Number of sellers N= 3 , Sellers' psychology :
Number of buyers ~ = 12 , Cautious ai = -I.
bi = -0.005
Active zone A = lO , Objective di = 0.02
Hazardous ai = I.
bi = 0.005
Commodity supply ql = 2, q2 = 2, q3 = 3
Available money to each buyer ~ = 150
Buyers' u t i l i t y ~ = 0.05, see eqn. (54). The sequence y i ( t ) , see eqn. (15), was
taken as

i YO
y (t) = ~- , YO = cont., t = I, 2 . . . . . (i : I, 2, 3) (57)

The automata sheme (15) always converged to certain equilibrium price probabi-
l i t i e s independently of any i n i t i a l assumptions.
i
For objective sellers and the following set of prices Yk

i i~ 1 2 3

I00 I00 140


140 130 170
170 200

The equilibrium (average) price probabilities were found as follows


86

1 2 3
E{p~ (lO0)}
1 .0528 .OOl0 .6708

2 •9472 .4972 .3292

3 .5018 0

The probability of the f i r s t price for the t h i r d s e l l e r versus time is shown in


Fig.3.

Fig.3 demonstrates the influence of YO on the speed of convergence of the automata


reinforcement scheme. I t is concluded that a very small value of YO ( i . e .
0 < YO << l ) leads to a very sluggish convergence. On the other hand a value of YO
as big as l leads to a rather vigorous and o s c i l l a t o r y convergence. A suitable value
for YO was found to be somehow in between.

The effect on convergence rate that YO has can be deduced from the form of the
scheme (15). Note that the coefficients of Pki+ui(t), p (t) are all unity. Any con-
vergence will come from the "forcing terms" ; all of which are multiplied by
yi (t÷l).

As in any expedient scheme, the parameter ~0 contributes to determining the


"degree of expediency", and consequently the value of the limiting p r o b a b i l i t i e s ,
see e.g. Viswanathen and Narendra 14. I t is actually the ordering on the ensemble of
s t r a t e g i e s , rather than the precise magnitudes of the limiting probabilities which
matters here. This ordering is insensitive to the choice of ~0 provided that i t is
n e i t h e r too big to induce premature convergence nor too small to impede the l e a r -
ning e f f e c t of the f o r c i n g terms. This is demonstrated in Experiment 2,

Compared to the objective case the hazardous s e l l e r s tend at e q u i l i b r i u m to


increase the p r o b a b i l i t y of higher prices.

The e q u i l i b r i u m p r o b a b i l i t i e s are shown below :

E i l 2 3
{Pk (lO0) }
l .0457 0 .3265

2 .9543 .1639 .6735

3 .8361 0

For the case of cautious sellers the equilibrium price probabilities show the ten-
dency of increasing the probability of lower prices
:)3
1

I.

~o~'1

~o= .01
5 ¸

I I I I I I I
) I0 20 30 40 50 60 70

:ig.3 - T h i r d - s e l l e r ' s f i r s t price p r o b a b i l i t y versus time


88

E{p~ (I00)} 1 2 3

.3715 .0009 .9332

.6285 .7919 .0668

- .2072 0

The above r e s u l t has no analog i n the case o f d e t e r m i n i s t i c modeling , where


the psychology does not a f f e c t the e q u i l i b r i u m c o n d i t i o n s 15. This is due to the
f a c t t h a t in s t o c h a s t i c modeling the e x p e c t a t i o n o f the p e n a l t y f u n c t i o n being zero
does not imply t h a t the e x p e c t a t i o n o f the p e n a l t y index is zero due to the non-
l i n e a r form o f the p e n a l t y f u n c t i o n in the case o f hazardous and cautious types.
3 . 7 . 2 . Experiment 2 ( P r o p 0 r t i 0 n a l u t i l i t y ) .

Let us consider the f o l l o w i n g market c o n d i t i o n s :

- number o f s e l l e r s = 2
- number o f buyers : 3
- buyer's u t i l i t y pj(yi) = h - yi, see eqn. (50) where h is a "reference p r i c e " f o r
the buyer
- a v a i l a b l e amount o f money to each buyer ~ = 3
- r a t e o f commodity supplies ql = I , q2 = 2
- a c t i v e zone A = I , see eqn. (55)
- set o f p r i c e s , yl = { I ,
2} , y2 = { I , 2, 3}
- buyer's response f u n c t i o n m(y i ) = y i
- all s e l l e r s and buyers are o b j e c t i v e .

Under the above market c o n d i t i o n s i t i s s t r a i g h t f o r w a r d to t a b u l a t e the probabi-


lity o f the buyers' response (eqn. ( 4 2 ) ) , the average demand i = E{s i } =
~. s i p i ( s i / y ) , and the p r o f i t min ( i • q l y l ) as shown in the table below.
sI --
plJ(y) aver. demand Profit

yl y2 i = 1 i = 2 ~l(y) ~2(y) i = 1 i = 2

1 1 0,5 0,5 3 3 1 2
1 2 2/3 I/3 4 2 1 2
1 3 1 0 6 0 1 0
2 1 I/3 2/3 2 4 2 2
2 2 0,5 0,5 3 3 2 3
2 3 l 0 6 0 2 0
89

I t follows from the table above that the optimal prices to be adopted by the sellers
are ylX = 2, y2~ = 2. At these prices the profit of each seller is maximum.

We simulated the stochastic automata model (12)~m starting from the i n i t i a l state of
absolute randomeness, i.e.

l.

.8 2
P2

.6
( yo= 0.I )

.4

.2

0 i i

0 20 40 60

Fig. 4. Price probabilities - complete a priori information.

~m Notice that the indicator function 6i(y) is given by the difference between ave-
rage demand ~i and the supply q i y i , i.e. 6i(z) = ~i(z ) - qiyi
90

1.0
( Yo= 0,I )

.8

.6

,4

.2

t
I ! -I 1 I I

2O 40 60
Fig. 5, Price Probabilities - No a p r i o r i information.

The f i r s t automaton converged r a p i d l y to the optimal p r o b a b i l i t y ~I - = (0, I ) T,


see Fig.4. The second automaton has converged always to a dominant p r o b a b i l i t y f o r
the second p r i c e , see Fig. 4. The f i n a l p r o b a b i l i t y seems to be i n s e n s i t i v e to the
initial choice of y i .

For the case of no a p r i o r i information, the stochastic automaton (15) is emplo-


yed in the random environment of the buyers according to the p r o b a b i l i t i e s shown in
the above Table.

Again the f i r s t automaton has converged to the optimal p r o b a b i l i t y (0, I ) , see


Fig.5. The second automaton also converged to a p r o b a b i l i t y t h a t is dominant f o r
the second p r i c e , with a value less than in the case o f s u f f i c i e n t a p r i o r i i n f o r -
mation, see f i g . 5. The ordering of the p r o b a b i l i t y d i s t r i b u t i o n is i n s e n s i t i v e to
i
the i n i t i a l choice of y .

3.8. RESOURCEALLOCATION.

The problem of optimal allocation of a limited resource is formulated in the


following way : there is a resource of quantity R with N users of the resource ;
for each there is specified a function @1(s~), this being the effect achieved by the
i - th user of the resource i f he uses quantity si of i t . The effects achieved by
91

the d i f f e r e n t users are commensurable, i . e . , they are measured in homogeneous units.


I t is required to divide the existing resource among the users in such a way that
the total effect is maximized, i . e .

N ¢i " N . •
max S (s I) subject to z s1<R , silO (i=l . . . . N) (58)
s l s 2, . . . . s N i=] i=l

The known computational procedures for solving that problem are based on e i t h e r
dynamic programming 16 or gradient methods 17. Those methods assume p r i o r knowledge
of the functions @I(si). They lead to computational algorithms in the form of i t e r a -
tive procedures, where the constraint on available resource may be violated at in-
termediate computation steps. This makes direct on-line application of the computa-
tion results i n f e a s i b l e ,
The above methods are not suitable for real application due to several reasons
among them,

l - The functions ~(s i ) are often not known a p r i o r i neither to the user nor
to the allocation cneter. Moreover, the effects attained by d i f f e r e n t users can
vary unpredictably during the relevant time period due to random factors l i k e machi-
ne f a i l u r e , varying market prices, etc.

2 - The users are active systems which use the information about t h e i r produc-
tiveness to promote t h e i r own goals 18.

I t is of interest to explore the p o s s i b i l i t y of optimizing the system while i t


is in operation• That amounts to organizing collective behavior of the system, i . e .
establishing the rules of interaction between the center and the users, formulating
the c r i t e r i a of the users~specifying t h e i r control v a r i a b l e s , . . . e t c .

We model the collective behavior of the N users as a game between N learning


automata. The automata manipulate t h e i r control variable to promote t h e i r individual
goals. Knowing only the r e a l i z a t i o n : o f t h e i r "own" optimality c r i t e r i a , the automata
adapt control strategies as time unfolds. Successful adaptation must converge to an
almost Nash-play of automata where i t is to no user's advantage to change his con-
trol strategy. Organization of collective behavior is successful i f the Nash play of
automata corresponds to the desired optimality of the "over-all" resource-allocation
system.

The collective behavior game is considered as follows. I n i t i a l l y the d i f f e r e n t users


are given equal shares of the resource, i . e . si(O) = R/N. At the following planning
periods (between the time steps t = I , 2 . . . . . ) each i - th u n i t receives a new amount
of resource s i with a change Asi from the amount obtained at the preceding period.
Each i - th unit communicates to the center an estimate yi of i t s production effec-
tiveness at the current level of resource consumption si ,
92

@i'(si) ~ A@l Isi (59)


ASI

where A@I is the change in production corresponding to an infitesimal changeAsI of


the resource.
The penalty index ei of the i - th unit is considered to be the discrepancy
between the actual and estimated production effectiveness,

ei = @i'(si ) _ yi (60)

The penalty function of each i - th user is considered to be a positive function


IFi(ei)i. The penalty function IFi(.)i represents the law, set by the center to sti-
mulate objective data from the units about their production effectiveness18 . The
following are two possible examples of that law,

Fi(oi) : eilei I (61)

or
I~o i , ei < 0
Fi(ei) = ne l" , oi > 0 ~' q > 0 (62)

The nonlinear function F i ( . ) can also simulate psychological p e c u l i a r i t i e s of the


producers,see sec. 7. Malishevskii 8 presented the organization of behavior in con-
tinuous form as f o l l o w s ,

i - th user,

ddYit - { O,F(¢(Yi
i =(s@i
i ')(o))i
_yi
' (Fi<O),
otherwise(Yi=@i'(R)) (Fi>O) (63)

Center gi (yi)
si = R N • (i : l ....... N) (64)
gJ (yJ)
j=]

where the gJ(yJ) are positive, continuously increasing functions of yJ > O.


gJ(o) = 0 for all j. The law (64) meansthat the environment allocates the resource
on the elements in proportionality to their effectiveness. Malishevskii8 showed
9S

that the system (63), (64) is stable and all trajectories y i ( t ) converge to the
point,

#i : d@---~i I ~ i ( ~ ) (i = 1 N) (65)
ds I " , .... ,

provided that # are s t r i c t l y concave functions for all i = l . . . . , N and for all
0 ~ si 4 R. The above organization yields a suboptimal solution of the resource
allocation problem.
Optimality is approached when

yl ~ y2 ~ ~ N
....... = y = const = R

and the solution {si}, {yi} approaches the saddle point

~(~, x) < ~(~, x) < ~(s, ~)

where,
N N
@(s, X) = iZl¢i(s
i ) = + X(R- Z sJ)'
j=l

Here we consider the stochastic automata analog of Malishevskii's model. The varia-
bles @i, ¢i' for any si are considered to be stochastic variables with unknown
distributions. The estimation yi of the production effectiveness @i' is considered
to be the automaton's action or strategy. The learning of each i - th producer-auto-
maton is directed towards decreasing its own penalty,

J6i(z)l = E {lFi(@i'(s i) - y i ) l } (66)

where si is given by (64). Notice that the above formulation corresponds to a Nash
game : any realization of 6i satisfies the contramonotonicity property.
To show this, let y i ( l ) > yi(2), then

si(l ) _ gi(yi(1)! . gi(yi(2)) = si(2 )

According to the natural assumption concerning the diminution of production effecti-


veness with the increase of the used quantity of resources, we have ¢i'(si(1)) <
@i'(si(2)) whenever si(1) > si(2). Hence
94


@i'( i . (~i (yi(1) 1 ~,~ _ yi( l) @i' gi(yi(2) . . ) _ yi(2)
g (yl )+j~i~J(yU) < (gi(yl(2))÷j~i~J(y~)

and consequently

• "
FI(¢I'(.) _ yi(1)) < Fi (¢i' (.) _ yi(2))

Computer simulation were carried out for ten consumers with the following production
functions,

¢l(sl) = 4,25 ~n (I+s l) I@6(s6) = 2 sin s6


@2(s2) = 2,125 Cn (l+s 2) @7(s7) s 7 (2,125-s 7)
@3(s3) = ~2 s3 @8(s8) ½ s8(4, 25-s8)
@4(s4) = J ~ s 4 @9(s9) 2,25 (l-e -s9)
@5(s5) = ~
6 sin ~ s5 ¢lO(slO ) = 2,25 (l _e-2SlO)

¢i'
The set of trategies for each automaton taken as the values of at ten points
between 0 and l ; R is taken to equal one. The functions Fi(.) were taken as in (62)
with ~ = n = I. The functions gi(yi), see (64), were taken as

gi(yi) : (yi)r, (i : l . . . . . . . N) (67)

The sequences y ( t ) , c(t), see (28) were taken as Yo/t and co/t, respectively.
Convergence was always observed after short time interval. The algorithm (28) demons.
trated low sensitivity to the choice of the parameters YO' and c0. Fig. 6 depicts
the total production versus time. The results are improved when r gets larger, see
Fig. 6. This m~ans that i f the rule of distribution of the resource is close to the
rule "provide to him who gives the maximum estimate of effectiveness" the solution
approches the optimal one8. The organization however in the latter case seems to
be more sensitive to elements failure, see Fig. 6.

At time t ~, see Fig. 6 the f i r s t producer was considered to break down and start
to emit the estimate yl = O. That instantly caused the total production to drop off
drastically. Immediately then the system self-organized i t s e l f in such a way that
the resource was redistributed among the remaining producers so that the increase in
effects of the remaining producers partially compensated the drop in total produc-
tion. This result is interesting as i t demonstrates the r e l i a b i l i t y of the system.
- '-I ~ r ' ~ . . ,-i n ~ ,~ F'
'- - ~, 'l '---'- ;/ ,-~_, - - I o', i" / I - Cr-~-r-'"
I '~ -- I-I --"
3.5 l lj i I '~..I I...' I I r=4
I I I I
II I u I eqn.(67)
Ii i I o_
r= 2

3.0 Lr' I.I U-L,-rL_


I n
I'
I I
on
t~)
r~

2.~
!
I
I
I
I
I
I
I
J

2.0 , i t~
I I I

20 40 60 80
Fig. 6. Production versus time.
96

3.9. CONCLUSIONS.

A model of many goal-oriented stochastic automata is introduced for the analysis


of collective behavior of a class of problems of large systems, That class is cha-
racterized by the existence of a definite mutual opposition in the behavior of the
participants in the collective. In the model the goals of the participants are assu-
med to be known only up to certain indeterminate parameters for which there is no
a priori information available. Such class of problems cannot be dealt with by the
theory of N-person games. By means of that automata model the solution of such pro-
blems may be approximated which otherwise may be very d i f f i c u l t to get. Besides the
automata model can demonstrate the effect of certain interesting factors like parti-
cipants psychology, stimulation laws, behavioral tactics, etc . . . . on the modes of
collective behavior. The model is applied to study the process of market price
formation in a competitive economy, and also to the process of optimal allocation
of a unidimensional resource during system operation. Detailed simulation results
are also presented, which demonstrate the expediency of the automata behavior and
their learning capability.

REFERENCES.

I. Tsetlin, M.L., "On the behavior of Finite Automata in RandomMedia", Automation


and Remote Control, vol. 22, Oct. 1961, pp. 1210-1219.
2. Krylov, V. Yu., and Tsetlin M.L., "Games between automata", Automation and
Remote control, vol. 24, July 1962, pp. 889-899.
3. Varshavskii, V.I., and Vorontsova, I.P., "On the behavior of Stochastic Automata
with Variable Structure", Automation and Remote Control, vol. 24, March 1963,
pp. 327-333.
4. Narendra, K.S., and Thathachar, M.A.L., "Learning Automata : a survey", IEEE
Trans. Syst. Man, Cybern., vol. SMC-4, N°4, 1974, pp. 323-334.
5. Tenisberg, Yu. D., "Some Models of collective behavior in Dynamic Processes Of
Market Price Formation", Automation and Remote Control, n°7, 1969, pp. ]140-]]48.
6. Malishevskii, A.V., and Tenisberg, Yu.D., "One class of Games connected with
Models of collective behavior", Automation and Remote Control, n ° l l , 1969,
pp. 1828-1837.
7. Malishevskii, A.V., "Models of Joint Operation of many Goal-Oriented Elements,
] " , Automation and Remote Control, n ° l l , 1925-1845, (1971).
8. Malishevskii, A.V., "Models of joint operation of many Goal-oriented Elements,
I I " , Automation and Remote Control, n°12, 2020-2028, (1971).
9. Karlin, S., "Mathematical Methods in Games Theory, Programming and Economics",
vol. l , Addison-Wesley, Reading Massachusetts, 1959, pp. 303-348.
lO. Varshavskii V.I., Meleshina, M.V., and Perekrest V.T., "Use of a model of collec-
tive behavior in the problem of resources allocation", Automation and Remote
Control, n°7, ll07-1114, 1969.
I f . Burkov, V.N. and Opoitsev V.I., "A meta-game approach to the control of Hierar-
chical systems", Automation and Remote Control, n°l, 93-I03, (1973).
12. Loginov N.V., "Methods of stochastic approximation", Automation and Remote
control, 27, 4, 1966, pp. 706-728.
97

13. Robbins H., and Siegmund D., "A convergence theorem for nonnegative almost
supermartingales and some applications", in Optimizin 9 Met.hods in Statistics",
pp. 233-257. Academic Press, New York, 1971.
14. Viswanathan R., and Narendra K.S., "Comparison of Expedient and Optimal
Reinforcement Schemes for Learning Systems", Journal of Cybernetics, vol. 2,
n°l, 1972, pp. 21-37.
15. Krylatykh L.P., "On a Model of Collective Behavior", Engineering Cybernetics,
1972, pp. 803-808.
16. Bellman R., and Dreyfus S., Applied Dynamic Programmin9, Princeton : University
Press, 1962.
17. Arrow K., Hurwitz L. and Uzawa H., Studies in linear and nonlinear Programmin9,
Standford : California Stanford University Press, 1958.
18. Ivanovskii A.G., "Problems of stimulation and obtaining objective Estimates in
active systems", Automation and Remote Control, N°8, ]298-1303, 197D.
19. El Fattah Y.M., "Learning Automata as models of behavior", Simulation 75
Proceedings, Zurich (Switzerland) (]975).
20. El Fattah Y.M., "A model of many goal-oriented stochastic automata with applica-
tion to a marketing problem", 7th IFIP Conf. on Optimizatio.n Techniques procee-
dings, Nice (France) (1975).
2l. El Fattah Y.M., "Analysis of collective behavior in large systems using a model
of many goal-oriented stochastic automata with applications", IFAC Symp. on
large-scale systems proceedings, Udine (Italy) (1976).
22. El Fattah Y.M., and R. Henriksen, '~Simulation of market price formation as a
game between stochastic automata", J. Of Dynamic Systems, Measurement and
control (special issue) March (1976).
23. El Fattah Y.M., "Use of a learning automata model in resource allocation pro-
blems", IFAC.Symp., Cairo (Egypt) 1977.

APPENDIX

Projection operator

The N-dimensional simplex SE = {~ : i~l= yi = l , Yi > ~' -y~RN} can be transformed


into the simplex S = {~ : i~l xi L, xi > O, x~RN} by means of the simple change
of variables
Yi = xi + c , i = l ...... N (A.l)

We stipulate that
l (A.2)

in order that L = 1 - N~ be a positive number. The N-dimensional simplex has the


following for N ) 3 :
N vertices {T~}
N(N-I)/2 edges (two-dimensional faces) {T~}
98

N (N-l)dimensional faces {T~_I}


The vertex T~ is the point

T~ = (0,, 0 . . . . . . L, 0 . . . . . . O) (A.3)
-.~
J

and the face Tk (m > 2) is the subset


m

Tm
k = {x : xEDm, xi ) OVi} (A.4)

of one of the hyperplanes Dm :


N
i_Z_l ai xi = L (A.5)

where ai~{O, l } and


N
i~ l ai = m (A.6)

Obviously, for a particular m(l<m~N) there a r e ~ r eN~


alizations of such
hyperplanes.
The projection of x_~RN into the plane Dm(l<m~N) is accomplished according to
the fomula
N
L-i~ 1 aix i
(~(Dm)) j = (xj + m ) aj (j = l . . . . . N) (A.7)

In order to see that ~(Dm) lies on Dm, perform the summation of both sides of
(A.7) after premultiplying into aj to get

N = Z
!
N a~ xj + 3~l a~ (L- i 1m.ai x i ) = N aj xj + m(L-i~laix
= i)
j~l aj(~(Dm))j j=l "= 3 j~l m

--L

Here equation (A.6) is employed.


Let us now state the following lemma.

Lemma. The face Tm_1 closest to the point ~(Dm) has an orthogonal vector a_m_l with
the components

(am)k k ~j
(am_l) k = (A.8)
0 k =j
9g

where the index j corresponds to the minimal component of the point X(Dm~, that is

(~(Dm)) j = min (~(Dm~li (A.9)


i
~gg~. The distance V from a certain point zcDm to the vertex T~ is equal to
N N
V2(~,T~) = (zk _
L)2 + i~l zi2 = i~IZi
2+
(L2 _ 2ZkL!
iFk

Then
V2(~, T~) - V2(~, T~I) = 2(zm - Zk)L (A.lO)

i.e. the most distant vertex corresponds to the least component zk = man zm. Conse-
quently the face Tm_l which lies opposite to the vertex and closest to the point
has the orthogonal vector a_m_l obtained by nullifying the j - t h component of am,
see (A.8). The lemma has been proved.
By definition, the projection ~(~°) of a point ~° ~ RN into S is called the
point

R(~°) : {xm_ : I I ~ ° - _x~ll : minll~ ° - zll} (A.ll)


y__cS

I t follows then that finding x ~ = R(x°) is equivalent to finding the point on S


which is closest to the projection x o (m)
D of the point -x- ° into Dm(l<m<N) j provided
that x°(Dk) ~S V k>_m. Actually,

min ( ( y - x°II 2 = min { l ( y - x°(Dm)) + (x°(Dm) -_x°)((2


y_~S y~S

= fiX o ( Dm) - _x°II 2 + minlIy - x°(D_)ll


_ J)| 2 (A.12) (1)
y~S

The property (A.ll) of the projection operator as well as the Lemma suggest the
following sequential procedure for determining the projection,
a) check the condition x ° ~ S.
b~ i f x°Z S then find the projection x°(DN)
k "-
c) i f ~°(DN)~S then find the face TN_l closest to the point ~°(DN) from the Lemma,

(I) Notice that <x°-y, y> : 0 i f -y is the projection of x °. Also <x o(Dm)-Xo, xO(Dm)>
= O. Hence i f y is the projection of _x°(Dm) then <x°(Dm)-y, y> = 0 and conse-
quently <x_°(Dm) - x °, y - -x°(D
- - m
)> = 0
~OO

d) project _x°(DN) into DN_I-~ TkN_l. I f x°(DN_I)~S then we find the face T~_2 closest
to the point x°(DN_l) from the Lemmaand again project x°(DN_I) into DN_2:T~_2
and so forth t i l l xO(Dm)~S for certain m.
e} i f x°(D2)~S then the projection will be one of the vertices T~-

According to the above Drocedure the projection operator ~ is executed at N


steps at most.
£HAPTER IV

C 0 N T R 0 L - FINITE MARKOVCHAINS.

"Un jour tout sera bien, voiIA notre esp~rance,


tout est bien aujourd'hui, voilA l ' i l l u s i o n " .
Voltaire. Po~me sur le d~sastre de Lisbonne

4.1MARKOV DECISION MODEL.

A Markov decision model is f i r s t described. A system is observed at equally


spaced epochs numbered O, l , 2 . . . . . At each epoch n the system is observed to oc-
cupy one of N states numbered l through N. Each state i has associated with i t a
f i n i t e set Ki of Mi decisions. Whenever state i is observed some decision k in Ki
must be selected. Suppose state i is observed at epoch n and decision k is selected.
The probability that state j is observed at epoch n+l is ~ ( i , k, j ) . Transitions
are presumed to occur with probability l , so that

N
0 ~(i,k,j) and j~l ~ ( i , k , j ) = 1 , i=l . . . . . N, KcKi (I)

We assume a certain reward structure superimposed on the Markovian decision process.


Whenever the system is in state i and decision k~Ki is taken, then a reward r i k j
w i l l be earned upon transiting to a j - t h state.

A control policy D is a prescription for making decisions at each epoch. We


shall consider only the class of stationary memoryless policies. A policy in such
class is specified by a set of N vectors d~l)," d~2~"
" ..... d(N}."
" The vector d(i)"" is
Mi dimensional. Its k-th component d~i) specifies the probability for making the
k-th decision in Ki whenever observing the i - t h state, Naturally, d ( i ) must belong
to the simplex

sMi M xj ~ Mi
= {x__~--i : O, jZI= xj = l } , i = l ..... N (2)

The subclass of policies for which d~i) is zero except for exactly one kcKi , for
every i , is the class of deterministic policies.

We note that for any fixed D the observed states {Xn}n>0 constitute a homogeneous
Markov chain whose transition matrix P(D) = (Pij(D)) is giv-en by
102

Pij(D) = k~Ki ~ ( i , k , j ) d~i) , i,j = 1 ..... N (3)

For any fixed D the chain is assumed to be ergodic. That means that the chain is
characterized by one irreducible closed set of persistent aperiodic states. Tran-
sient states are allowed and can vary with the policy.
Definition. Let P be a stochastic matrix. The ergodic coefficient of P, denoted
by ~(P), is defined by
N
~(P) = 1 - sup
i,k j~l (Pij - Pkj ) + (4)
- p +
where (Pij kj ) = max (0, Pij " Pkj )"
A homogeneous Markov chain is ergodic i f and only i f ~(pk) > 0 for some k, cf.
Isaacson and MadsenI .

Under the ergodicity assumption there exists exactly one long-run state d i s t r i -
bution Pi(D), i = 1 . . . . . N, s a t i s f y i n g the conditions,
i . Pi(D) ~ 0 N
ii. pj(D) = i~l Pi(D)Pij (D)
N
iii. j~l pj(D) : 1
Let~T(D) be the 1 x N vector (Pl(D) . . . . . PN(D)) and ~ be the N x 1 vector of l ' s .
Then pT(D) is the solution of

~T(D)(I - P(D)) = O, ~T(D)Z : 1 (5)

which is a system of N+I equations in N unknowns. Since i t specifies ~(D) uniquely,


i t must contain exactly one redundancy. Since P is stochastic (by virtue of eqs.(1)
and (2)) the columns of (I-P(D)) sum to zero, i . e .

(I - P(D)) I = 0 (6)

So any of the columns of (I - P(D)) can be eliminated. Let B(D) be the NxN matrix
obtained by replacing the f i r s t column of (I - P(D)) by ~. Then eqs. (5) and (6)
combine to assure that ~(D) is the unique solution o f

2T(D) B(D) =~T (7)

T
where ~ i is the 1 x N vector ( I , 0 . . . . . 0). B is i n v e r t i b l e due to the fact that
i t is of rank N. Let Q(D) denote the inverse of B(D), I t follows from (7) that

J~T(D) = eT Q(D) (8)


103

That is ~T(D) is the top row of Q.

The assumption that the chain is ergodic for a l l D implies that the expected
reward does not depend on the i n i t i a l state i after a s u f f i c i e n t l y long interval of
time. The long-run expected average reward can be expressed as

N ~i)
@(D) = i~l k~K
~i d nik Pi(D) (9)
where, N
nik : j~l ~ ( i , k , j ) rik j , i : l . . . . . N ; k~Ki (lO)

is the expected average reward per stage when the state i is observed and the k-th
decision in Ki is taken. Without loss of generality we assume that a l l the q's are
nonnegative. They must be subject to a f i n i t e upper bound,

0 <__nik < c I < ~ all i , k (II)

The control problem is to find the optimal policy D which maximizes the expected
average reward ¢(D) subject to the system equation (7).

4.2 CONDITIONS OF OPTIMALITY.

A variational approach is adopted to find the conditions of optimality. An


admissible variation 6 B(A) of a policy D is defined by the system of N vectors,

a~B(~) : ~ = 1. . . . . N ; B : 1. . . . . M
, otherwise (.12)
i
where,
~)T = 1 .....
( - B-TT l 1~ ' 1 , - ~1 ..... - ~T ) (13)
p

B - 1 (~=I,...N
~=I,...M
C~

and ~ . denotes the Mi dimensional null vector. Here_~B(A) denotes the perturbation
of thelpolicy vector d ( 1 )
The variation "step-lenght'° 4, eqn. (12), must satisfy the condition that
d(~) + ~B(A) l i e in the simplex sM~. This amounts to

< min (I - d~~), ( M - I) min d~~ ) ) , A > 0


k#B (14)
IAI < min (d~~),-p ( M - I ) ( I - max d ~ ) ) ) , " A < 0
kfB
104

An admissible variation 6 B(A) of a policy D leads to a corresponding variation


6 6B of the matrix B(D) of eqn. (7). That variation is given by,

0 , i~m , all j
(15)
(6mBB)ij = 8~j &,i=m, j = l , . . °~
N

where,
O, j=l
Oj
~B = I -~(~,6,J) + k~B 1 ~((~,k,j), j=2, M
(16)

Using a matrix inversion lemma, cf. Ourand2, we can write the inverse of the matrix
B + a BB as

(B + a BB) - I = Q + ~ BQ (17)

where Q is the inverse of B and 6 BQ is the matrix

3
Qi~ ~B (18)
(8~BQ)iJ = I+ ~ ~ , i,j=l ..... N

and

N i
E]~BJ: iZ=l O~BQiJ ' 3=I. . . . . N (19)

Let us adopt the following definitions of the norm of a vector a_ and a matrix A,

II~II = m~ fail IIAII = max IAij ] (20)


i i,j

Since Q is the inverse matrix of B, i t follows from the Schwarz inequality that

l = llB.qll ~ IIBII.IIQII = llqll ~ co < ~ (21)

I t also follows from eqn. (19) that

II~BII ~ IIQII II~BII ~ c o max (max ~ ( e , k , j ) - m i n ~ ( ~ , k , j ) ) = c ( ~ ) ~ co


j k k (22)

The constants co and c(~) can always be chosen big enough to have inequality (22)
valid for all policies D.
Eqs. (8), (18) and (19) combine to assure that the variations 8 ~Pi(D) and 6 B~B(D),
105

i : l . . . . . N are given by,


i
~BPi = -4.p~ a
, i=l ..... N (23)
1 + 4.~B

i
Ci
, i:l ..... N (24)

Hence the variation in the expected reward, eqn. (9), resulting from a variation
a B(4) of a policy D, eqn. (12), can be written as
N M~
~a#(D) : z (SaBd~i) + d~i)
i=l k~I "nik'Pi "nik'~aBPi)
(25)
1 N ~i d~i) ~_i_BB )p
: 4(~ - ~ #B ~k - i ~ k:l ~i~ ~+4.~ ~

We define the derivatives,

a d~~)(D) = A~om
il a B¢(D)IA

N Mi i
l Z d~ i)
= (nab " ~( i k~B n~k
- Z
i=l k=l nik ~c~B)pc~ (26)

m:l . . . . . N ; B m K . The variation 6aB@, eqn.(25), can be expanded as a function of


4. Keeping only those terms with powers of 4 less than or equal to 2, we have

6 B@(D) - 8~@-~-T(D)
d~~j . 4
+ ~B(D) " 42 + °(43) (27)

where,
N Mi
~aB(D) = S Z d~i)
i=l k=l
i
nik ~B ~B P~
(28)

Let us now examine the effect of a variation 6y5 of a policy D on the f i r s t order
derivatives ~¢/8d~~) given by eqn.(26)o That variation can be written as,

N Mi dCi) i
@ = (nab _ 1 _ ~

N Mi
- S S d( i ) ~ i . 1 i
i=l k:l k "ik 6 y ~ B P~ - &(ny6 yi~ k~6qyk)~B P~
106

Considering the variations 5y6 of both sides of eqs.(8), (19), and making use of
eqn.(18), we get
C~
~'y6
(3o)
dy6 Po~ Y
Zl.py. 1 + A ~y(S

i
i = _ y ~ya
(31)

SuBstituting from (30) and (31) into (29) and then taking the limit of the division
by A as A+O, we get what we call the second-order derivatives,

~2 ¢ @@ _(na B 1 N Mi (i) i
Bd(Y)~d(~)
g : ~+0
lim ~ 6y6 @d-~B = - ]T~T k~B n~k - i :Zl k:l
Z dR qik~B)py~y6

(ny5 - ~ k~y k - i=IE k=1 "k~Y5 ) p ~


~,y = l . . . . . N ; 6~K , aeKy (32)

I t can be easily verified that any two arbitrary policies


d(~), d(~)' satisfy the equation,

_ _ B=I

where,
=~ - ) (34)
C~

Equation (33) means that one can construct a policy D' from another policy D by
means of successive admissible variations ~ B(A~)),'~ ~ =l . . . . . N ; 6cK . The varia-
tion in the expected average reward can be written as,

• ~a(1))) aft) +
+..+ a~ (D+611(a~l)) +... + alMl ' M1 . M1
(35)
+...+

, . ll(O) + A I)Z . iZ(o+fll(A l))) + ...


107

Since,
~2 ~ a~y)
@d~(D + y<~,Z (36)
~d~~# Y ,~ @d(Y)@d(~)(D) •
y<~
Y=~,~<B

we get upon substituting into (35),

@{D') - ~(D) : + @ ,(D)


@df~IBd(~;,.,, • . +
6 B
Y=~,~<B

(37)

THEOREMI. Necessary conditions for local optimality of a policy Dm are

i. @~B-~(Dm) : 0 al! ~ : d(~)~ c i n t sM~, all BeK (38)

ii. ~ ( D ' ) A~~) < 0 a l l ~ : d(~)m ~@SM, all BcK

Proof. Suppose that condition i of the theorem is not satisfied for some
: _~)~ ~ .int SM, say for ~ = ~. Consider a variation ~B(A )) of the policy D
such that A~~) is admissible. Since d(5)m is an interior point of the simplex

sM~ then A~~) can assume both positive and negative signs. Choose,

sign A~G)' = sign @---~(D~) (39)

Hence for sufficiently small A~~) we get

~B @(D~) > 0 (40)

which contradicts the assumption that D~ is optimal. Hence condition i must hold.
Now i f condition i i does not hold for some ~ : d(~) ~@SM~,say for ~, then again an
admissible variation 6~B(A~) ) will yield (40) which is in contradiction with the
assumption that Dm is optimal.

THEOREM2. Sufficient conditions for local optimality of a policy Dm are,


108

i. ~ ( D m) = 0 all ~ : d(~)m ~ int sM~, all B~K

ii. . < o all : (41)

iii. py ~y~ = 0 all ~ : d(~)~ ~BSM, all y : d(Y)~ ~sMy, all ~ K

iv. ~ ~ d~)~ ~ Y~ ~ =0 all y'd (Y)~ ~int sMy, all ~K


~:d(~)~ int sM~ k=l ~k ~ ~y~ < O or py ._ Y

Proof. I f the conditions of the theorem hold then i t follows from the expansion
formula (37) as well as the definitions (26), (28), and (32) that in the neighbor-
hood of D~,

¢(D) - @(D~() : c~:d(C~)~Z s M c ~ c~,B i : d ( i ) ~ t i n t SMi


alT geK -

• nik ~ ~6 p~ < O (42)

for all admissible ~ ) . Hence Dm is an optimal policy.

4.3 AUTOMATONCONTROLMODEL.

i . Complete A priori Information. Let us consider a control decision model in the


form of a stochastic automaton which experiments control policies while observing
the system's state at the successive epochs O, l , 2. . . . The automaton starts with
an arbitrary policy D ; for example the randomised policy generating equally proba-
ble decisions for each observed state (i.e. d~i)" = ~., all i , k). Let the present
l
epoch be n and the automaton observe a state ~ of the system. According to the policy
D(n) the automaton generates a control decision 8 with probability ~ a(~) (n) ; B~K.
The system then makes a transition to a j-th state at the next epoch, n+l, and the
automaton receives a reward c Bj . An "Adaptive Device", see Fig.l, changes the poli-
cy D of the stochastic automaton at n+l in order to improve the expected average
reward. The algorithm for changing the policy D, or what amounts to the same the
structure of the automaton, is called a reinforcement scheme• That scheme is consi-
dered to be as follows. I f at epoch n the observed state is ~ and the control deci-
sion is B then vary the policy at n+l such that

d(il(n+l) = d(il(n) + ~_~iB(A~m)(n)), i=l . . . . . N ; k=l . . . . . Mi


- (43)
A ~)(n) ~ ~
: yB(n) in) , y~(n) > o
109

subject to the bound (14).

~I A d a p t i v eDevice I__
-- i (Control policy Adaptation) I ~

stochastic Control Automaton I_


Control
I (n) I~ Observed
state
decision ~I~I Controlled Markov Chain I

Fig.l - Stochastic Control Automaton

i i . Lack of A priori Information. Let us consider the design of a control automa-


ton which has to make control decisions without a priori knowledge of the transi-
tion probabilities ~ ( i , k , j ) . In that situation the automaton has to estimate the
transition probabilities as time unfolds and simultaneously with decision making.
In such case the variation a~)(n)" in (43) is given by

A~)(n) = y ; ( n ) ~ ( n ) , y~(n) > 0 (44)

where a @/ad is the estimate of the gradient at epoch n.

R~m~. I f i t happens that for some state i , at epoch n, the component d~! ) of the
u

policy d[i}"" equals zero, and at the next recurrence of the state i the gradient
corresponding to a decision klk o is positive then the reinforcement
M~-I M.
scheme (43), or (44), w i l l be applied on the simplex S ' ~ S l obtained by drop-
ping o f f the k -th decision a l t e r n a t i v e . This is necessary to avoid premature con-
o
vergence to a non-optimal p o l i c y .

4.4 CONVERGENCE.

Let the present epoch be n. Let ~(n) denote the realization of the random varia-
bles : the decision probability vectors d ( i ) ( n ' ) , i=l . . . . ,N, and the observed state
xn, for n' = O,l . . . . . n. Consider the error criterion ; I(n) = (@(n) - @~)2 where,
@(n) is the expected average reward for policy D(n) = Dn and @~ is the optimal expec-
ted average reward. The expected value of I(n+l) conditioned on E(n) can be written
as,
110

E(l(n+l)/~(n)) = E((@(n) + ~@(n) - qb~)2/E in)) (45)


= (qb(n) - @~)2 + E((~q~(n))2/~(n)) + 2E((@(n) - dp~)~¢(n)/~(n))
= l(n)+ [!6Xna@(Dn,AaXn))2.d~xn)(n)+2 Sa(@(n)-q~')6XnadP(Dn,AaXn).d~xn)(n).

where xn, a denote the observed state and the control decision, respectively, at
epoch n. It follows from (25) and (26) that

6x aq~(Dn'z~Xn))= A~xn)" - - - ~ ( n ) + (A~xn))2 fxna(Dn,~Xn)) (46)


n ad~"n~

where,
(47)
" ~X~a(n) (n)
fXna(Dn, A~xn)) : ~ [ d~i)(n)'nik. ~IXna(n)-l+A!xn)~Xn (n) "Pxn
^na

and a"(Xn)
a is choosed according to the reinforcement scheme (44) as
rb

A~xn) = yXn(n) • B~a~(n) (48)

Substituting into (46) we get

6xna@(Dn,A~Xn)) = yXn(n)B~a~(n) -+- B- ~~n'


d(an )
(49)

+ yXn(n)~d~(n)fXna(Dn,yXn(n)~--~@(n)
a ada^nJ

Substituting by (49) into (45) we obtain,


rb rb

E(I(n+l)/~.(n)) : I(n)+ ~a(yXn)2 ( a ~ ( n ) ) 2 [~(n)+yaXn(n) a @ (n).


a a
ad(Xn)
a

f~v

fXna(Dn'yXn(n) ~@~a
(n))]
ad~n~ •
a
I ] (5o)

Here [. ] means the same term between brackets as in the second terms~of the same
equation. We impose the condition that the estimated derivatives @ ~ ( n ) , f o l l o -
wing an appropriate e s t i m a t i o n o f the t r a n s i t i o n p r o b a b i l i t i e s % ( i , k , j ) , satisfy
the c o n d i t i o n t h a t

where,

rn and sn are non-negative € ( n ) - measurable random v a r i a b l e s such t h a t

C sn < w a.s. ; rn,sn are u n i f o r m l y bounded. (53)


n

Let us examine the s i g n o f p , eqn. (52). S u b s t i t u t i n g from (47) we have,


'na

pxn(n) (54)

Using the boundedness conditions ( l l ) , (22) as w e l l as the d e f i n i t i o n o f the f i r s t -


order d e r i v a t i v e s (26) we get,

- min n )+c(xn)max n Cl+cocl = C2


a 'na a 'na xn,a 'na
(55)

+$n) > (min nxn,- max n )-c(xn)max nX a > -Cl-COC1 =-C2


ada n a a 'na xn,a n

Hence $\rho_{x_n a}(n)$ is a uniformly bounded sequence satisfying the bound (56) in terms of $c_0$, $c_1$ and $\bar\gamma$, where $\bar\gamma$ is an upper bound to be imposed on the sequence $\gamma_a^{x_n}(n)$. It follows then that $\rho_{x_n a}(n)$ is a non-negative sequence if

$$\bar\gamma \le \frac{1}{C_2 + c_1 c_0^2} \tag{57}$$

Having guaranteed the non-negativeness of the variables $\rho_{x_n a}(n)$, eqn. (54), we can rewrite eqn. (50) in the form of the inequality

$$E(I(n+1)\mid\xi(n)) \le I(n) + \bar c \sum_a \bigl(\gamma_a^{x_n}(n)\bigr)^2 - \sum_a 2\,\gamma_a^{x_n}(n)\left[(\phi^* - \phi(n))\left(\frac{\partial\phi}{\partial d_a^{(x_n)}}(n)\right)^{\!2} \rho_{x_n a}(n) + r_n - s_n\right] \tag{58}$$

where $\bar c$ is a positive constant representing a uniform upper bound, over all states and decisions, of the factor $\bigl(\widehat{\partial\phi/\partial d_a^{(x_n)}}(n)\bigr)^2\,[\,\cdot\,]^2\, d_a^{(x_n)}(n)$ appearing in the first sum of (50). (59)

Hence $I(n)$ is a non-negative almost supermartingale and we can apply the convergence theorem of Robbins and Siegmund³. This yields the following result: $\lim_n I(n)$ exists and is finite, and

$$\sum_{n=1}^{\infty} \sum_a \gamma_a^{x_n}(n)\,(\phi^* - \phi(n))\left(\frac{\partial\phi}{\partial d_a^{(x_n)}}(n)\right)^{\!2} \rho_{x_n a}(n) < \infty \tag{60}$$

on the event where

$$\sum_n \sum_a \bigl(\gamma_a^{x_n}(n)\bigr)^2 < \infty \tag{61}$$
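For reference, the theorem of Robbins and Siegmund is applied here in its standard form, which we recall in the present notation (with $\mathcal{F}_n$ the $\sigma$-field generated by $\xi(n)$); identifying $V_n = I(n)$, $a_n = 0$, $b_n$ with the $\gamma^2$- and $s_n$-terms of (58), and $c_n$ with the remaining non-negative sum is what yields (60) and (61):

$$\text{If } V_n,\, a_n,\, b_n,\, c_n \ge 0 \text{ are } \mathcal{F}_n\text{-measurable and } E(V_{n+1}\mid\mathcal{F}_n) \le (1+a_n)V_n + b_n - c_n,$$
$$\text{then on } \Bigl\{\sum_n a_n < \infty,\ \sum_n b_n < \infty\Bigr\}:\quad V_n \text{ converges a.s. and } \sum_n c_n < \infty \text{ a.s.}$$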

If we further impose the conditions

$$\sum_{n=1}^{\infty} \gamma_a^{x_n}(n) = \infty, \qquad \sum_{n=1}^{\infty} \gamma_a^{x_n}(n)\, r_n < \infty \tag{62}$$

then (60), combined with the fact that $\bigl(\partial\phi/\partial d_a^{(x_n)}\bigr)^2 \rho_{x_n a}$ is a uniformly bounded positive sequence, cf. (55) and (56), implies that

$$\lim_n \phi(n) = \phi^* \quad \text{w.p.1} \tag{63}$$

We summarize the above results in the following theorem.

THEOREM 3. The reinforcement scheme (44), subject to the conditions (51) and (53) as well as

$$0 \le \gamma_a^{x_n}(n) \le \bar\gamma \le \frac{1}{C_2 + c_1 c_0^2}, \quad \sum_{n\ge 1} \gamma_a^{x_n}(n) = \infty, \quad \sum_{n\ge 1} \bigl(\gamma_a^{x_n}(n)\bigr)^2 < \infty, \quad \sum_{n\ge 1} \gamma_a^{x_n}(n)\, r_n < \infty$$

yields the optimal expected average reward with probability one.
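A step-length sequence meeting the summability conditions of the theorem can, for instance, decay inversely with the number of occurrences of the given (state, decision) pair while respecting the cap (57). A minimal sketch (the constants c0, c1 are placeholders for the actual bounds of (11) and (22), not values taken from the text):

```python
c0, c1 = 1.0, 1.0
C2 = c1 + c0 * c1                    # cf. (55)
gamma_bar = 1.0 / (C2 + c1 * c0**2)  # cap imposed by (57)

def step_length(m: int) -> float:
    """gamma for the m-th occurrence of a (state, decision) pair.

    The harmonic decay gives sum(gamma) = infinity and sum(gamma^2) < infinity,
    as required by Theorem 3.
    """
    return min(gamma_bar, 1.0 / (1 + m))
```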

4.5 ACCELERATION.

The following idea, originally due to Kesten⁴, may be employed to accelerate the convergence of the stochastic algorithm (44). When the policy D is far from the optimal one, there will be few changes of sign of successive values of the gradient $\partial\phi/\partial d_\beta^{(\alpha)}$; $\alpha = 1,\ldots,N$, $\beta = 1,\ldots,M_\alpha$. Near the optimum, we would expect the scheme to oscillate from one side of $d_\beta^{(\alpha)*}$ to the other. This suggests using the number of sign changes of successive values of $\partial\phi/\partial d_\beta^{(\alpha)}$ to indicate whether the policy estimate $d_\beta^{(\alpha)}$ is near or far from $d_\beta^{(\alpha)*}$. To accelerate convergence, the quantity $\gamma_\beta^\alpha(n)$, see (44), is not decreased if $\widehat{\partial\phi/\partial d_\beta^{(\alpha)}}$ has the same sign as the respective preceding value (i.e. for the same $\alpha$, $\beta$). To formalize this, we introduce the set of N vectors $Z^{(1)},\ldots,Z^{(N)}$. If at epoch n the event: state $\alpha$, decision $\beta$, has taken place, then the $\beta$-th component of the vector $Z^{(\alpha)}$ will be defined as

$$Z_\beta^{(\alpha)}(n) = \operatorname{sign}\widehat{\frac{\partial\phi}{\partial d_\beta^{(\alpha)}}}(n), \qquad \operatorname{sign}(0) = 0 \tag{64}$$

We also introduce the set of N count vectors $L^{(1)},\ldots,L^{(N)}$, which are initialized as

$$L_j^{(i)}(0) = 0, \qquad j = 1,\ldots,M_i;\ i = 1,\ldots,N \tag{65}$$

If at epoch n the event: state $\alpha$, decision $\beta$, has taken place, then the $\beta$-th component of the vector $L^{(\alpha)}$ will be updated thus:

$$L_\beta^{(\alpha)}(n) = L_\beta^{(\alpha)}(n-1) + 1 \tag{66}$$

The step-length vectors $\gamma^i = (\gamma_1^i,\ldots,\gamma_{M_i}^i)$, $i = 1,\ldots,N$, are first initialized as

$$\gamma_j^i(0) = \gamma_0 = \text{const.} > 0, \qquad j = 1,\ldots,M_i,\ i = 1,\ldots,N \tag{67}$$

If at epoch n the event: state $\alpha$, decision $\beta$, has taken place, then the step-length element $\gamma_\beta^\alpha(n)$ retains its preceding value,

$$\gamma_\beta^\alpha(n) = \gamma_\beta^\alpha(n-1) \quad \text{if} \quad \bigl(L_\beta^{(\alpha)}(n) < 2\bigr) \cup \Bigl(\bigl(L_\beta^{(\alpha)}(n) \ge 2\bigr) \cap \bigl(Z_\beta^{(\alpha)}(n)\, Z_\beta^{(\alpha)}(n-1) > 0\bigr)\Bigr)$$

and is otherwise decreased to the next value of a decreasing sequence starting from $\gamma_0$.

The sequence $\gamma_\beta^\alpha(n)$ must satisfy the condition that the respective policy increments $\Delta_\beta^{(i)}(n)$ satisfy the constraint (14), in order to guarantee that $d^{(i)}(n+1)$ belongs to the simplex $S^{M_i}$. If $\Delta_\beta^{(i)}(n)$ is such that $d^{(i)}(n+1)$ does not belong to the simplex, then $\gamma_\beta^\alpha(n)$ is divided by two. The process of division is repeated, if necessary, until $d^{(i)}(n+1)$ is found to be in the simplex $S^{M_i}$.
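A compact sketch of this bookkeeping, combining the sign vectors (64), the counts (65)-(66) and the step lengths (67); the harmonic decay applied on a sign change is an illustrative assumption, since the text requires only that the step length then decrease:

```python
import numpy as np

class KestenStep:
    """Per-(state, decision) step lengths with Kesten's sign-change rule."""

    def __init__(self, n_states: int, n_actions: int, gamma0: float = 0.5):
        self.gamma0 = gamma0
        self.gamma = np.full((n_states, n_actions), gamma0)   # eq. (67)
        self.Z = np.zeros((n_states, n_actions))              # last sign, eq. (64)
        self.L = np.zeros((n_states, n_actions), dtype=int)   # counts, eq. (65)
        self.changes = np.zeros((n_states, n_actions), dtype=int)

    def update(self, i: int, k: int, grad_estimate: float) -> float:
        """Return gamma for the event (state i, decision k) at the current epoch."""
        z = np.sign(grad_estimate)
        self.L[i, k] += 1                                     # eq. (66)
        if self.L[i, k] >= 2 and z * self.Z[i, k] < 0:
            # gradient changed sign: decrease the step (assumed harmonic decay)
            self.changes[i, k] += 1
            self.gamma[i, k] = self.gamma0 / (1 + self.changes[i, k])
        self.Z[i, k] = z
        return float(self.gamma[i, k])
```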

4.6 NUMERICAL EXAMPLE.

We consider the example of the "Taxicab operation" given by Howard⁵. The problem concerns a taxicab driver whose territory encompasses three towns A, B and C. If he is in town A, he has three alternatives:
1. He can cruise in the hope of picking up a passenger by being hailed.
2. He can drive to the nearest cab stand and wait in line.
3. He can pull over and wait for a radio call.
If he is in town C, he has the same three alternatives, but if he is in town B, the last alternative is not present because there is no radio cab service in that town. For a given town and given alternative, there is a probability that the next trip will go to each of the towns A, B and C, and a corresponding reward in monetary units associated with each such trip. This reward represents the income from the trip after all necessary expenses have been deducted. For example, in the case of alternatives 1 and 2, the cost of cruising and of driving to the nearest stand must be included in calculating the rewards. The probabilities of transition and the rewards depend upon the alternative because different customer populations will be encountered under each alternative.

If we identify being in towns A, B and C with states 1, 2 and 3, respectively, then we have Table 1.

Table 1. Data for Taxicab Problem.

State   Alternative   Probability π(i,k,j)       Reward r_ikj       Expected Immediate
  i          k        j=1     j=2     j=3        j=1   j=2   j=3       Reward r_ik
  1          1        1/2     1/4     1/4         10     4     8           8
             2        1/16    3/4     3/16         8     2     4           2.75
             3        1/4     1/8     5/8          4     6     4           4.25
  2          1        1/2     0       1/2         14     0    18          16
             2        1/16    7/8     1/16         8    16     8          15
  3          1        1/4     1/4     1/2         10     2     8           7
             2        1/8     3/4     1/8          6     4     2           4
             3        3/4     1/16    3/16         4     0     8           4.5
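For simulation the table can be encoded directly. The following sketch (Python, with 0-based indices standing for states 1-3) reproduces the last column of Table 1 from the probabilities and rewards:

```python
import numpy as np

# P[i][k][j] = pi(i,k,j), R[i][k][j] = r_ikj from Table 1;
# town B (index 1 here) has only two alternatives.
P = [
    [[1/2, 1/4, 1/4], [1/16, 3/4, 3/16], [1/4, 1/8, 5/8]],
    [[1/2, 0.0, 1/2], [1/16, 7/8, 1/16]],
    [[1/4, 1/4, 1/2], [1/8, 3/4, 1/8], [3/4, 1/16, 3/16]],
]
R = [
    [[10, 4, 8], [8, 2, 4], [4, 6, 4]],
    [[14, 0, 18], [8, 16, 8]],
    [[10, 2, 8], [6, 4, 2], [4, 0, 8]],
]
# expected immediate rewards r_ik = sum_j pi(i,k,j) r_ikj (last column of Table 1)
r = [[float(np.dot(P[i][k], R[i][k])) for k in range(len(P[i]))] for i in range(3)]
print(r)  # [[8.0, 2.75, 4.25], [16.0, 15.0], [7.0, 4.0, 4.5]]
```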

Simulation of the stochastic control algorithm (43) was carried out on a digital computer. Starting from a completely random policy (i.e. all decisions made with equal probabilities), the policy converged w.p.1 to the optimal one; see Fig. 2. The long-run expected average reward increased at every epoch until it reached the steady-state optimal value; see Table 2.
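The steady-state value reached in Table 2 below can be checked exactly: for the policy selecting alternative 2 in every town (the optimum the automaton converges to, cf. Fig. 2), the long-run average reward is the stationary distribution of the corresponding transition matrix applied to the expected immediate rewards. A sketch:

```python
import numpy as np

# Transition matrix and expected immediate rewards for the policy
# "alternative 2 in every town" (rows k = 2 of Table 1).
PD = np.array([[1/16, 3/4, 3/16],
               [1/16, 7/8, 1/16],
               [1/8,  3/4, 1/8 ]])
rD = np.array([2.75, 15.0, 4.0])

# stationary distribution: left eigenvector of PD for eigenvalue 1
w, v = np.linalg.eig(PD.T)
pi_stat = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi_stat /= pi_stat.sum()
print(pi_stat @ rD)  # 13.344537... = 1588/119, cf. the last rows of Table 2
```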

Table 2.

epoch n    expected average reward φ
    0              9.179964
    1              9.680186
    2             12.465918
    3             12.650611
    4             12.660767
    5             12.817069
   10             13.177875
   20             13.241918
   30             13.318928
   40             13.342069
   50             13.344534
Fig.2 - Control policy versus time (the decision probabilities $d_k^{(i)}$ for (i=1, k=2), (i=2, k=2) and (i=3, k=2), plotted over epochs 0 to 40, each converging to 1 as the policy settles on alternative 2 in every state)



4.7 CONCLUSIONS.

The control of a finite-state discrete-time Markov chain is considered. At each epoch the state of the chain is observed and a control decision has to be taken. The set of possible control decisions is finite for all states. Depending on the observed state and the taken decision, the chain makes a transition to one of the alternative states with certain probabilities. Depending on the state and the decision, as well as the effected transition, a specific reward is obtained. The control objective is to maximize the long-run expected average reward. Necessary and sufficient conditions of optimality are established using a variational approach. That approach makes it possible to formulate a stochastic control algorithm which can be performed by a stochastic automaton controller. The automaton chooses its actions (control decisions) for each state with respective probabilities. It has been proven that the automaton's decisions converge with probability 1 to the optimal ones. A numerical example is worked out on a digital computer and the results are in agreement with the theory. The algorithm is believed to be versatile, as it requires little computation time and memory and enjoys a non-negative supermartingale property.

COMMENTS.

4.2 The idea of using a variational approach is inspired by the work of Lyubchik and Poznyak⁶. They provided a sketchy formulation of the conditions of optimality for the optimal control problem with inequality constraints. They did not, however, indicate any concrete meaning of the "derivatives". Neither did they evaluate the nature of their conditions (sufficient, necessary, or both). Further work remains to be done for the problem with constraints, which may be formulated as a stochastic programming problem⁷. Theorems 1 and 2 presented here are believed to be new. It is interesting to examine their relationship with Howard's conditions⁵ based on the dynamic programming approach.

4.3 The presented convergence proof is new. Condition (51) for the case of lack of a priori information is believed to be less stringent than the condition of minimum contrast estimate, cf. Mandl⁸.

REFERENCES
1. D.L. Isaacson and R.W. Madsen, Markov Chains. New York: John Wiley and Sons, 1976.
2. E. Durand, Solutions Numériques des Équations Algébriques, Tome II. Paris: Masson, 1961.
3. H. Robbins and D. Siegmund, "A Convergence Theorem for Non-negative Almost Supermartingales and Some Applications", in Optimizing Methods in Statistics, ed. by J.S. Rustagi. New York: Academic Press, 1971.
4. H. Kesten, "Accelerated Stochastic Approximation", Ann. Math. Statistics, 29, 1, pp. 41-59, 1958.
5. R.A. Howard, Dynamic Programming and Markov Processes. New York: John Wiley and Sons, 1962.
6. L.M. Lyubchik and A.S. Poznyak, "Learning Automata in Stochastic Plant Control Problems", Automation and Remote Control, No. 6, pp. 777-789, 1974.
7. A.S. Poznyak, "Learning Automata in Stochastic Programming Problems", Automation and Remote Control, No. 10, pp. 1608-1619, 1973.
8. P. Mandl, "Estimation and Control in Markov Chains", Adv. Appl. Prob., 6, pp. 40-60, 1974.

EPILOGUE

In Chapter I we have discussed existing definitions of learning and delineated new aspects of the cybernetic modeling of what can be a learning or self-organizing system. We highlighted the dynamics of real system-environment interactions during a learning process. Further work is needed to carry those ideas over to the realm of real self-organizing systems. That would first require introducing the role of energy and matter into the information perspective, and moreover finding formulae for their metabolism and transformation into one another. That would surpass the limits we have deliberately set for our quest in this monograph. Chapter II has grown out of a survey of the basic techniques of pattern recognition systems. It has brought together a host of published results in a unified, systematic, and critical way. Chapter III presents a complete theory of the use of learning automata to solve collective problems, which are formulated as a game between automata. It has been shown that under certain conditions the individualistic goal-seeking behavior of the automata converges to a desired meta-objective set out by the environment. Further research could fruitfully address situations where the environment is non-stationary, in contradistinction to the stationarity hypothesis considered in the present work. Chapter IV presents a new algorithm for the control of finite Markov chains with unknown transition probabilities. A theory is given ensuring the convergence w.p.1. Further research is to be developed to study cases where constraints exist on the control and the state, and to investigate the possibility of using a team of learning automata to deal with chains with a high number of states, a situation where the "curse of dimensionality" prevails.
