
8. Competitive Learning Neural Networks
Consider the pattern recognition tasks that the network described below can perform.

The network consists of an input layer of linear units. The output of each of these
units is given to all the units in the second layer (output layer) with adaptive (adjustable)
feedforward weights. The output functions of the units in the second layer are either linear
or nonlinear depending on the task for which the network is to be designed. The output of
each unit in the second layer is fed back to itself in a self-excitatory manner and to the
other units in the layer in an excitatory or inhibitory manner depending on the task.
Generally, the weights on the connections in the feedback layer are nonadaptive or
fixed. Such a combination of both feedforward and feedback connection layers results in
some kind of competition among the activations of the units in the output layer. Hence such
networks are called competitive learning neural networks.
Different choices of the output functions and interconnections in the feedback layer
of the network can be used to perform different pattern recognition tasks. For example, if
the output functions are linear, and the feedback connections are made in an on-centre off-
surround fashion, the network performs the task of storing an input pattern temporarily. In
an on-centre off-surround connection there is an excitatory connection to the same unit
and inhibitory connections to the other units in the layer.
On the other hand, if the output functions of the units in the feedback layer are made
nonlinear, with fixed weight on-centre off-surround connections, the network can be used
for pattern clustering.
The objective in pattern clustering is to group the given input patterns in an
unsupervised manner, and the group for a pattern is indicated by the output unit that has a
nonzero output at equilibrium. The network is called a pattern clustering network, and the
feedback layer is called a competitive layer. The unit that gives the nonzero output at
equilibrium is said to be the winner. Learning in a pattern clustering network involves
adjustment of weights in the feedforward path so as to orient the weights (leading to the
winning unit) towards the input pattern.
If the output functions of the units in the feedback layer are nonlinear and the units
are connected in such a way that connections to the neighbouring units are all made
excitatory and to the farther units inhibitory, the network then can perform the task of
feature mapping. The resulting network is called a self-organization network. In the self-
organization, at equilibrium the output signals from the nearby units in the feedback layer
indicate the proximity of the corresponding input patterns in the feature space. A self-
organization network can be used to obtain mapping of features in the input patterns onto
a one-dimensional or a two-dimensional feature space.
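As an illustration of this idea, the following is a minimal sketch (in Python, not part of the original notes) of a single feature-mapping update for a one-dimensional arrangement of output units. The Gaussian neighbourhood function, the Euclidean winner selection and the parameter values are assumptions made only for the sketch.

import numpy as np

def feature_map_update(W, x, eta=0.1, sigma=1.0):
    # W: (num_units, input_dim) feedforward weights of a 1-D output layer.
    # x: input vector. eta and sigma are illustrative values.
    # Winner = unit whose weight vector is closest to the input.
    k = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    # Gaussian neighbourhood: units near the winner receive large updates,
    # distant units essentially none, mimicking the excitatory neighbourhood.
    d = np.arange(W.shape[0]) - k
    h = np.exp(-(d ** 2) / (2 * sigma ** 2))
    # Move the winner and its neighbours towards the input pattern.
    W += eta * h[:, None] * (x - W)
    return k, W

Repeated over many inputs, nearby units come to respond to nearby input patterns, which is the feature-mapping behaviour described above.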
Summary of competitive learning networks:

Pattern Storage (STM)
  Architecture: Two layers (input and competitive), linear processing units
  Learning: No learning in FF stage, fixed weights in FB layer
  Recall: Not relevant
  Limitation: STM, no application, theoretical interest
  To overcome: Nonlinear output function in FB stage, learning in FF stage

Pattern Clustering (Grouping)
  Architecture: Two layers (input and competitive), nonlinear processing units in the competitive layer
  Learning: Only in FF stage, competitive learning
  Recall: Direct in FF stage, activation dynamics until stable state is reached in FB layer
  Limitation: Fixed (rigid) grouping of patterns
  To overcome: Train neighbourhood units in competition layer

Feature Map
  Architecture: Self-organization network, two layers, nonlinear processing units, excitatory neighbourhood units
  Learning: Weights leading to the neighbourhood units in the competitive layer
  Recall: Apply input, determine winner
  Limitation: Only visual features, not quantitative
  To overcome: More complex architecture

Analysis of Pattern Clustering Networks

A competitive learning network with nonlinear output functions for the units in the
feedback layer can produce, at equilibrium, a large activation on a single unit and small
activations on the other units. This behaviour leads to a winner-take-all situation: when
the input pattern is removed, only one unit in the feedback layer retains a nonzero
activation. That unit may be designated as the winner for the input pattern.
If the feedforward weights are suitably adjusted, each of the units in the feedback
layer can be made to win for a group of similar input patterns. The corresponding learning
is called competitive learning.
The units in the feedback layer have nonlinear output functions, for example f(x) = x^n, n > 1.
Other nonlinear output functions, such as the hard-limiting threshold function or the
semilinear sigmoid function, can also be used. These units are connected among themselves with fixed
weights in an on-centre off-surround manner. Such networks are called competitive
learning networks. Since they are used for clustering or grouping of input patterns, they can
also be called pattern clustering networks.
In the pattern clustering task, the pattern classes are formed on unlabelled input
data, and hence the corresponding learning is unsupervised. In the competitive learning the
weights in the feedforward path are adjusted only after the winner unit in the feedback
layer is identified for a given input pattern.
There are three different methods of implementing competitive learning, as described below.

It is assumed that the input is a binary {0, 1} vector. The activation of the ith unit in
the feedback layer for an input vector x = (x_1, x_2, ..., x_M)^T is given by

y_i = Σ_{j=1}^{M} w_ij x_j
where w_ij is the (i, j)th element of the weight matrix W, connecting the jth input to the ith
unit. Let i = k be the unit in the feedback layer such that

y_k = max_i (y_i)

Then

w_k^T x ≥ w_i^T x, for all i

Assume that the weight vectors to all the units are normalized, i.e., ||w_i|| = 1, for all
i. Geometrically, the above result means that the input vector x is closest to the weight
vector w_k among all w_i. That is,

||x − w_k|| ≤ ||x − w_i||, for all i.
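As a minimal sketch (in Python, with an assumed weight matrix W whose ith row is w_i and a real-valued input vector x), the winner can be selected as follows:

import numpy as np

def find_winner(W, x):
    # W: (num_units, M) feedforward weight matrix, row i holding w_i.
    # x: input vector of length M.
    y = W @ x  # y_i = sum_j w_ij x_j
    # With normalized rows (||w_i|| = 1 for all i), the unit with the largest
    # y_i is also the unit whose weight vector is closest to x in Euclidean
    # distance, as stated above.
    return int(np.argmax(y))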
Start with some initial random values for the weights. The input vectors are applied
one by one in a random order. For each input the winner unit in the feedback layer is
identified, and the weights leading to the unit are adjusted in such a way that the weight
vector wk moves towards the input vector x by a small amount, determined by a learning
rate parameter η.
A straightforward implementation of the weight adjustment is to make

Δw_kj = η (x_j − w_kj),

so that

w_k(m+1) = w_k(m) + Δw_k(m)
         = w_k(m) + η (x − w_k(m))

This looks more like Hebb's learning with a decay term if the output of the winning
unit is assumed to be 1. It works best when the input vectors are normalized. This is
called standard competitive learning.
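A minimal sketch of this standard competitive learning update (assuming the winner index k has already been found as above; the learning rate value is only illustrative):

def standard_competitive_update(W, x, k, eta=0.05):
    # Only the winner's weight vector is moved towards the input:
    # w_k(m+1) = w_k(m) + eta * (x - w_k(m))
    W[k] += eta * (x - W[k])
    return W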

If the input vectors are not normalized, then they are normalized in the weight
adjustment formula as follows:

Δw_kj = η (x_j / Σ_i x_i − w_kj),

only for those j for which x_j = 1. This can be called minimal learning.
In the case of binary input vectors, for the winner unit, only the weights which have
nonzero input would be adjusted. That is,

Δw_kj = η (x_j / Σ_i x_i − w_kj),  for x_j = 1
      = 0,                         for x_j = 0
In the minimal learning there is no automatic normalization of weights after each
adjustment. That is,

Σ_{j=1}^{M} w_kj ≠ 1.

In order to overcome this problem, Malsburg suggested the following learning law,
in which all the weights leading to the winner unit are adjusted:

Δw_kj = η (x_j / Σ_i x_i − w_kj),  for all j

In this law, if a unit wins the competition, then each of its input lines gives up some
portion of its weight, and that weight is distributed equally among the active connections
for which x_j = 1.
A unit i with an initial weight vector w_i far from any input vector may never win
the competition. Since a unit will never learn unless it wins the competition, another
method, called the leaky learning law, is proposed. In this case the weights leading to the
units which do not win are also adjusted for each update, as follows:

 
Δw_ij = η_w (x_j / Σ_m x_m − w_ij),  for all j,

if i wins the competition, i.e., i = k.

Δw_ij = η_l (x_j / Σ_m x_m − w_ij),  for all j,

if i loses the competition, i.e., i ≠ k.

where η_w and η_l are the learning rate parameters for the winning and losing units,
respectively (η_w > η_l). In this case the weights of the losing units are also slightly moved
for each presentation of an input.
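A sketch of the leaky learning update under the same assumptions (binary input x, winner index k, illustrative rate values); setting eta_l = 0 reduces it to adjusting only the winner, as in Malsburg's law:

import numpy as np

def leaky_learning_update(W, x, k, eta_w=0.05, eta_l=0.005):
    # All units move towards the normalized input: the winner with the larger
    # rate eta_w, the losing units with the smaller rate eta_l.
    target = x / x.sum()                 # x_j / sum_m x_m
    rates = np.full(W.shape[0], eta_l)   # losing units
    rates[k] = eta_w                     # winning unit
    W += rates[:, None] * (target - W)
    return W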
Basic competitive learning and its variations are used for adaptive vector
quantization, in which the input vectors are grouped based on the Euclidean distance
between vectors.
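The following sketch pulls these pieces together into a simple adaptive vector quantization loop (in Python); the random presentation order, the Euclidean winner selection and all parameter values are assumptions for illustration only:

import numpy as np

def vector_quantize(data, num_units=4, eta=0.05, epochs=20, seed=0):
    # data: (num_patterns, M) array of input vectors.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(num_units, data.shape[1]))  # random initial weights
    for _ in range(epochs):
        for x in rng.permutation(data):              # random presentation order
            k = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # Euclidean winner
            W[k] += eta * (x - W[k])                 # standard competitive update
    # Each input is grouped with the unit whose weight vector is nearest to it.
    dists = np.linalg.norm(data[:, None, :] - W[None, :, :], axis=2)
    return W, np.argmin(dists, axis=1)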

Associative Memory

Pattern storage is an obvious pattern recognition task that one would like to realize
using an artificial neural network. This is a memory function, where the network is
expected to store the pattern information (not data) for later recall. The patterns to be
stored may be of spatial type or spatio-temporal (pattern sequence) type.
Typically, an artificial neural network behaves like an associative memory, in which
a pattern is associated with another pattern, or with itself. This is in contrast with a
random access memory, which maps an address to data. An artificial neural network can
also function as a content-addressable memory, where data is mapped onto an address.
The pattern information is stored in the weight matrix of a feedback neural network.
The stable states of the network represent the stored patterns, which can be recalled by
providing an external stimulus in the form of partial input.
If the weight matrix stores the given patterns, then the network becomes an
autoassociative memory. If the weight matrix stores the association between a pair of
patterns, the network becomes a bidirectional associative memory. This is called
heteroassociation between the two patterns. If the weight matrix stores multiple
associations among several (> 2) patterns, then the network becomes a multidirectional
associative memory. If the weights store the associations between adjacent pairs of patterns
in a sequence of patterns, then the network is called a temporal associative memory.
Some desirable characteristics of associative memories are:
1. The network should have a large capacity, i.e., ability to store a large number of
patterns or pattern associations.
2. The network should be fault tolerant in the sense that damage to a few units or
connections should not affect the performance in recall significantly.
3. The network should be able to recall the stored pattern or the desired associated
pattern even if the input pattern is distorted or noisy.
4. The network performance as an associative memory should degrade only gracefully
due to damage to some units or connections, or due to noise or distortion in the
input.
5. The network should be flexible to accommodate new patterns or associations
(within the limits of its capacity) and to be able to eliminate unnecessary patterns or
associations.

Bidirectional Associative Memory (BAM)

The objective is to store a set of pattern pairs in such a way that any stored pattern
pair can be recalled by giving either of the patterns as input. The network is a two-layer
heteroassociative neural network.

The network encodes binary or bipolar pattern pairs (a_l, b_l) using Hebbian
learning. It can learn on-line, and it operates in discrete time steps.

The BAM weight matrix from the first layer to the second layer is given by

W = Σ_{l=1}^{L} a_l b_l^T

where a_l ∈ {−1, +1}^M and b_l ∈ {−1, +1}^N for bipolar patterns, and L is the number of training
patterns.
For binary patterns p_l ∈ {0, 1}^M and q_l ∈ {0, 1}^N, the bipolar values a_li = 2p_li − 1 and
b_li = 2q_li − 1, corresponding to the binary elements p_li and q_li respectively, are used in the
computation of the weight matrix.
The weight matrix from the second layer to the first layer is given by

W^T = Σ_{l=1}^{L} b_l a_l^T
The activation equations for the bipolar case are as follows:

b_j(m+1) = +1,      if y_j > 0
         = b_j(m),  if y_j = 0
         = −1,      if y_j < 0

where

y_j = Σ_{i=1}^{M} w_ij a_i(m)

and

a_i(m+1) = +1,      if x_i > 0
         = a_i(m),  if x_i = 0
         = −1,      if x_i < 0

where

x_i = Σ_{j=1}^{N} w_ij b_j(m)

In the above equations a(m) = [a_1(m), a_2(m), ..., a_M(m)]^T is the output of the first
layer at the mth iteration, and b(m) = [b_1(m), b_2(m), ..., b_N(m)]^T is the output of the second
layer at the mth iteration.
The updates in the BAM are synchronous in the sense that the units in each layer are
updated simultaneously.
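A sketch of BAM recall in the bipolar domain, assuming W was built as above; the alternation between the two layers and the stopping rule are one common way of realizing the synchronous updates described by the activation equations:

import numpy as np

def bam_recall(W, a, max_iters=50):
    # W: (M, N) weight matrix, a: bipolar input pattern of length M.
    def threshold(new, old):
        # +1 if positive, -1 if negative, previous value if exactly zero.
        out = np.sign(new)
        out[new == 0] = old[new == 0]
        return out

    b = threshold(W.T @ a, np.ones(W.shape[1]))  # initial second-layer output
    for _ in range(max_iters):
        a_new = threshold(W @ b, a)        # x_i = sum_j w_ij b_j
        b_new = threshold(W.T @ a_new, b)  # y_j = sum_i w_ij a_i
        if np.array_equal(a_new, a) and np.array_equal(b_new, b):
            break                          # both layers are stable
        a, b = a_new, b_new
    return a, b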
The BAM is limited to binary or bipolar valued pattern pairs. The upper limit on the
number (L) of pattern pairs that can be stored is min(M, N).
The performance of BAM depends on the nature of the pattern pairs and their
number. As the number of pattern pairs increases, the probability of error in recall will also
increase.
