11.1 Introduction
Recent advances in technology have enabled us to use modern digital computers
capable of incredible feats. They operate at the nanosecond time scale and perform
enormous and complex arithmetic calculations without error. Furthermore,
they are able to store huge amounts of data: documents, images, scientific data,
etc. Current microprocessor designs will soon surpass 10 million transistors [39],
a complexity hardly imagined in the early 1970s when the first microprocessors
were designed (the Intel 4004 microprocessor contained 2,300 transistors).
Humans cannot approach these capabilities. They are not very good at com-
plex arithmetic calculations. It is very easy to make mistakes when dealing with
repetitive computations, and our memory seems quite limited for certain tasks.
However, humans routinely perform "simple" tasks such as walking, talking,
and common sense reasoning. The human brain interprets imprecise information
from the senses at an incredibly rapid rate. It records and recognizes sounds,
voices, images, and faces with incredible performance, and generalizes experience with
astonishing facility. Despite their computational potential, such "simple" tasks
are very difficult for current computers.
In fact, the brain is a remarkable computer, though the long course of evolution
has given it very different characteristics from those of modern digital computers,
sometimes called von Neumann computers. The brain works at low speeds
(on the order of milliseconds) and low energy consumption. Its distributed computing
and memory differ from the centralized organization of typical computers;
the brain is very redundant and robust, whereas a computer can fail as a result of
a single damaged transistor (see Table 1).
                     Computer                   Biological system
processor            complex                    simple
                     high speed (ns)            low speed (ms)
                     one/few                    a large number
                     localized                  distributed
memory               noncontent addressable     content addressable
                     centralized                distributed
computing            sequential                 parallel
                     stored programs            self-learning
reliability          very vulnerable            very robust
power consumption    high                       low

Table 1. Modern computer and human brain characteristics [20].
From the engineering point of view it is desirable to develop devices with such
characteristics, i.e., robustness, distributed memory and computation, generalization,
interpretation of imprecise and noisy information, etc. One possible way to
realize such devices is by means of biological inspiration, i.e., to mimic or imitate
nature, specifically, in our case, the human brain. In this chapter we describe the
basics of artificial neural networks, massively parallel systems initially inspired
by biological nervous systems, how they learn (i.e., how they adapt and provide
generalization from examples or experience), and how they can be implemented
in hardware.
The human brain contains approximately 10^11 neurons, roughly the number of stars in our
galaxy, and each neuron is connected to 10^3 to 10^4 other neurons. In total, the
human brain contains approximately 10^14 to 10^15 interconnections [20].
Although the brain exhibits a great diversity of neuron shapes, dendritic trees,
axon lengths, etc., all neurons seem to process information in much the same way.
Information is transmitted in the form of electrical impulses called action potentials
via the axons to other neuron cells. Such action potentials have an amplitude
of about 100 millivolts and a frequency of approximately 1 kHz. When
an action potential arrives at the axon terminal, the neuron releases chemical
neurotransmitters which mediate the interneuron communication at specialized
connections called synapses (see figure 11.2).
Figure 11.2: Synapses are specialized connections between neurons where chemical
neurotransmitters mediate the interneuron communication.
Such chemical substances, the neurotransmitters, bind to receptors in the
membrane of the postsynaptic neuron, and excite or inhibit it. A neuron may
have thousands of synapses connecting it with thousands of other neurons. The
combined effect of all excitations and inhibitions modifies the membrane's
electrical potential difference, which rests at about -70 millivolts (the inner surface is
negative relative to the outer surface). When the neuron is sufficiently excited, the
permeability to sodium (Na+) increases, leading to the movement of positively
charged sodium from the extracellular fluid into the cell interior, the cytoplasm,
which in turn may generate an action potential in the postsynaptic neuron.
This very brief description of the neuron's physiology has inspired engineers
and scientists to develop adaptive systems with learning capabilities. In the
following section we describe the main computational models that have been
developed so far as a result of such biological inspiration.
[Figure: a threshold unit with inputs x1, ..., xn arriving over excitatory and
inhibitory connections, a threshold Θ, and an output y.]

[Figure: a perceptron with inputs x1, ..., xn, synaptic weights w1, ..., wn, a
summation unit Σ followed by a threshold θ, and an output y taking the values
1 or -1.]
The unit fires when the weighted sum of its inputs exceeds the threshold; absorbing
the threshold into a bias weight w_0 with constant input x_0 = 1, this reads
\[
y(\mathbf{x}) = 1 \quad \text{if} \quad w_0 + \sum_{i=1}^{n} w_i x_i = \sum_{i=0}^{n} w_i x_i > 0. \tag{11.3}
\]
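As an illustration of this firing rule, the following Python sketch (not from the
original text; the weights shown are illustrative) evaluates a threshold unit on
binary inputs:

```python
def threshold_unit(w0, weights, inputs):
    """Fires (returns 1) when w0 + sum_i w_i * x_i > 0, as in equation (11.3)."""
    activation = w0 + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0

# Illustrative weights realizing the logic function AND: w1 = w2 = 1 with a
# threshold of 1.5 (any value strictly between 1 and 2 works with the strict
# inequality), i.e. a bias weight w0 = -1.5 once the threshold joins the sum.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, threshold_unit(-1.5, [1, 1], [x1, x2]))
```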
[Figure: typical activation functions: the linear function y = mx; the sigmoid
y = 1/(1 + exp(-(bx - c))), which takes the value 0.5 at x = c; and the Gaussian
y = exp(-|x - c|^2 / s^2), which peaks at x = c.]
[Figure: threshold units implementing the logic functions AND (weights 1, 1 and
threshold 2), OR (weights 1, 1 and threshold 1), and NOT (weight -1 and
threshold 0).]
If an input lies on one side of the decision surface the perceptron
fires, and if it lies on the other side it does not fire. The decision surface linearly
separates the training inputs in the plane. In perceptrons with many inputs the
decision surface is a hyperplane.
Figure 11.7 clearly shows how a linear separation of the two-dimensional input
space can implement the logic functions AND and OR.
[Figure 11.7: linear separations of the (x1, x2) input space implementing the logic
functions AND and OR.]
The XOR function, in contrast, is not linearly separable, as its truth table shows:

x1   x2   y
0    0    0
0    1    1
1    0    1
1    1    0

XOR truth table
Figure 11.8: XOR implementation with (a) a nonlinear separation of the input space,
and (b) two linear separations.
[Figure: a two-layer perceptron implementing XOR. Two hidden units with
thresholds of 0.5 combine the inputs x1 and x2 through weights of +1 and -1,
and an output unit with threshold 0.5 combines the hidden outputs.]
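A minimal Python sketch of such a two-layer XOR network, assuming the wiring
suggested by the figure (weights of ±1, thresholds of 0.5; the exact wiring in the
original figure may differ):

```python
def step(z):
    """Hard threshold: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 - x2 - 0.5)    # hidden unit firing only for (x1, x2) = (1, 0)
    h2 = step(x2 - x1 - 0.5)    # hidden unit firing only for (x1, x2) = (0, 1)
    return step(h1 + h2 - 0.5)  # output unit: an OR of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # reproduces the XOR truth table
```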
11.5 Learning
Learning is often associated with an increase of knowledge or understanding, and
it can be achieved by studying, instruction, or experience. Although it is very
difficult to precisely define learning, in the artificial neural network context it is
closely related to the proper adjustment of interconnection weights. We argue that
learning is also closely related to topology adaptation (see section 11.6), since
the behavior of an artificial neural model is also very dependent on the network's
topology, i.e., the number of layers, the number of neurons, and the pattern of
connections.
A learning algorithm refers to a procedure in which learning rules are used for
adjusting the weights of an artificial neural network, and possibly its topology.
[Figure: two neurons i and j connected by a synaptic weight wij(t); neuron i
produces the output xi(t) and neuron j the output yj(t).]

The best-known example of such a rule is Hebb's rule, which can be written as
\[
w_{ij}(t+1) = w_{ij}(t) + \eta\, x_i(t)\, y_j(t),
\]
where xi(t) and yj(t) are the output values of neurons i and j at time t, wij(t) is
the current interconnection weight between neurons i and j, and η is a parameter
called the learning rate. wij(t+1) is the future value of the synaptic weight being
updated during learning. One important characteristic of such a learning rule is
that the update is performed locally.
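A one-line Python sketch of this update (the learning-rate value is illustrative):

```python
def hebb_update(w_ij, x_i, y_j, eta=0.1):
    """Hebb's rule: strengthen the connection in proportion to the correlated
    activity of the two neurons it links, using only local quantities."""
    return w_ij + eta * x_i * y_j

w = 0.0
for x, y in [(1, 1), (1, 1), (1, 0)]:  # toy activity trace
    w = hebb_update(w, x, y)
print(w)  # 0.2: reinforced only by the two coincident activations
```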
So far, Hebb's rule is the only biologically realistic learning rule, and it has
inspired a large number of learning algorithms which can be classified into three
main learning paradigms: supervised, unsupervised, and reinforcement learning.
Figure 11.13: Competitive learning network. The solid lines indicate excitatory
connections whereas the dashed lines indicate inhibitory connections.
The unit with the normalized weight vector closest to the input vector becomes
the winner.
In fact, the winning neuron can be found by a simple search for the maximum/minimum
activation. This neuron updates its weights while the weights
of the other neurons remain unchanged. A simple competitive weight-updating
rule is the following:
\[
\Delta w_{ij} =
\begin{cases}
\eta\,(x_j - w_{ij}) & \text{if } i = i^{*}, \\
0 & \text{if } i \neq i^{*},
\end{cases}
\tag{11.10}
\]
where η is a constant (the learning rate) and i^{*} denotes the winning neuron.
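In Python, one step of this rule might look as follows (a sketch; we assume the
winner is found as the unit whose weight vector is closest to the input in
Euclidean distance, one concrete version of the activation search mentioned above):

```python
import numpy as np

def competitive_step(weights, x, eta=0.1):
    """One step of simple competitive learning, equation (11.10):
    only the winning unit moves its weight vector toward the input."""
    i_star = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    weights[i_star] += eta * (x - weights[i_star])
    return i_star

weights = np.array([[0.0, 0.0], [1.0, 1.0]])          # two units, 2-D inputs
print(competitive_step(weights, np.array([0.9, 0.8])))  # unit 1 wins
```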
Self-Organizing Maps. The self-organizing map (SOM) is a special type
of competitive learning network where the neurons have a spatial arrangement,
i.e., the neurons are typically organized in a line or a plane (see figure 11.14).
A self-organizing map has the property of topology preservation, that is, nearby
input patterns should activate nearby output neurons on the map. A network
that performs such a mapping is called a feature map. It has been found that
such topological maps are present in the cortex of highly developed animal brains:
there is a two-dimensional retinotopic map from the retina to the visual cortex, a
one-dimensional tonotopic map from the ear to the auditory cortex, a somatosensory
map from the skin to the somatosensory cortex, etc. [15]. Such topographic
maps seem to be created during the individual development in a sort of unsuper-
vised way.
An early model of self-organization was developed by Willshaw and von der
Malsburg in the mid 1970s [41]. They used excitatory lateral connections with
neighboring neurons, and inhibitory connections with distant neurons, instead of
using a strict winner-take-all mechanism, as shown in figure 11.15. The function
that defines the form of such lateral connection weights is known as the Mexican
hat.
Teuvo Kohonen [22] described a self-organizing map learning algorithm that
approximates the effect of a Mexican hat form of lateral connection weights by
updating, at each step, not only the winning neuron's weights but also the weights
of its neighbors on the map.
[Figure 11.15: lateral connection weights Wij in the Willshaw-von der Malsburg
model: excitatory connections (+) to nearby neurons and inhibitory connections
(-) to distant ones, as a function of the distance i - j.]
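The following sketch shows one step of a Kohonen-style SOM update with a
Gaussian neighborhood, a common computational shortcut for the Mexican-hat
interaction; the parameter names and values (eta, sigma) are illustrative:

```python
import numpy as np

def som_step(weights, positions, x, eta=0.1, sigma=1.0):
    """One Kohonen SOM step: every unit moves toward the input with a
    strength that decays with its map distance from the winning unit."""
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    d = np.linalg.norm(positions - positions[winner], axis=1)
    h = np.exp(-d**2 / (2 * sigma**2))          # neighborhood function
    weights += eta * h[:, None] * (x - weights)
    return winner

# A one-dimensional map of 5 units receiving 2-D inputs.
positions = np.arange(5, dtype=float).reshape(-1, 1)
weights = np.random.rand(5, 2)
som_step(weights, positions, np.array([0.2, 0.7]))
```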
Principal Component Analysis networks. Oja's learning rule, which extracts
the first principal component of the input data, can be written as
\[
\mathbf{w} \leftarrow \mathbf{w} + \gamma\,\phi\,(\mathbf{x} - \phi\,\mathbf{w}),
\]
where w is the weight vector, φ is the scalar product x · w, and γ is a learning
constant. Other PCA learning algorithms include the Stochastic Gradient Ascent
algorithm, the Weighted Subspace algorithm, etc. [30].
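A sketch of a single Oja update in Python (symbol names follow the rule above;
the data-generating toy example is our own):

```python
import numpy as np

def oja_step(w, x, gamma=0.01):
    """One step of Oja's rule. phi = x . w is the neuron output; the -phi*w
    term keeps the norm of w bounded, so w converges toward the principal
    component of the input distribution."""
    phi = float(np.dot(x, w))
    return w + gamma * phi * (x - phi * w)

rng = np.random.default_rng(0)
w = rng.normal(size=2)
for _ in range(2000):                   # inputs mostly along the (1, 1) axis
    x = rng.normal() * np.array([1.0, 1.0]) + 0.1 * rng.normal(size=2)
    w = oja_step(w, x)
print(w)  # approximately +/-(0.71, 0.71): the unit-norm principal direction
```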
Adaptive Resonance Theory. The adaptive resonance theory (ART), developed
by Carpenter and Grossberg [7], attempts to solve a major problem: how
can a learning system preserve what it has previously learned while continuing
to incorporate new knowledge? This problem is known as the stability-plasticity
dilemma. In the case of simple competitive learning there is no guarantee of
stability of the obtained classes; such classes may vary with the order of presentation
of the input patterns, and the category of a given input pattern may continue
to change endlessly. One way to avoid this problem is to gradually reduce the
learning rate, but this reduces the network's plasticity.
The distinctive characteristic of the ART model is the use of a vigilance parameter,
which is used to determine whether an input is sufficiently similar to those
already learned; if not, a new category is activated, until the memory capacity
is full. The ART architecture is designed to learn quickly and stably in real
time, in response to a possibly non-stationary world, with an unlimited number
of neurons until it utilizes the full memory capacity [7]. However, the quality of
categorization attained by the algorithm depends critically on the a priori
static vigilance parameter.
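To make the role of the vigilance parameter concrete, here is a deliberately
simplified, ART-inspired sketch for binary inputs; it is not Carpenter and
Grossberg's full architecture (it omits the choice-function ordering and the
search/reset dynamics), and the function and parameter names are our own:

```python
import numpy as np

def art_present(x, prototypes, rho=0.8):
    """Assign binary input x to the first prototype passing the vigilance
    test |x AND w| / |x| >= rho; otherwise create a new category.
    Fast learning intersects the prototype with the input."""
    for k, w in enumerate(prototypes):
        overlap = np.minimum(x, w).sum() / max(x.sum(), 1)
        if overlap >= rho:                  # resonance: refine category k
            prototypes[k] = np.minimum(x, w)
            return k
    prototypes.append(x.copy())             # novelty: open a new category
    return len(prototypes) - 1

protos = []
for v in ([1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]):
    print(art_present(np.array(v), protos))  # each input opens a category here
```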
Applications. Unsupervised neural networks have been widely used in clustering
tasks, feature extraction, data dimensionality reduction, data mining (data
organization for exploring and searching), information extraction, density
approximation, etc.
[Figure: evolution of the training error and the test error as a function of
training time.]
[Figure: a threshold device computing the output y from inputs X1, ..., Xn
weighted by W1, ..., Wn and compared against a threshold T.]
In a SIMD (single-instruction, multiple-data) architecture, all processors execute the
same instruction in parallel but on different data. In a systolic array, each processor
executes a step of a calculation before passing the results to the next processor in a
pipelined manner. The first systolic architecture for neural networks was the WARP
machine designed at Carnegie Mellon. Figure 11.20 presents, as an example, the
architecture and the processing element of a systolic array system called MANTRA [18].
Figure 11.20: Systolic architecture of the MANTRA machine and its processing element
[18]. Each processing element PE receives inputs Ii in both the horizontal and vertical
directions, and executes a step of a calculation before passing the results to the next
processor, in a pipelined manner.
On the other hand, hardware accelerators for general-purpose machines seem
to be the only feasible way to provide competitive hardware systems for artificial
neural networks in general, in terms of flexibility, price, and performance [18].
of the error stages). When the stage's computations are completed, the FPGAs
are reconfigured with the next stage. This process is repeated until the task is
completed. Since only part of the algorithm is implemented at each stage, less
hardware is needed to implement it, and the resulting hardware implementation is
cheaper or faster, since more hardware is available to improve the performance of
the active stage [8].
2. One of the most space-consuming features in digital neural network implementations
is the multiplication unit. Therefore, practical implementations try to
reduce the number of multipliers and/or reduce the complexity of each multiplier.
One typical approach is to use time-division multiplexing (TDM) and a single
shared multiplier per neuron. Another approach is to use bit-serial stochastic
computing techniques [6]. Stochastic logic is a digital circuit which realizes
pseudo-analog operations (multiplications, additions, etc.) using stochastically
coded pulse sequences. Typically, such techniques use probabilistic bit-streams,
where the numeric value is proportional to the density of '1's in the bit-stream. A
multiplication of two independent stochastic pulse streams can be implemented by a
single two-input logic gate. Though additions and subtractions are more complex,
they may be implemented by special up/down counters. A sigmoid activation
function can be obtained with a fixed threshold value.
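The multiplication trick is easy to demonstrate in software: for independent
bit-streams, P(a AND b) = P(a) · P(b), so a single AND gate multiplies the encoded
values. A sketch (stream length and values are illustrative):

```python
import random

def to_stream(p, n):
    """Stochastically encode p in [0, 1]: each bit is 1 with probability p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def value(stream):
    """Decode: the value is the density of 1s in the stream."""
    return sum(stream) / len(stream)

n = 100_000
a, b = to_stream(0.6, n), to_stream(0.5, n)
product = [i & j for i, j in zip(a, b)]    # one AND gate per bit pair
print(value(product))                      # close to 0.6 * 0.5 = 0.3
```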
In parallel stochastic neural networks, the technique used to generate noise
for the neurons can be crucial, because the noise sources must be uncorrelated;
fortunately, several techniques for generating pseudorandom bit-streams have already
been developed [17, 3]. In Ref. [6] a single layer of a stochastic neural network
with 12 inputs and 10 outputs was implemented using Xilinx 4003 FPGAs. Ref. [24]
presents an FPGA prototyping implementation of an on-chip backpropagation
algorithm using parallel stochastic bit-streams.
3. Digital neural networks with adaptable topologies have also been implemented.
The Flexible Adaptable-Size Topology neural network, dubbed FAST
[32], is an unsupervised learning neural network that dynamically changes its
size. It has been implemented using commercial FPGAs, and has been successfully
used in pattern clustering tasks and non-trivial control tasks. However,
this network still does not exploit FPGA partial reconfigurability, already possible
with certain FPGA architectures. This feature should lead to more
complex, adaptable-topology neural networks, and to new hardware implementation
approaches.
11.8 Acknowledgments
The author gratefully acknowledges the support of the Centre Suisse d'Electronique
et de Microtechnique (CSEM), and the support of the Swiss Federal Institute
of Technology, Lausanne. He thanks each and every one of the members of the
Logic Systems Laboratory, especially E. Sanchez, D. Mange, M. Sipper, and M.
Tomassini. Special appreciation is also expressed to M. Sipper, J.-L. Beuchat,
and C.A. Peña for their careful reading of this manuscript.
Bibliography
[1] I. Aleksander and H. Morton. An Introduction to Neural Computing. Int.
Thomson Computer Press, 1995.
[2] A.E. Alpaydin. Neural Models of Incremental Supervised and Unsupervised
Learning. PhD thesis, Swiss Federal Institute of Technology, Lausanne, 1990.
Thèse 863.

[3] J. Alspector, J. Gannett, S. Haber, M. Parker, and R. Chu. A VLSI-efficient
technique for generating multiple uncorrelated noise sources and its application
to stochastic neural networks. IEEE Transactions on Circuits and Systems,
38(1):109–123, January 1991.
[4] M.A. Arbib. Part I: Background. In Michael A. Arbib, editor, Handbook of
Brain Theory and Neural Networks, page 11. MIT Press, 1995.
[5] T. Ash and G. Cottrell. Topology-modifying neural network algorithms. In
Michael A. Arbib, editor, Handbook of Brain Theory and Neural Networks,
pages 990–993. MIT Press, 1995.
[6] S.L. Bade and B.L. Hutchings. FPGA-based stochastic neural networks
implementation. In IEEE Workshop on FPGAs for Custom Computing Ma-
chines, Napa, April 1994.
[7] G. Carpenter and S. Grossberg. The ART of adaptive pattern recognition
by a self-organizing neural network. IEEE Computer, pages 77–88, March
1988.

[8] J.G. Eldredge and B.L. Hutchings. Density enhancement of a neural network
using FPGAs and run-time reconfiguration. In IEEE Workshop on FPGAs
for Custom Computing Machines, Napa, April 1994.

[9] S.E. Fahlman and C. Lebiere. The Cascade-Correlation learning architecture.
In NIPS 2, pages 524–532, 1990.

[10] E. Fiesler. Comparative bibliography of ontogenic neural networks. In
Proceedings of the International Conference on Artificial Neural Networks
(ICANN'94), 1994.
[11] G. Fischbach. Mind and brain. Scientific American, pages 24–33, September
1992.

[12] F. Fogelman-Soulié. Applications of neural networks. In Michael A. Arbib,
editor, Handbook of Brain Theory and Neural Networks, pages 94–98. MIT
Press, 1995.

[13] M. Frean. The Upstart algorithm: A method for constructing and training
feed-forward neural networks. Neural Computation, 2:198–209, 1990.

[14] B. Fritzke. Unsupervised ontogenic networks. In Handbook of Neural Computation,
pages C2.4:1–C2.4:16. Institute of Physics Publishing and Oxford
University Press, 1997.

[15] J. Hertz, A. Krogh, and R. Palmer. Introduction to the Theory of Neural
Computation, chapter 9. Addison-Wesley, Redwood City, CA, 1991.

[16] J.J. Hopfield. Neural networks and physical systems with emergent collective
computational abilities. Proceedings of the National Academy of Sciences,
78(8), 1982.
[17] P.D. Hortensius, R.D. McLeod, and H.C. Card. Parallel random number
generation for VLSI systems using cellular automata. IEEE Transactions
on Computers, 38(10):1466–1473, October 1989.

[18] P. Ienne. Programmable VLSI Systolic Processors for Neural Network and
Matrix Computations. PhD thesis, Swiss Federal Institute of Technology,
Lausanne, 1996. Thèse 1525.

[19] P. Ienne, T. Cornu, and G. Kuhn. Special-purpose digital hardware for neural
networks: An architectural survey. Journal of VLSI Signal Processing,
13(1):5–25, 1996.

[20] A. Jain, J. Mao, and K. Mohiuddin. Artificial neural networks: A tutorial.
IEEE Computer, pages 31–44, March 1996.

[21] T. Kohonen. Radial basis function networks. In Michael A. Arbib, editor,
Handbook of Brain Theory and Neural Networks, pages 537–540. MIT Press,
1995.

[22] T. Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information
Sciences. Springer, April 1995.

[23] J.F. Kolen. Exploring the Computational Capabilities of Recurrent Neural
Networks. PhD thesis, Ohio State University, 1994.

[24] K. Kollmann, K. Riemschneider, and H.C. Zeidler. On-chip backpropagation
training using parallel stochastic bit streams. In Proceedings of the IEEE
International Conference on Microelectronics for Neural Networks and Fuzzy
Systems (MicroNeuro'96), pages 149–156, 1996.
[40] E. Vittoz. Analog VLSI signal processing: Why, where and how? Journal
of VLSI Signal Processing, 8:27–44, July 1994.

[41] C. von der Malsburg. Self-organization of orientation sensitive cells in the
striate cortex. Kybernetik, 14:85–100, 1973.

[42] P.J. Werbos. The Roots of Backpropagation: From Ordered Derivatives to
Neural Networks and Political Forecasting. John Wiley and Sons, New York,
1994.

[43] B. Widrow and M. Lehr. Perceptrons, Adalines, and Backpropagation. In
Michael A. Arbib, editor, Handbook of Brain Theory and Neural Networks,
pages 719–724. MIT Press, 1995.

[44] X. Yao. Evolutionary artificial neural networks. International Journal of
Neural Systems, 4(3):203–222, 1993.