11.1 Introduction
Recent advances in technology have enabled us to use modern digital computers
capable of incredible feats. They operate at the nanosecond time scale and perform
enormous and complex arithmetic calculations without error. Furthermore,
they are able to store huge amounts of data: documents, images, scientific data,
etc. Current microprocessor designs will soon surpass 10 million transistors [39],
a complexity hardly imagined in the early 1970s when the first microprocessors
were designed (the Intel 4004 microprocessor contained 2,300 transistors).
Humans cannot approach these capabilities. They are not very good at com-
plex arithmetic calculations. It is very easy to make mistakes when dealing with
repetitive computations, and our memory seems quite limited for certain tasks.
However, humans routinely perform "simple" tasks such as walking, talking,
and common sense reasoning. The human brain interprets imprecise information
from the senses at an incredibly rapid rate. It records and recognizes sounds,
voices, images, and faces with incredible performance, and generalizes experience with
astonishing facility. Despite their computational potential, such "simple" tasks
are very difficult for current computers.
In fact, the brain is a remarkable computer, though the long course of evolution
has given it very different characteristics from those of modern digital computers,
sometimes called von Neumann computers. The brain works at low speeds
(on the order of milliseconds) and low energy consumption. Its distributed computing
and memory differ from the centralized organization of typical computers;
the brain is very redundant and robust, whereas a computer can fail as a result of
a single damaged transistor (see Table 1).
                     Computer                   Biological system
processor            complex                    simple
                     high speed (ns)            low speed (ms)
                     one/few                    a large number
                     localized                  distributed
memory               noncontent addressable     content addressable
                     centralized                distributed
computing            sequential                 parallel
                     stored programs            self-learning
reliability          very vulnerable            very robust
power consumption    high                       low

Table 1. Modern computer and human brain characteristics [20].
From the engineering point of view it is desirable to develop devices with such
characteristics, i.e., robustness, distributed memory and computation, generalization,
interpretation of imprecise and noisy information, etc. One possible way to
realize such devices is by means of biological inspiration, i.e., to mimic or imitate
nature, specifically, in our case, the human brain. In this chapter we describe the
basics of artificial neural networks, massively parallel systems initially inspired
by biological nervous systems, how they learn (i.e., how they adapt and provide
generalization from examples or experience), and how they can be implemented
in hardware.
The human brain contains approximately 10^11 neurons, roughly the number of stars in our
galaxy, and each neuron is connected to 10^3 to 10^4 other neurons. In total, the
human brain contains approximately 10^14 to 10^15 interconnections [20].
Although the brain exhibits a great diversity of neuron shapes, dendritic trees,
axon lengths, etc., all neurons seem to process information in much the same way.
Information is transmitted in the form of electrical impulses called action potentials
via the axons to other neuron cells. Such action potentials have an amplitude
of about 100 millivolts and a frequency of approximately 1 kHz. When
an action potential arrives at the axon terminal, the neuron releases chemical
neurotransmitters which mediate the interneuron communication at specialized
connections called synapses (see figure 11.2).
Figure 11.2: Synapses are specialized connections between neurons where chemical
neurotransmitters mediate the interneuron communication.
Such chemical substances, the neurotransmitters, bind to receptors in the
membrane of the postsynaptic neuron, and excite or inhibit it. A neuron may
have thousands of synapses connecting it with thousands of other neurons. The
combined effect of all excitations and inhibitions modifies the membrane's
electrical potential difference, which rests at about -70 millivolts (the inner surface is
negative relative to the outer surface). When the neuron is sufficiently excited, the
permeability to sodium (Na+) increases, leading to the movement of positively
charged sodium from the extracellular fluid into the cell interior, the cytoplasm,
which in turn may generate an action potential in the postsynaptic neuron.
This very brief description of the neuron's physiology has inspired engineers
and scientists to develop adaptive systems with learning capabilities. In the
following section we describe the main computational models that have been
developed so far as a result of such biological inspiration.
[Figure: a threshold unit with inputs x1, ..., xn arriving over excitatory and
inhibitory connections, a threshold Θ, and an output y.]

[Figure: a perceptron with inputs x1, ..., xn, synaptic weights w1, ..., wn, a
summation unit Σ followed by a threshold θ, and an output y taking the values
1 or -1.]
The unit fires when the weighted sum of its inputs exceeds the threshold; absorbing
the threshold into a bias weight w_0 with constant input x_0 = 1, this reads
\[
y(\mathbf{x}) = 1 \quad \text{if} \quad w_0 + \sum_{i=1}^{n} w_i x_i = \sum_{i=0}^{n} w_i x_i > 0. \tag{11.3}
\]
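As an illustration of this firing rule, the following Python sketch (not from the
original text; the weights shown are illustrative) evaluates a threshold unit on
binary inputs:

```python
def threshold_unit(w0, weights, inputs):
    """Fires (returns 1) when w0 + sum_i w_i * x_i > 0, as in equation (11.3)."""
    activation = w0 + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0

# Illustrative weights realizing the logic function AND: w1 = w2 = 1 with a
# threshold of 1.5 (any value strictly between 1 and 2 works with the strict
# inequality), i.e. a bias weight w0 = -1.5 once the threshold joins the sum.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, threshold_unit(-1.5, [1, 1], [x1, x2]))
```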
[Figure: typical activation functions: the linear function y = mx; the sigmoid
y = 1/(1 + exp(-(bx - c))), which takes the value 0.5 at x = c; and the Gaussian
y = exp(-|x - c|^2 / s^2), which peaks at x = c.]
[Figure: threshold units implementing the logic functions AND (weights 1, 1 and
threshold 2), OR (weights 1, 1 and threshold 1), and NOT (weight -1 and
threshold 0).]
If an input lies on one side of the decision surface the perceptron
fires, and if it lies on the other side it does not fire. The decision surface linearly
separates the training inputs in the plane. In perceptrons with many inputs the
decision surface is a hyperplane.
Figure 11.7 clearly shows how a linear separation of the two-dimensional input
space can implement the logic functions AND and OR.
[Figure 11.7: linear separations of the (x1, x2) input space implementing the logic
functions AND and OR.]
The XOR function, in contrast, is not linearly separable, as its truth table shows:

x1   x2   y
0    0    0
0    1    1
1    0    1
1    1    0

XOR truth table
Figure 11.8: XOR implementation with (a) a nonlinear separation of the input space,
and (b) two linear separations.
[Figure: a two-layer perceptron implementing XOR. Two hidden units with
thresholds of 0.5 combine the inputs x1 and x2 through weights of +1 and -1,
and an output unit with threshold 0.5 combines the hidden outputs.]
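A minimal Python sketch of such a two-layer XOR network, assuming the wiring
suggested by the figure (weights of ±1, thresholds of 0.5; the exact wiring in the
original figure may differ):

```python
def step(z):
    """Hard threshold: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 - x2 - 0.5)    # hidden unit firing only for (x1, x2) = (1, 0)
    h2 = step(x2 - x1 - 0.5)    # hidden unit firing only for (x1, x2) = (0, 1)
    return step(h1 + h2 - 0.5)  # output unit: an OR of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # reproduces the XOR truth table
```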
11.5 Learning
Learning is often associated with an increase of knowledge or understanding, and
it can be achieved by studying, instruction, or experience. Although it is very
difficult to precisely define learning, in the artificial neural network context it is
closely related to the proper adjustment of interconnection weights. We argue that
learning is also closely related to topology adaptation (see section 11.6), since
the behavior of an artificial neural model is also very dependent on the network's
topology, i.e., the number of layers, the number of neurons, and the pattern of
connections.
A learning algorithm refers to a procedure in which learning rules are used for
adjusting the weights of an artificial neural network, and possibly its topology.
[Figure: two neurons i and j connected by a synaptic weight wij(t); neuron i
produces the output xi(t) and neuron j the output yj(t).]

The best-known example of such a rule is Hebb's rule, which can be written as
\[
w_{ij}(t+1) = w_{ij}(t) + \eta\, x_i(t)\, y_j(t),
\]
where xi(t) and yj(t) are the output values of neurons i and j at time t, wij(t) is
the current interconnection weight between neurons i and j, and η is a parameter
called the learning rate. wij(t+1) is the future value of the synaptic weight being
updated during learning. One important characteristic of such a learning rule is
that the update is performed locally.
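A one-line Python sketch of this update (the learning-rate value is illustrative):

```python
def hebb_update(w_ij, x_i, y_j, eta=0.1):
    """Hebb's rule: strengthen the connection in proportion to the correlated
    activity of the two neurons it links, using only local quantities."""
    return w_ij + eta * x_i * y_j

w = 0.0
for x, y in [(1, 1), (1, 1), (1, 0)]:  # toy activity trace
    w = hebb_update(w, x, y)
print(w)  # 0.2: reinforced only by the two coincident activations
```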
So far, Hebb's rule is the only biologically realistic learning rule, and it has
inspired a large number of learning algorithms which can be classified into three
main learning paradigms: supervised, unsupervised, and reinforcement learning.
Figure 11.13: Competitive learning network. The solid lines indicate excitatory
connections whereas the dashed lines indicate inhibitory connections.
The unit with the normalized weight vector closest to the input vector becomes
the winner.
In fact, the winning neuron can be found by a simple search for the maximum/minimum
activation. This neuron updates its weights while the weights
of the other neurons remain unchanged. A simple competitive weight-updating
rule is the following:
\[
\Delta w_{ij} =
\begin{cases}
\eta\,(x_j - w_{ij}) & \text{if } i = i^{*}, \\
0 & \text{if } i \neq i^{*},
\end{cases}
\tag{11.10}
\]
where η is a constant (the learning rate) and i^{*} denotes the winning neuron.
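In Python, one step of this rule might look as follows (a sketch; we assume the
winner is found as the unit whose weight vector is closest to the input in
Euclidean distance, one concrete version of the activation search mentioned above):

```python
import numpy as np

def competitive_step(weights, x, eta=0.1):
    """One step of simple competitive learning, equation (11.10):
    only the winning unit moves its weight vector toward the input."""
    i_star = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    weights[i_star] += eta * (x - weights[i_star])
    return i_star

weights = np.array([[0.0, 0.0], [1.0, 1.0]])          # two units, 2-D inputs
print(competitive_step(weights, np.array([0.9, 0.8])))  # unit 1 wins
```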
Self-Organizing Maps. The self-organizing map (SOM) is a special type
of competitive learning network where the neurons have a spatial arrangement,
i.e., the neurons are typically organized in a line or a plane (see figure 11.14).
A self-organizing map has the property of topology preservation, that is, nearby
input patterns should activate nearby output neurons on the map. A network
that performs such a mapping is called a feature map. It has been found that
such topological maps are present in the cortex of highly developed animal brains:
there is a two-dimensional retinotopic map from the retina to the visual cortex, a
one-dimensional tonotopic map from the ear to the auditory cortex, a somatosensory
map from the skin to the somatosensory cortex, etc. [15]. Such topographic
maps seem to be created during the individual development in a sort of unsuper-
vised way.
An early model of self-organization was developed by Willshaw and von der
Malsburg in the mid 1970s [41]. They used excitatory lateral connections with
neighboring neurons, and inhibitory connections with distant neurons, instead of
using a strict winner-take-all mechanism, as shown in figure 11.15. The function
that defines the form of such lateral connection weights is known as the Mexican
hat.
Teuvo Kohonen [22] described a self-organizing map learning algorithm that
approximates the effect of a Mexican hat form of lateral connection weights by
updating, at each step, not only the winning neuron's weights but also the weights
of its neighbors on the map.
[Figure 11.15: lateral connection weights Wij in the Willshaw-von der Malsburg
model: excitatory connections (+) to nearby neurons and inhibitory connections
(-) to distant ones, as a function of the distance i - j.]
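The following sketch shows one step of a Kohonen-style SOM update with a
Gaussian neighborhood, a common computational shortcut for the Mexican-hat
interaction; the parameter names and values (eta, sigma) are illustrative:

```python
import numpy as np

def som_step(weights, positions, x, eta=0.1, sigma=1.0):
    """One Kohonen SOM step: every unit moves toward the input with a
    strength that decays with its map distance from the winning unit."""
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    d = np.linalg.norm(positions - positions[winner], axis=1)
    h = np.exp(-d**2 / (2 * sigma**2))          # neighborhood function
    weights += eta * h[:, None] * (x - weights)
    return winner

# A one-dimensional map of 5 units receiving 2-D inputs.
positions = np.arange(5, dtype=float).reshape(-1, 1)
weights = np.random.rand(5, 2)
som_step(weights, positions, np.array([0.2, 0.7]))
```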
Principal Component Analysis networks. Oja's learning rule, which extracts
the first principal component of the input data, can be written as
\[
\mathbf{w} \leftarrow \mathbf{w} + \gamma\,\phi\,(\mathbf{x} - \phi\,\mathbf{w}),
\]
where w is the weight vector, φ is the scalar product x · w, and γ is a learning
constant. Other PCA learning algorithms include the Stochastic Gradient Ascent
algorithm, the Weighted Subspace algorithm, etc. [30].
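A sketch of a single Oja update in Python (symbol names follow the rule above;
the data-generating toy example is our own):

```python
import numpy as np

def oja_step(w, x, gamma=0.01):
    """One step of Oja's rule. phi = x . w is the neuron output; the -phi*w
    term keeps the norm of w bounded, so w converges toward the principal
    component of the input distribution."""
    phi = float(np.dot(x, w))
    return w + gamma * phi * (x - phi * w)

rng = np.random.default_rng(0)
w = rng.normal(size=2)
for _ in range(2000):                   # inputs mostly along the (1, 1) axis
    x = rng.normal() * np.array([1.0, 1.0]) + 0.1 * rng.normal(size=2)
    w = oja_step(w, x)
print(w)  # approximately +/-(0.71, 0.71): the unit-norm principal direction
```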
Adaptive Resonance Theory. The adaptive resonance theory (ART), developed
by Carpenter and Grossberg [7], attempts to solve a major problem: how
can a learning system preserve what it has previously learned while continuing
to incorporate new knowledge? This problem is known as the stability-plasticity
dilemma. In the case of simple competitive learning there is no guarantee of
stability of the obtained classes; such classes may vary with the order of presentation
of the input patterns, and the category of a given input pattern may continue
to change endlessly. One way to avoid this problem is to gradually reduce the
learning rate, but this reduces the network's plasticity.
The distinctive characteristic of the ART model is the use of a vigilance parameter,
which is used to determine whether an input is sufficiently similar to those
already learned; if not, a new category is activated, until the memory capacity
is full. The ART architecture is designed to learn quickly and stably in real
time, in response to a possibly non-stationary world, with an unlimited number
of neurons until it utilizes the full memory capacity [7]. However, the quality of
categorization attained by the algorithm depends critically on the a priori
static vigilance parameter.
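To make the role of the vigilance parameter concrete, here is a deliberately
simplified, ART-inspired sketch for binary inputs; it is not Carpenter and
Grossberg's full architecture (it omits the choice-function ordering and the
search/reset dynamics), and the function and parameter names are our own:

```python
import numpy as np

def art_present(x, prototypes, rho=0.8):
    """Assign binary input x to the first prototype passing the vigilance
    test |x AND w| / |x| >= rho; otherwise create a new category.
    Fast learning intersects the prototype with the input."""
    for k, w in enumerate(prototypes):
        overlap = np.minimum(x, w).sum() / max(x.sum(), 1)
        if overlap >= rho:                  # resonance: refine category k
            prototypes[k] = np.minimum(x, w)
            return k
    prototypes.append(x.copy())             # novelty: open a new category
    return len(prototypes) - 1

protos = []
for v in ([1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]):
    print(art_present(np.array(v), protos))  # each input opens a category here
```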
Applications. Unsupervised neural networks have been widely used in clustering
tasks, feature extraction, data dimensionality reduction, data mining (data
organization for exploring and searching), information extraction, density
approximation, etc.
[Figure: evolution of the training error and the test error as a function of
training time.]
[Figure: a threshold device computing the output y from inputs X1, ..., Xn
weighted by W1, ..., Wn and compared against a threshold T.]
In a SIMD (single-instruction, multiple-data) architecture, all processors execute the
same instruction in parallel but on different data. In a systolic array, each processor
executes a step of a calculation before passing the results to the next processor in a
pipelined manner. The first systolic architecture for neural networks was the WARP
machine designed at Carnegie Mellon. Figure 11.20 presents, as an example, the
architecture and the processing element of a systolic array system called MANTRA [18].
Figure 11.20: Systolic architecture of the MANTRA machine and its processing element
[18]. Each processing element PE receives inputs Ii in both the horizontal and vertical
directions, and executes a step of a calculation before passing the results to the next
processor, in a pipelined manner.
On the other hand, hardware accelerators for general-purpose machines seem
to be the only feasible way to provide competitive hardware systems for artificial
neural networks in general, in terms of flexibility, price, and performance [18].
of the error stages). When the stage's computations are completed, the FPGAs
are reconfigured with the next stage. This process is repeated until the task is
completed. Since only part of the algorithm is implemented at each stage, less
hardware is needed to implement it, and the resulting hardware implementation is
cheaper or faster, since more hardware is available to improve the performance of
the active stage [8].
2. One of the most space-consuming features in digital neural network implementations
is the multiplication unit. Therefore, practical implementations try to
reduce the number of multipliers and/or reduce the complexity of each multiplier.
One typical approach is to use time-division multiplexing (TDM) and a single
shared multiplier per neuron. Another approach is to use bit-serial stochastic
computing techniques [6]. Stochastic logic is a digital circuit which realizes
pseudo-analog operations (multiplications, additions, etc.) using stochastically
coded pulse sequences. Typically, such techniques use probabilistic bit-streams,
where the numeric value is proportional to the density of '1's in the bit-stream. A
multiplication of two independent stochastic pulse streams can be implemented by a
single two-input logic gate. Though additions and subtractions are more complex,
they may be implemented by special up/down counters. A sigmoid activation
function can be obtained with a fixed threshold value.
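The multiplication trick is easy to demonstrate in software: for independent
bit-streams, P(a AND b) = P(a) · P(b), so a single AND gate multiplies the encoded
values. A sketch (stream length and values are illustrative):

```python
import random

def to_stream(p, n):
    """Stochastically encode p in [0, 1]: each bit is 1 with probability p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

def value(stream):
    """Decode: the value is the density of 1s in the stream."""
    return sum(stream) / len(stream)

n = 100_000
a, b = to_stream(0.6, n), to_stream(0.5, n)
product = [i & j for i, j in zip(a, b)]    # one AND gate per bit pair
print(value(product))                      # close to 0.6 * 0.5 = 0.3
```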
In parallel stochastic neural networks, the technique used to generate noise
for the neurons can be crucial, because the noise sources must be uncorrelated;
fortunately, several techniques for generating pseudorandom bit-streams have already
been developed [17, 3]. In Ref. [6] a single layer of a stochastic neural network
with 12 inputs and 10 outputs was implemented using Xilinx 4003 FPGAs. Ref. [24]
presents an FPGA prototyping implementation of an on-chip backpropagation
algorithm using parallel stochastic bit-streams.
3. Digital neural networks with adaptable topologies have also been implemented.
The Flexible Adaptable-Size Topology neural network, dubbed FAST
[32], is an unsupervised learning neural network that dynamically changes its
size. It has been implemented using commercial FPGAs, and has been successfully
used in pattern clustering tasks and non-trivial control tasks. However,
this network still does not exploit FPGA partial reconfigurability, already possible
with certain FPGA architectures. This feature should lead to more
complex, adaptable-topology neural networks, and to new hardware implementation
approaches.
11.8 Acknowledgments
The author gratefully acknowledges the support of the Centre Suisse d'Electronique
et de Microtechnique (CSEM), and the support of the Swiss Federal Institute
of Technology, Lausanne. He thanks each and every one of the members of the
Logic Systems Laboratory, especially E. Sanchez, D. Mange, M. Sipper, and M.
Tomassini. Special appreciation is also expressed to M. Sipper, J.-L. Beuchat,
and C.A. Peña for their careful reading of this manuscript.
Bibliography
[1] I. Aleksander and H. Morton. An Introduction to Neural Computing. Int.
Thomson Computer Press, 1995.
[2] A.E. Alpaydin. Neural Models of Incremental Supervised and Unsupervised
Learning. PhD thesis, Swiss Federal Institute of Technology, Lausanne, 1990.
Thèse 863.

[3] J. Alspector, J. Gannett, S. Haber, M. Parker, and R. Chu. A VLSI-efficient
technique for generating multiple uncorrelated noise sources and its application
to stochastic neural networks. IEEE Transactions on Circuits and Systems,
38(1):109–123, January 1991.
[4] M.A. Arbib. Part I: Background. In Michael A. Arbib, editor, Handbook of
Brain Theory and Neural Networks, page 11. MIT Press, 1995.
[5] T. Ash and G. Cottrell. Topology-modifying neural network algorithms. In
Michael A. Arbib, editor, Handbook of Brain Theory and Neural Networks,
pages 990–993. MIT Press, 1995.
[6] S.L. Bade and B.L. Hutchings. FPGA-based stochastic neural networks
implementation. In IEEE Workshop on FPGAs for Custom Computing Ma-
chines, Napa, April 1994.
[7] G. Carpenter and S. Grossberg. The ART of adaptive pattern recognition
by a self-organizing neural network. IEEE Computer, pages 77–88, March
1988.

[8] J.G. Eldredge and B.L. Hutchings. Density enhancement of a neural network
using FPGAs and run-time reconfiguration. In IEEE Workshop on FPGAs
for Custom Computing Machines, Napa, April 1994.

[9] S.E. Fahlman and C. Lebiere. The Cascade-Correlation learning architecture.
In NIPS 2, pages 524–532, 1990.

[10] E. Fiesler. Comparative bibliography of ontogenic neural networks. In
Proceedings of the International Conference on Artificial Neural Networks
(ICANN'94), 1994.
[11] G. Fischbach. Mind and brain. Scientific American, pages 24–33, September
1992.

[12] F. Fogelman-Soulié. Applications of neural networks. In Michael A. Arbib,
editor, Handbook of Brain Theory and Neural Networks, pages 94–98. MIT
Press, 1995.

[13] M. Frean. The Upstart algorithm: A method for constructing and training
feed-forward neural networks. Neural Computation, 2:198–209, 1990.

[14] B. Fritzke. Unsupervised ontogenic networks. In Handbook of Neural Computation,
pages C2.4:1–C2.4:16. Institute of Physics Publishing and Oxford
University Press, 1997.

[15] J. Hertz, A. Krogh, and R. Palmer. Introduction to the Theory of Neural
Computation, chapter 9. Addison-Wesley, Redwood City, CA, 1991.

[16] J.J. Hopfield. Neural networks and physical systems with emergent collective
computational abilities. Proceedings of the National Academy of Sciences,
78(8), 1982.
[17] P.D. Hortensius, R.D. McLeod, and H.C. Card. Parallel random number
generation for VLSI systems using cellular automata. IEEE Transactions
on Computers, 38(10):1466–1473, October 1989.

[18] P. Ienne. Programmable VLSI Systolic Processors for Neural Network and
Matrix Computations. PhD thesis, Swiss Federal Institute of Technology,
Lausanne, 1996. Thèse 1525.

[19] P. Ienne, T. Cornu, and G. Kuhn. Special-purpose digital hardware for neural
networks: An architectural survey. Journal of VLSI Signal Processing,
13(1):5–25, 1996.

[20] A. Jain, J. Mao, and K. Mohiuddin. Artificial neural networks: A tutorial.
IEEE Computer, pages 31–44, March 1996.

[21] T. Kohonen. Radial basis function networks. In Michael A. Arbib, editor,
Handbook of Brain Theory and Neural Networks, pages 537–540. MIT Press,
1995.

[22] T. Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information
Sciences. Springer, April 1995.

[23] J.F. Kolen. Exploring the Computational Capabilities of Recurrent Neural
Networks. PhD thesis, Ohio State University, 1994.

[24] K. Kollmann, K. Riemschneider, and H.C. Zeidler. On-chip backpropagation
training using parallel stochastic bit streams. In Proceedings of the IEEE
International Conference on Microelectronics for Neural Networks and Fuzzy
Systems (MicroNeuro'96), pages 149–156, 1996.
[40] E. Vittoz. Analog VLSI signal processing: Why, where and how? Journal
of VLSI Signal Processing, 8:27–44, July 1994.

[41] C. von der Malsburg. Self-organization of orientation sensitive cells in the
striate cortex. Kybernetik, 14:85–100, 1973.

[42] P.J. Werbos. The Roots of Backpropagation: From Ordered Derivatives to
Neural Networks and Political Forecasting. John Wiley and Sons, New York,
1994.

[43] B. Widrow and M. Lehr. Perceptrons, Adalines, and Backpropagation. In
Michael A. Arbib, editor, Handbook of Brain Theory and Neural Networks,
pages 719–724. MIT Press, 1995.

[44] X. Yao. Evolutionary artificial neural networks. International Journal of
Neural Systems, 4(3):203–222, 1993.