The figure on the next page presents the architecture of a backpropagation network. There may be any number of hidden layers, and any number of hidden units in any given hidden layer. Input and output units can be binary {0, 1}, bipolar {-1, +1}, or may take real values within a specific range such as [-1, 1]. Note that units within the same layer are not interconnected.
Backpropagation Networks
In feedforward activation, the units of hidden layer 1 compute their activation and output values and pass these on to the next layer, and so on, until the output units have produced the network's actual response to the current input. The activation value ak of unit k is computed as follows.
This is basically the same activation function as that of linear threshold units (the McCulloch-Pitts model).
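The activation formula the text refers to can be written in its standard form as the weighted sum of the incoming signals (this formula is supplied here, not reproduced from the source):

```latex
a_k = \sum_i w_{ki}\, x_i
```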
As illustrated above, xi is the input signal
coming from unit i at the other end of the
incoming connection. wki is the weight of the
connection between unit k and unit i. Unlike in
the linear threshold unit, the output of a unit in a
backpropagation network is no longer based on
a threshold. The output yk of unit k is computed
as follows:
The function f(x) is referred to as the output function. It is a continuously increasing function of the sigmoid type, asymptotically approaching 0 as x decreases and asymptotically approaching 1 as x increases. At x = 0, f(x) is equal to 0.5.
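A standard way to write the output function described here (the logistic sigmoid, supplied as a formula since none appears in the text) is:

```latex
y_k = f(a_k), \qquad f(x) = \frac{1}{1 + e^{-x}}
```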
In some implementations of the backpropagation model, it is convenient to have input and output values that are bipolar. In this case, the output function uses the hyperbolic tangent function, which has basically the same shape, but is asymptotic to -1 as x decreases and to +1 as x increases. This function has value 0 when x is 0.
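Written out, the bipolar output function described here is (formula supplied, not in the source):

```latex
f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}
```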
The errors at the other ends of the outgoing connections of hidden unit h have already been computed. These could be error values at the output layer or at a hidden layer. Each error signal is multiplied by its corresponding outgoing connection weight, and the sum of these products is taken.
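In the standard formulation, the hidden-unit error this paragraph describes is the weighted sum of the downstream error terms, scaled by the slope of the output function; the slope factor f'(ah) is part of the standard rule and is supplied here, not stated in the text:

```latex
\delta_h = f'(a_h) \sum_k \delta_k\, w_{kh}
```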
After computing the error for each unit, whether it is a hidden unit or an output unit, the network then fine-tunes its connection weights, producing the new weights wkj(t+1). The weight update rule is uniform for all connection weights.

The learning rate α is typically a small value between 0 and 1. It controls the size of weight adjustments and has some bearing on the speed of the learning process, as well as on the precision with which the network can possibly operate. The derivative f'(x) also controls the size of weight adjustments, depending on the actual output f(x). In the case of the sigmoid function above, the first derivative (slope) f'(x) is easily computed as follows:
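The derivative, and the update rule the surrounding text describes (reconstructed in standard form, with δk the error term for the unit at the output end of the connection and yj the signal at the input end), can be written as:

```latex
f'(x) = f(x)\bigl(1 - f(x)\bigr), \qquad
w_{kj}^{\,t+1} = w_{kj}^{\,t} + \alpha\, \delta_k\, f'(a_k)\, y_j
```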
We note that the change in weight is directly proportional to the error term computed for the unit at the
output end of the incoming connection. However, this weight change is controlled by the output signal
coming from the input end of the incoming connection. We can infer that very little weight change
(learning) occurs when this input signal is almost zero.
The weight change is further controlled by the term f'(ak). Because this term measures the slope of the output function, and knowing the shape of that function, we can infer that there will likewise be little weight change when the output of the unit at the other end of the connection is close to 0 or 1. Thus, learning will take place mainly at those connections with high pre-synaptic signals and non-committed (hovering around 0.5) post-synaptic signals.
Learning Process
One of the most important aspects of a neural network is the learning process. The learning process of a neural network can be viewed as reshaping a sheet of metal,
which represents the output (range) of the function being mapped. The training set
(domain) acts as energy required to bend the sheet of metal such that it passes through
predefined points. However, the metal, by its nature, will resist such reshaping. So the
network will attempt to find a low energy configuration (i.e. a flat/non-wrinkled shape)
that satisfies the constraints (training data).
In supervised training, both the inputs and the outputs are provided.
The network then processes the inputs and compares its resulting outputs against the
desired outputs. Errors are then calculated, causing the system to adjust the weights
which control the network. This process occurs over and over as the weights are
continually tweaked.
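As a concrete illustration of this supervised loop, here is a minimal pure-Python sketch (not from the source) of a 2-2-1 backpropagation network trained on XOR; the network size, learning rate, seed, and the XOR task are all choices made for this example:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_xor(epochs=20000, alpha=0.5, seed=1):
    """Tiny 2-2-1 backpropagation network; usually learns XOR."""
    random.seed(seed)
    # w_h[i][j]: weight from input i to hidden unit j; row 2 holds biases
    w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
    w_o = [random.uniform(-1, 1) for _ in range(3)]  # hidden 1, hidden 2, bias
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    for _ in range(epochs):
        for x, t in data:
            # feedforward pass
            h = [sigmoid(x[0] * w_h[0][j] + x[1] * w_h[1][j] + w_h[2][j])
                 for j in range(2)]
            y = sigmoid(h[0] * w_o[0] + h[1] * w_o[1] + w_o[2])
            # error terms; the slope f'(a) = f(a)(1 - f(a)) is folded in
            d_o = (t - y) * y * (1.0 - y)
            d_h = [d_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(2)]
            # weight updates: alpha * error term * presynaptic signal
            for j in range(2):
                w_o[j] += alpha * d_o * h[j]
            w_o[2] += alpha * d_o
            for j in range(2):
                w_h[0][j] += alpha * d_h[j] * x[0]
                w_h[1][j] += alpha * d_h[j] * x[1]
                w_h[2][j] += alpha * d_h[j]
    def predict(x):
        h = [sigmoid(x[0] * w_h[0][j] + x[1] * w_h[1][j] + w_h[2][j])
             for j in range(2)]
        return sigmoid(h[0] * w_o[0] + h[1] * w_o[1] + w_o[2])
    return predict
```

The loop matches the description above: process the inputs, compare against the desired outputs, compute the errors, adjust the weights, and repeat.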
Backpropagation Learning Math
[Figure: visualization of backpropagation learning]
Supervised and Unsupervised Neural Networks
Understanding Supervised and Unsupervised Learning
[Figure: a set of patterns labelled A and B, with two possible clustering solutions]
Supervised Learning
It is based on a labeled training set: each training pattern is provided together with its class label.

Unsupervised Learning
Tasks:
Clustering - Group patterns based on similarity
Vector Quantization - Fully divide up S into a small set of regions (defined by codebook vectors) that also helps cluster P
Feature Extraction - Reduce the dimensionality of S by removing unimportant features (i.e. those that do not help in clustering P)
Basic ISOdata Algorithm
1. Choose some initial values for the means μ1, ..., μc.
2. Classify the m samples by assigning each to the class of the closest mean.
3. Re-compute each mean as the average of the samples in its class.
4. Repeat until no mean changes value.
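The steps above can be sketched in Python; this is a minimal one-dimensional illustration (scalar samples and the function name are choices made for this sketch):

```python
def isodata(samples, means, iterations=100):
    """Basic ISOdata: assign each sample to the nearest mean, then
    recompute each mean as the average of its class, until stable."""
    for _ in range(iterations):
        # classification step: index of the closest mean for each sample
        labels = [min(range(len(means)), key=lambda j: abs(s - means[j]))
                  for s in samples]
        # update step: new mean = average of the samples in each class
        new_means = []
        for j in range(len(means)):
            members = [s for s, l in zip(samples, labels) if l == j]
            new_means.append(sum(members) / len(members) if members else means[j])
        if new_means == means:  # no mean changed value: converged
            return new_means, labels
        means = new_means
    return means, labels
```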
Similarity Measures
The objective of the similarity-measure approach is to try to find natural groupings. We now assume that x is an n-dimensional column vector.
1. Normalized inner product
3. Scattering criteria
4. Iterative optimization:
Select a criterion function.
Find the sets that extremize the criterion function (solve by exhaustive enumeration).
Hebbian learning
Hebb's Law states that if neuron i is near enough to excite neuron j and repeatedly participates in its activation, the synaptic connection between these two neurons is strengthened and neuron j becomes more sensitive to stimuli from neuron i.
Hebb's Law can be represented in the form of two rules:
1. If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased.
2. If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased.
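The rule pair above reduces to a one-line weight update; this tiny Python sketch is an illustration (function name and learning rate are choices made here), not code from the source:

```python
def hebb_update(w, x_i, y_j, alpha=0.1):
    """Plain Hebb rule: delta_w = alpha * x_i * y_j.
    With bipolar signals, synchronous activity (+1,+1 or -1,-1)
    increases the weight; asynchronous activity decreases it."""
    return w + alpha * x_i * y_j
```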
Hebbian learning in a neural network
[Figure: network activity shown (a) before and (b) after Hebbian training]
The Kohonen network
The Kohonen model provides a topological mapping. It places a fixed number of input patterns from the input layer into a higher-dimensional output, or Kohonen, layer.
Training in the Kohonen network begins with the winner's neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.
The lateral connections are used to create competition between neurons. The neuron with the largest activation level among all neurons in the output layer becomes the winner. This neuron is the only neuron that produces an output signal; the activity of all other neurons is suppressed in the competition.
Architecture of the Kohonen Network
[Figure: input-layer signals x1 and x2 feed output-layer neurons y1, y2 and y3]
[Figure: strength of lateral connections versus distance from the winning neuron, with an inhibitory effect on either side]
In the Kohonen network, a neuron learns by shifting its
weights from inactive connections to active ones. Only
the winning neuron and its neighbourhood are allowed to
learn. If a neuron does not respond to a given input
pattern, then learning cannot occur in that particular
neuron.
The competitive learning rule defines the change Δwij applied to synaptic weight wij as follows.

Step 3: Learning.
Update the synaptic weights:

wij(p + 1) = wij(p) + Δwij(p)

where Δwij(p) is the weight correction at iteration p. The weight correction is determined by the competitive learning rule:

Δwij(p) = α [xi − wij(p)]   if j ∈ Λj(p)
Δwij(p) = 0                 if j ∉ Λj(p)

where α is the learning rate parameter, and Λj(p) is the neighbourhood function centred around the winner-takes-all neuron jX at iteration p.
Step 4: Iteration.
Increase iteration p by one, go back to Step 2 and
continue until the minimum-distance Euclidean criterion is
satisfied, or no noticeable changes occur in the feature
map.
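The learning and iteration steps above can be sketched in Python. This is a minimal illustration (one-dimensional neighbourhood, fixed radius and learning rate, no decay schedule), not the full Kohonen training procedure:

```python
def kohonen_step(weights, x, alpha=0.5, radius=0):
    """One competitive-learning step: find the winner (the neuron whose
    weight vector is closest to x), then shift the winner and its
    neighbours towards the input: delta_w = alpha * (x_i - w_ij)."""
    def dist2(w):
        return sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    winner = min(range(len(weights)), key=lambda j: dist2(weights[j]))
    for j in range(len(weights)):
        if abs(j - winner) <= radius:  # inside the winner's neighbourhood
            weights[j] = [w + alpha * (xi - w) for xi, w in zip(x, weights[j])]
    return winner
```

Only the winning neuron and its neighbourhood learn; the weights of all other neurons are left unchanged, matching the description above.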
Adaptive Resonance Theory (ART)
Stability: system behaviour doesn't change after irrelevant events
Plasticity: system adapts its behaviour according to significant events
Dilemma: how to achieve stability without rigidity, and plasticity without chaos?
Ongoing learning capability
Preservation of learned knowledge
ART Architecture
Bottom-up weights bij (normalised)
Top-down weights tij (store the class template)
Input nodes: input normalisation, vigilance test
Output nodes: forward matching
Long-term memory: ANN weights
Short-term memory: ANN activation pattern
ART Algorithm
[Figure: flowchart: a new pattern passes through recognition, comparison, and categorisation, ending up known or unknown]
The incoming pattern is matched against the stored cluster templates.
If it is close enough to a stored template, it joins the best-matching cluster and the weights are adapted (adapt the winner node).
If not, a new cluster is initialised with the pattern as its template (initialise an uncommitted node).
ART1 Architecture
[Figure: ART1 architecture: input pattern, gain control, output layer, additional modules, and the categorisation result]
Reset Module
Fixed connection weights
Implements the vigilance test
Excitatory connection from F1(b)
Inhibitory connection from F1(a)
Output of reset module inhibitory to
output layer
Disables the firing output node if the match with the pattern is not close enough
The reset signal lasts for as long as the pattern is present
Gain module
Fixed connection weights
Controls activation cycle of input
layer
Excitatory connection from input
lines
Inhibitory connection from output
layer
Output of gain module excitatory to
input layer
2/3 rule for input layer
Recognition Phase
Forward transmission via bottom-up weights
Input pattern matched with the bottom-up weights (normalised template) of the output nodes
Inner product x·bi
Best-matching node fires (winner-take-all layer)
Similar to Kohonen's SOM algorithm: the pattern is associated with the closest matching template
ART1: fraction of the bits of the template that are also in the input pattern
Comparison Phase
Backward transmission via top-down weights
Vigilance test: class template matched with the input pattern
If the pattern is close enough to the template, categorisation was successful and resonance is achieved
If not close enough, reset the winner neuron and try the next best matching node
Repeat until the vigilance test is passed
ART1 Algorithm
Step 0: Initialize parameters:
L > 1
0 < ρ ≤ 1
Initialize weights:
0 < bij(0) < L / (L − 1 + n)
tji(0) = 1
Step 1: While stopping condition is false do Steps 2-13
ART1 Algorithm (cont.)
Step 4: Compute the norm of s:
||s|| = Σi si
Step 5: Send the input signal from the F1(a) layer to the F1(b) layer:
xi = si
Step 6: For each F2 node that is not inhibited:
if yj ≠ −1, then yj = Σi bij xi
ART1 Algorithm (cont.)
Step 8: Find J such that yJ ≥ yj for all nodes j.
If yJ = −1, then all nodes are inhibited and this pattern cannot be clustered.
||x|| = Σi xi
biJ(new) = L xi / (L − 1 + ||x||)
tJi(new) = xi
Step 13: Test for the stopping condition.
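A much-simplified fast-learning version of this algorithm can be sketched in Python. This is an illustration only: it collapses the F1/F2 signal exchanges into set operations on binary vectors, ranks clusters by raw template overlap rather than the normalised bottom-up activations, and assumes every pattern has at least one 1-bit:

```python
def art1(patterns, rho=0.7):
    """Simplified fast-learning ART1 sketch.  Templates are binary
    vectors; a pattern joins the best-matching template that passes
    the vigilance test |x AND t| / |x| >= rho, otherwise it founds
    a new cluster with itself as the template."""
    templates = []  # one binary template per cluster
    labels = []
    for x in patterns:
        norm_x = sum(x)  # assumes at least one 1-bit per pattern
        # recognition: rank clusters by overlap between x and template
        order = sorted(range(len(templates)),
                       key=lambda j: -sum(a & b for a, b in zip(x, templates[j])))
        for j in order:
            overlap = [a & b for a, b in zip(x, templates[j])]
            if sum(overlap) / norm_x >= rho:  # vigilance test passed
                templates[j] = overlap        # template := x AND template
                labels.append(j)
                break
        else:  # no resonance achieved: initialise a new cluster
            templates.append(list(x))
            labels.append(len(templates) - 1)
    return labels, templates
```

If the vigilance test fails for the best-matching cluster, the inner loop moves on to the next best, mirroring the reset-and-retry behaviour of the comparison phase.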