•Topologies
•Adaptation methods
•Recall dynamics
•The Renaissance
•William James
•Donald Hebb
•Frank Rosenblatt
•Perceptrons
•Stephen Grossberg
•Shun-Ichi Amari
•Teuvo Kohonen
•Kunihiko Fukushima
Perceptrons
•Carver Mead at Caltech followed with VLSI nets for artificial retinas and cochleas
Simplified 4-PE Hopfield Network
The sum over the two patterns of $V_i V_j$ for each weight is always 1 + 1 = 2.

Net input to PE $i$: $\sum_{j \ne i} w_{ij} V_j + I_i$
Simplified 4-PE Hopfield Network, Cont’d.
(1,1,1,1) and (-1,-1,-1,-1) are stored patterns.
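Below is a minimal sketch of how these two patterns could be stored and recalled, assuming NumPy; the synchronous update and the recall helper are illustrative choices, not part of the slides.

import numpy as np

patterns = np.array([[1, 1, 1, 1],
                     [-1, -1, -1, -1]])

# Sum of outer products V_i * V_j over the stored patterns;
# each off-diagonal weight comes out to 1 + 1 = 2, as noted above.
W = sum(np.outer(p, p) for p in patterns)
np.fill_diagonal(W, 0)  # no self-connections

def recall(v, steps=10):
    # Synchronous updates with bipolar step activations
    for _ in range(steps):
        v = np.where(W @ v >= 0, 1, -1)
    return v

print(recall(np.array([1, 1, 1, -1])))  # settles to (1, 1, 1, 1)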
•As of 1997, there were IEEE Transactions in all three fields: neural networks, fuzzy systems, and evolutionary computation
•NN topologies
•NN learning
•NN recall
•NN taxonomy
•In the simplest terms, NNs map input vectors to output vectors
•Terminology
•Weights
•Processing elements
•PE functions
Network Used to Illustrate Terminology
NN Components and Terminology
IEEE Computational Intelligence Society Standards Committee
•Linear combination
•Min-max connections
•...others
PEs with Internal and External Biases
Input Computation Types
Linear combination:

$y_j = f\left(\sum_{i=0}^{n} x_i w_{ji}\right) = f(X \cdot W_j)$

Euclidean distance:

$\|X - W_j\|^2 = \|X\|^2 - 2\, X \cdot W_j + \|W_j\|^2$
Mean-variance:

$y_j = g\left(\sum_{i=1}^{n} \left(\frac{w_{ji} - x_i}{v_{ji}}\right)^2\right), \qquad g(x) = \exp\left(-\frac{x^2}{2}\right)$
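A short sketch of the three input computation types above for a single PE, assuming NumPy; the vectors and variances are arbitrary illustrative values.

import numpy as np

x = np.array([0.2, 0.7, 0.1])   # input vector X
w = np.array([0.5, -0.3, 0.8])  # weight vector W_j
v = np.array([1.0, 0.5, 2.0])   # variances for mean-variance connections

# Linear combination: net input is the dot product X . W_j
net_linear = x @ w

# Distance-based input: squared Euclidean distance
# ||X - W_j||^2 = ||X||^2 - 2 X.W_j + ||W_j||^2
net_dist = np.sum((x - w) ** 2)

# Mean-variance connections: weights act as means, v as variances
net_mv = np.sum(((w - x) / v) ** 2)
y_mv = np.exp(-net_mv ** 2 / 2)  # g(x) = exp(-x^2 / 2)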
PE Activation Functions
•Also sometimes referred to as threshold functions or
squashing functions
Linear PE function: $f(x) = \alpha x$
Step PE Function

$f(x) = \begin{cases} \gamma & \text{if } x \ge \theta \\ \delta & \text{if } x < \theta \end{cases}$

Binary step function:

$f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$

Bipolar step function replaces 0 with -1.
Ramp PE Function

$f(x) = \begin{cases} \gamma & \text{if } x \ge \gamma \\ x & \text{if } |x| < \gamma \\ -\gamma & \text{if } x \le -\gamma \end{cases}$
Sigmoid PE Function
$f(x) = \frac{1}{1 + e^{-\alpha x}}$
As alpha approaches infinity, f(x) approaches a step function.
Gaussian PE Function
$f(x) = \exp\left(-\frac{x^2}{v}\right)$, where $v$ is the variance
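The four activation functions above, sketched in NumPy; the parameter names (theta, gamma, delta, alpha, v) follow the formulas, and the default values are arbitrary.

import numpy as np

def step(x, theta=0.0, gamma=1.0, delta=0.0):
    # General step: gamma at or above the threshold, delta below
    return np.where(x >= theta, gamma, delta)

def ramp(x, gamma=1.0):
    # Linear between -gamma and +gamma, saturated outside
    return np.clip(x, -gamma, gamma)

def sigmoid(x, alpha=1.0):
    # Approaches the binary step function as alpha -> infinity
    return 1.0 / (1.0 + np.exp(-alpha * x))

def gaussian(x, v=1.0):
    # Bell curve centered at 0 with variance v
    return np.exp(-x ** 2 / v)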
Neural Network Topologies
•Output layer Fz
•Applications:
* Pattern classification
* Pattern matching
* Function approximation
* ...any nonlinear mapping is possible with
nonlinear PEs
•Neocognitron
•Boltzmann machine
•Cauchy machine
•Supervised/unsupervised
•Offline/online
Supervised Adaptation
•Issues:
* How long to train
* How to measure results
Unsupervised Adaptation
•Examples:
* Self-organizing feature maps
* Competitive adaptation
Offline Adaptation
$w_{ji}^{new} = w_{ji}^{old} + \eta\, x_i y_j$

(For each pattern; $\eta$ is the learning rate)
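As a one-line sketch (assuming NumPy; the function name is illustrative), the Hebbian update for one pattern:

import numpy as np

def hebbian_update(W, x, y, eta=0.1):
    # w_ji^new = w_ji^old + eta * x_i * y_j: strengthen a weight when
    # its input x_i and output y_j are active together
    return W + eta * np.outer(y, x)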
Error correction adaptation (2-layer):

$w_{ji}^{new} = w_{ji}^{old} + \eta\, \delta_{kj} a_{ki}$

where $\delta_{kj} = b_{kj} - y_{kj}$

Note: the value for delta is for one pattern presented to one PE.
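A matching sketch of the error-correction (delta rule) update for one pattern k, again assuming NumPy with illustrative names:

import numpy as np

def delta_update(W, a_k, b_k, y_k, eta=0.1):
    # delta_kj = b_kj - y_kj: one error term per output PE j
    delta = b_k - y_k
    # w_ji^new = w_ji^old + eta * delta_kj * a_ki
    return W + eta * np.outer(delta, a_k)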
Competitive Adaptation
•Automatically creates classes for a set of input patterns
•Two-layer network
•Two-step procedure
Researchers include:
Grossberg
Von der Malsburg
Amari
Takeuchi
Competitive Adaptation Process
1. Determine the winning Fy PE using the dot product (the winner has the largest value).
(Input patterns and weight vectors are often normalized. Usually, “winner takes all.”)
$0 \le y_j = A_k \cdot W_j = \sum_{i=1}^{n} a_{ki} w_{ji} \le 1$

2. Adjust the winner's weights:

$w_{ji}^{new} = w_{ji}^{old} + \alpha(t)\, y_j \left(a_{ki} - w_{ji}^{old}\right)$
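A minimal sketch of this two-step procedure, assuming NumPy; alpha(t) is shown as a fixed constant, and the hard winner-take-all (y_j = 1 for the winner, 0 otherwise) is the usual simplification noted above.

import numpy as np

def competitive_step(W, a_k, alpha=0.1):
    # Step 1: winning F_y PE has the largest dot product A_k . W_j
    winner = np.argmax(W @ a_k)
    # Step 2: move only the winner's weight vector toward the input
    W[winner] += alpha * (a_k - W[winner])
    return winner

# Rows of W are weight vectors, often normalized like the inputs
W = np.random.rand(3, 4)
W /= np.linalg.norm(W, axis=1, keepdims=True)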
•In our example, we use linear output PEs and nonlinear hidden
PEs with sigmoid activations:
output = 1/(1+exp(-input))
$E = 0.5 \sum_{k=1}^{m} \sum_{j=1}^{q} (b_{kj} - z_{kj})^2$ (summed over PEs and patterns)

$z_{kj} = f_l(r_{kj}) = \sum_{i=1}^{p} y_{ki} w_{ji}$, where $r_{kj} = \sum_{i=1}^{p} y_{ki} w_{ji}$ is the net input from the hidden PEs
Multilayer Error Correction Adaptation, Cont’d.
Move along error (cost) gradient to a minimum
Output weights Fy to Fz are adjusted like the 2-layer case, since
output PEs have linear activation functions:
$\frac{\partial E_{kj}}{\partial w_{ji}} = \frac{\partial E_{kj}}{\partial z_{kj}} \frac{\partial z_{kj}}{\partial w_{ji}}$

$\frac{\partial E_{kj}}{\partial w_{ji}} = \frac{\partial}{\partial z_{kj}}\left[\frac{1}{2}(b_{kj} - z_{kj})^2\right] \frac{\partial}{\partial w_{ji}}\left[\sum_{i=1}^{p} w_{ji} y_{ki}\right] = -(b_{kj} - z_{kj})\, y_{ki} = -\delta_{kj}\, y_{ki}$
Multilayer Error Correction Adaptation, Cont’d.
Next, adjust the weights V between the input and hidden layers.

Define the error $\delta_{ki} = -\partial E_k / \partial r_{ki}$, where $r_{ki}$ is the net input to hidden PE $i$ (consistent with the 2-layer version).

$\frac{\partial E_k}{\partial v_{ih}} = \frac{\partial E_k}{\partial r_{ki}} \frac{\partial r_{ki}}{\partial v_{ih}} = -\delta_{ki}\, a_{kh}$

The key question is how to compute the $\delta_{ki}$ values for hidden PEs.
But $\partial y_{ki} / \partial r_{ki} = f_n'(r_{ki})$ is the derivative of the sigmoid activation function.
Multilayer Error Correction Adaptation, Cont’d.
Now use the chain rule twice:
$\delta_{ki} = -\frac{\partial E_k}{\partial r_{ki}} = -\frac{\partial E_k}{\partial y_{ki}} \frac{\partial y_{ki}}{\partial r_{ki}} = -\frac{\partial E_k}{\partial y_{ki}} f_n'(r_{ki})$

but $\frac{\partial E_k}{\partial y_{ki}} = \sum_j \frac{\partial E_k}{\partial r_{kj}} \frac{\partial r_{kj}}{\partial y_{ki}} = \sum_j \frac{\partial E_k}{\partial r_{kj}} w_{ji} = -\sum_j \delta_{kj} w_{ji}$

so therefore $\delta_{ki} = f_n'(r_{ki}) \sum_j \delta_{kj} w_{ji}$
j
Multilayer Error Correction Adaptation, Cont’d.
But $f_n'(r_{ki}) = \partial y_{ki} / \partial r_{ki} = y_{ki}(1 - y_{ki})$ for the sigmoid.

So the error assigned to a hidden PE is

$\delta_{ki} = y_{ki}(1 - y_{ki}) \sum_j \delta_{kj} w_{ji}$

(Figure: error propagating from the output PEs back to a hidden PE)
Multilayer Error Correction Adaptation, Cont’d.
As is the case for two-layer error correction adaptation,
$\partial E_j / \partial w_{ji} = \sum_k \left(\partial E_{kj} / \partial w_{ji}\right)$ and $E_j = \sum_k E_{kj}$

$v_{ih}^{new} = v_{ih}^{old} + \eta \sum_k \delta_{ki}\, a_{kh}$

where $\Delta v_{ih} = -\eta\, \partial E / \partial v_{ih} = \eta \sum_k \delta_{ki}\, a_{kh}$
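Putting the derivation together, a compact sketch of one adaptation step for one pattern, assuming NumPy; shapes and names are illustrative, and bias terms are omitted.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(V, W, a, b, eta=0.1):
    # Forward pass: sigmoid hidden PEs, linear output PEs
    y = sigmoid(V @ a)                     # hidden outputs y_ki
    z = W @ y                              # linear outputs z_kj
    # Output-layer error: delta_kj = b_kj - z_kj
    delta_out = b - z
    # Hidden-layer error: delta_ki = y_ki (1 - y_ki) sum_j delta_kj w_ji
    delta_hid = y * (1 - y) * (W.T @ delta_out)
    # Gradient-descent updates derived above
    W += eta * np.outer(delta_out, y)      # output weights
    V += eta * np.outer(delta_hid, a)      # hidden weights
    return z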
Back-propagation Unresolved Issues
•Number of training parameters
•Preparing data
•Test sets - similar to training patterns, but not used for training; they indicate how well the NN should perform in a production environment, and should generally reflect the expected probability distribution
$A_{ki}' = \frac{(A_{ki} - A_{k\min})(Hi - Lo)}{A_{k\max} - A_{k\min}} + Lo$
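As a sketch (NumPy assumed), min-max scaling of one input channel into [Lo, Hi]:

import numpy as np

def minmax_scale(A, lo=0.0, hi=1.0):
    # A'_ki = (A_ki - A_kmin)(Hi - Lo) / (A_kmax - A_kmin) + Lo
    return (A - A.min()) * (hi - lo) / (A.max() - A.min()) + lo

print(minmax_scale(np.array([2.0, 5.0, 8.0])))  # -> [0.  0.5 1. ]

The same formula reappears below under Pattern Presentation and Adding Noise, with Hi and Lo taken from the PE activation range.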
Normalizing Data
$A_{ki}' = \frac{A_{ki}}{\sqrt{n}}, \qquad s = \sqrt{1 - \frac{L^2}{n}}$

Note: Not good if most values are near 0; $s$ near 1 is then the dominant parameter in the input.
Z-axis Normalization Process
$L = \left(\sum_{i=1}^{n} A_{ki}^2\right)^{1/2}$
Z-axis Normalization Example
Consider two 4-dimensional input patterns: (-5, 5, -5, 5) and (-2, 2, -2, 2), with n = 4.
Scale them to [-1, 1]: (-1, 1, -1, 1) with L = 2, and (-0.4, 0.4, -0.4, 0.4) with L = 0.8.
(Note: They would normalize to identical vectors.)
The first pattern becomes (-0.5, 0.5, -0.5, 0.5, 0), since $s = \sqrt{1 - 4/4} = 0$.
The second pattern becomes (-0.2, 0.2, -0.2, 0.2, 0.9165), since $s = \sqrt{1 - 0.64/4} = \sqrt{0.84} = 0.9165$.
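The full z-axis procedure on the example's second pattern, as a sketch assuming NumPy:

import numpy as np

def z_axis_normalize(a):
    # a: pattern already scaled to [-1, 1]; n is its dimension
    n = len(a)
    L = np.sqrt(np.sum(a ** 2))     # Euclidean length
    s = np.sqrt(1.0 - L ** 2 / n)   # synthetic parameter s
    return np.append(a / np.sqrt(n), s)

print(z_axis_normalize(np.array([-0.4, 0.4, -0.4, 0.4])))
# -> [-0.2  0.2 -0.2  0.2  0.9165...]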
Pattern Presentation and Adding Noise
$C_{ki}' = \frac{(C_{ki} - C_{k\min})(Hi - Lo)}{C_{k\max} - C_{k\min}} + Lo$

where Hi and Lo are the max and min PE activation values