
Introduction to Neural Networks

Christian Borgelt
Intelligent Data Analysis and Graphical Models Research Unit
European Center for Soft Computing
c/ Gonzalo Gutiérrez Quirós s/n, 33600 Mieres, Spain
christian.borgelt@softcomputing.es
http://www.borgelt.net/
Christian Borgelt Introduction to Neural Networks 1
Contents
Introduction
Motivation, Biological Background
Threshold Logic Units
Definition, Geometric Interpretation, Limitations, Networks of TLUs, Training
General Neural Networks
Structure, Operation, Training
Multilayer Perceptrons
Definition, Function Approximation, Gradient Descent, Backpropagation, Variants, Sensitivity Analysis
Radial Basis Function Networks
Definition, Function Approximation, Initialization, Training, Generalized Version
Self-Organizing Maps
Definition, Learning Vector Quantization, Neighborhood of Output Neurons
Hopfield Networks
Definition, Convergence, Associative Memory, Solving Optimization Problems
Recurrent Neural Networks
Differential Equations, Vector Networks, Backpropagation through Time
Motivation: Why (Articial) Neural Networks?
(Neuro-)Biology / (Neuro-)Physiology / Psychology:
• Exploit similarity to real (biological) neural networks.
• Build models to understand nerve and brain operation by simulation.
Computer Science / Engineering / Economics:
• Mimic certain cognitive capabilities of human beings.
• Solve learning/adaptation, prediction, and optimization problems.
Physics / Chemistry:
• Use neural network models to describe physical phenomena.
• Special case: spin glasses (alloys of magnetic and non-magnetic metals).
Motivation: Why Neural Networks in AI?
Physical-Symbol System Hypothesis [Newell and Simon 1976]:
A physical-symbol system has the necessary and sufficient means for general intelligent action.
Neural networks process simple signals, not symbols.
So why study neural networks in Artificial Intelligence?
• Symbol-based representations work well for inference tasks, but are fairly bad for perception tasks.
• Symbol-based expert systems tend to get slower with growing knowledge, human experts tend to get faster.
• Neural networks allow for highly parallel information processing.
• There are several successful applications in industry and finance.
Biological Background
Structure of a prototypical biological neuron
[Figure: a neuron with labeled parts — cell core, cell body (soma), dendrites, axon, myelin sheath, terminal buttons, synapse.]
Biological Background
(Very) simplified description of neural information processing:
• The axon terminal releases chemicals, called neurotransmitters.
• These act on the membrane of the receptor dendrite to change its polarization.
  (The inside is usually 70mV more negative than the outside.)
• Decrease in potential difference: excitatory synapse.
• Increase in potential difference: inhibitory synapse.
• If there is enough net excitatory input, the axon is depolarized.
• The resulting action potential travels along the axon.
  (Speed depends on the degree to which the axon is covered with myelin.)
• When the action potential reaches the terminal buttons, it triggers the release of neurotransmitters.
Threshold Logic Units
Threshold Logic Units
A Threshold Logic Unit (TLU) is a processing unit for numbers with n inputs x_1, …, x_n and one output y. The unit has a threshold θ and each input x_i is associated with a weight w_i. A threshold logic unit computes the function

y = 1,  if  w·x = Σ_{i=1}^n w_i x_i ≥ θ,
y = 0,  otherwise.

[Diagram: the inputs x_1, …, x_n enter with weights w_1, …, w_n a unit with threshold θ, which emits the output y.]
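The defining computation is a one-liner in code (a minimal sketch; the name `tlu` is mine), shown here on the conjunction x_1 ∧ x_2 with weights 3, 2 and threshold 4 from the example that follows:

```python
def tlu(weights, theta, inputs):
    """Threshold logic unit: output 1 iff the weighted input sum reaches the threshold."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

# Conjunction x1 and x2 with weights (3, 2) and threshold 4:
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", tlu((3, 2), 4, (x1, x2)))   # fires only for (1, 1)
```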
Threshold Logic Units: Examples
Threshold logic unit for the conjunction x_1 ∧ x_2.
[Diagram: weights w_1 = 3, w_2 = 2, threshold θ = 4.]
x_1 x_2 | 3x_1 + 2x_2 | y
 0   0  |      0      | 0
 1   0  |      3      | 0
 0   1  |      2      | 0
 1   1  |      5      | 1
Threshold logic unit for the implication x_2 → x_1.
[Diagram: weights w_1 = 2, w_2 = −2, threshold θ = −1.]
x_1 x_2 | 2x_1 − 2x_2 | y
 0   0  |      0      | 1
 1   0  |      2      | 1
 0   1  |     −2      | 0
 1   1  |      0      | 1
Threshold Logic Units: Examples
Threshold logic unit for (x_1 ∧ ¬x_2) ∨ (x_1 ∧ x_3) ∨ (¬x_2 ∧ x_3).
[Diagram: weights w_1 = 2, w_2 = −2, w_3 = 2, threshold θ = 1.]
x_1 x_2 x_3 | Σ_i w_i x_i | y
 0   0   0  |      0      | 0
 1   0   0  |      2      | 1
 0   1   0  |     −2      | 0
 1   1   0  |      0      | 0
 0   0   1  |      2      | 1
 1   0   1  |      4      | 1
 0   1   1  |      0      | 0
 1   1   1  |      2      | 1
Threshold Logic Units: Geometric Interpretation
Review of line representations
Straight lines are usually represented in one of the following forms:
Explicit form:         g: x_2 = b x_1 + c
Implicit form:         g: a_1 x_1 + a_2 x_2 + d = 0
Point-direction form:  g: x = p + k r
Normal form:           g: (x − p)·n = 0
with the parameters:
b: gradient (slope) of the line
c: section of the x_2 axis (intercept)
p: vector of a point of the line (base vector)
r: direction vector of the line
n: normal vector of the line
Threshold Logic Units: Geometric Interpretation
A straight line and its defining parameters.
[Figure: a line g in the (x_1, x_2) plane with base vector p, direction vector r, normal vector n = (a_1, a_2), intercept c, slope b = r_2/r_1, and q = (d/|n|)(n/|n|) with d = −p·n, the (signed) distance term of the implicit form.]
Threshold Logic Units: Geometric Interpretation
How to determine the side on which a point x lies.
[Figure: a point x and the line g; the projection z = (x·n)/|n| of x onto the normal direction is compared with d/|n|: the point lies on the side of g to which the normal vector points iff z exceeds the projection of a line point.]
Threshold Logic Units: Geometric Interpretation
Threshold logic unit for x_1 ∧ x_2.
[Diagram: weights 3 and 2, threshold 4; plot of the unit square with the separating line 3x_1 + 2x_2 = 4, only (1, 1) lies on the 1 side.]
A threshold logic unit for x_2 → x_1.
[Diagram: weights 2 and −2, threshold −1; plot of the unit square with the separating line 2x_1 − 2x_2 = −1, only (0, 1) lies on the 0 side.]
Threshold Logic Units: Geometric Interpretation
Visualization of 3-dimensional Boolean functions:
[Figure: the unit cube spanned by x_1, x_2, x_3, with corners from (0, 0, 0) to (1, 1, 1).]
Threshold logic unit for (x_1 ∧ ¬x_2) ∨ (x_1 ∧ x_3) ∨ (¬x_2 ∧ x_3).
[Figure: weights 2, −2, 2, threshold 1; the separating plane cuts the unit cube between the true and false corners.]
Threshold Logic Units: Limitations
The biimplication problem x_1 ↔ x_2: There is no separating line.
x_1 x_2 | y
 0   0  | 1
 1   0  | 0
 0   1  | 0
 1   1  | 1
[Plot: the four corners of the unit square; (0, 0) and (1, 1) belong to one class, (1, 0) and (0, 1) to the other.]
Formal proof by reductio ad absurdum:
since (0, 0) ↦ 1:   0 ≥ θ,              (1)
since (1, 0) ↦ 0:   w_1 < θ,            (2)
since (0, 1) ↦ 0:   w_2 < θ,            (3)
since (1, 1) ↦ 1:   w_1 + w_2 ≥ θ.      (4)
(2) and (3): w_1 + w_2 < 2θ. With (4): 2θ > θ, or θ > 0. Contradiction to (1).
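The same conclusion can be checked mechanically: no choice of w_1, w_2, θ on a test grid reproduces the biimplication (an illustrative sketch; a finite grid does not replace the proof above, but every candidate fails):

```python
def separates(w1, w2, theta):
    """Does a TLU with these parameters compute the biimplication x1 <-> x2?"""
    patterns = {(0, 0): 1, (1, 0): 0, (0, 1): 0, (1, 1): 1}
    return all((w1 * x1 + w2 * x2 >= theta) == bool(o)
               for (x1, x2), o in patterns.items())

grid = [i / 2 for i in range(-20, 21)]   # candidate values -10.0 ... 10.0 in steps of 0.5
found = any(separates(w1, w2, t) for w1 in grid for w2 in grid for t in grid)
print(found)  # False: no parameter combination on the grid works
```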
Threshold Logic Units: Limitations
Total number and number of linearly separable Boolean functions.
([Widner 1960] as cited in [Zell 1994])
inputs | Boolean functions | linearly separable functions
   1   |         4         |          4
   2   |        16         |         14
   3   |       256         |        104
   4   |     65536         |       1774
   5   |  4.3 · 10^9       |      94572
   6   |  1.8 · 10^19      |  5.0 · 10^6
• For many inputs a threshold logic unit can compute almost no functions.
• Networks of threshold logic units are needed to overcome the limitations.
Networks of Threshold Logic Units
Solving the biimplication problem with a network.
Idea: logical decomposition  x_1 ↔ x_2  ≡  (x_1 → x_2) ∧ (x_2 → x_1)
[Network: inputs x_1, x_2 feed two hidden units with threshold −1 each and weights (−2, +2) and (+2, −2); both feed an output unit with threshold 3 and weights 2, 2, which emits y = x_1 ↔ x_2.]
• the first hidden unit computes  y_1 = x_1 → x_2
• the second hidden unit computes y_2 = x_2 → x_1
• the output unit computes        y = y_1 ∧ y_2
Networks of Threshold Logic Units
Solving the biimplication problem: Geometric interpretation
[Figure: left, the input space (x_1, x_2) with the two separating lines g_1 and g_2 and the corner points a, b, c, d of the unit square; right, the transformed space (y_1, y_2), in which the images of the points can be separated by a single line g_3.]
• The first layer computes new Boolean coordinates for the points.
• After the coordinate transformation the problem is linearly separable.
Representing Arbitrary Boolean Functions
Let y = f(x_1, …, x_n) be a Boolean function of n variables.
(i) Represent f(x_1, …, x_n) in disjunctive normal form. That is, determine
D_f = K_1 ∨ … ∨ K_m, where all K_j are conjunctions of n literals, i.e.,
K_j = l_{j1} ∧ … ∧ l_{jn} with l_{ji} = x_i (positive literal) or l_{ji} = ¬x_i (negative literal).
(ii) Create a neuron for each conjunction K_j of the disjunctive normal form (having n inputs — one input for each variable), where
w_{ji} = 2 if l_{ji} = x_i,   w_{ji} = −2 if l_{ji} = ¬x_i,
and
θ_j = n − 1 + (1/2) Σ_{i=1}^n w_{ji}.
(iii) Create an output neuron (having m inputs — one input for each neuron that was created in step (ii)), where
w_{(n+1)k} = 2,  k = 1, …, m,  and  θ_{n+1} = 1.
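The three construction steps translate directly into a small program (an illustrative transcription; the helper names `dnf_network` and `run` are mine, and the truth table itself is used as the DNF — one conjunction per row with output 1):

```python
def dnf_network(truth_table, n):
    """Build (hidden_layer, output_neuron) for a Boolean function given as
    {input_tuple: output}, one hidden neuron per true row of the table."""
    hidden = []
    for xs, o in truth_table.items():
        if o == 1:
            w = [2 if x == 1 else -2 for x in xs]     # +2 positive, -2 negative literal
            theta = n - 1 + 0.5 * sum(w)
            hidden.append((w, theta))
    return hidden, ([2] * len(hidden), 1)             # output neuron: weights 2, threshold 1

def run(net, xs):
    hidden, (w_out, t_out) = net
    ys = [1 if sum(wi * xi for wi, xi in zip(w, xs)) >= t else 0 for w, t in hidden]
    return 1 if sum(wo * y for wo, y in zip(w_out, ys)) >= t_out else 0

# Biimplication as a test case:
table = {(0, 0): 1, (1, 0): 0, (0, 1): 0, (1, 1): 1}
net = dnf_network(table, 2)
print([run(net, xs) for xs in table])  # reproduces the truth table: [1, 0, 0, 1]
```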
Training Threshold Logic Units
Training Threshold Logic Units
Geometric interpretation provides a way to construct threshold logic units with 2 and 3 inputs, but:
• not an automatic method (human visualization needed),
• not feasible for more than 3 inputs.
General idea of automatic training:
• Start with random values for weights and threshold.
• Determine the error of the output for a set of training patterns.
• Error is a function of the weights and the threshold: e = e(w_1, …, w_n, θ).
• Adapt weights and threshold so that the error gets smaller.
• Iterate adaptation until the error vanishes.
Training Threshold Logic Units
Single input threshold logic unit for the negation ¬x.
[Diagram: input x with weight w, threshold θ, output y.]
x | y
0 | 1
1 | 0
Output error as a function of weight and threshold.
[Plots: the error e over (w, θ) ∈ [−2, 2] × [−2, 2] for x = 0, for x = 1, and the sum of errors; the surfaces consist of flat plateaus at heights 0, 1 and 2.]
Training Threshold Logic Units
• The error function cannot be used directly, because it consists of plateaus.
• Solution: If the computed output is wrong, take into account how far the weighted sum is from the threshold.
Modified output error as a function of weight and threshold.
[Plots: the modified error over (w, θ) ∈ [−2, 2] × [−2, 2] for x = 0, for x = 1, and the sum of errors; the plateaus are now joined by sloped regions rising to heights 2 and 4.]
Training Threshold Logic Units
Schemata of resulting directions of parameter changes.
[Plots: arrows in the (θ, w) plane, θ, w ∈ [−2, 2], showing the direction of the parameter changes for x = 0, for x = 1, and the sum of changes.]
• Start at a random point.
• Iteratively adapt parameters according to the direction corresponding to the current point.
Training Threshold Logic Units
Example training procedure: Online and batch training.
Online training:
[Plot: the path of the parameters (θ, w) during online training, shown as a sequence of single-pattern steps in the (θ, w) plane.]
Batch training:
[Plot: the path of the parameters (θ, w) during batch training, with one combined step per epoch, and the corresponding trajectory on the modified error surface.]
[Result: the trained threshold logic unit for the negation, θ = −1/2 and w = −1, with the resulting separation of the inputs x = 0 and x = 1.]
Training Threshold Logic Units: Delta Rule
Formal Training Rule: Let x = (x_1, …, x_n) be an input vector of a threshold logic unit, o the desired output for this input vector and y the actual output of the threshold logic unit. If y ≠ o, then the threshold θ and the weight vector w = (w_1, …, w_n) are adapted as follows in order to reduce the error:
θ^(new) = θ^(old) + Δθ      with  Δθ  = −η(o − y),
∀i ∈ {1, …, n}:  w_i^(new) = w_i^(old) + Δw_i   with  Δw_i = η(o − y)x_i,
where η is a parameter that is called learning rate. It determines the severity of the weight changes. This procedure is called Delta Rule or Widrow–Hoff Procedure [Widrow and Hoff 1960].
• Online Training: Adapt parameters after each training pattern.
• Batch Training: Adapt parameters only at the end of each epoch, i.e. after a traversal of all training patterns.
Training Threshold Logic Units: Delta Rule
Turning the threshold value into a weight:
[Diagrams: left, a unit with inputs x_1, …, x_n, weights w_1, …, w_n and threshold θ, which fires if Σ_{i=1}^n w_i x_i ≥ θ; right, the equivalent unit with an additional fixed input x_0 = 1 and weight w_0 = −θ and threshold 0, which fires if Σ_{i=1}^n w_i x_i − θ ≥ 0.]
Training Threshold Logic Units: Delta Rule
procedure online_training (var w, var θ, L, η);
var y, e;                          (* output, sum of errors *)
begin
  repeat
    e := 0;                        (* initialize the error sum *)
    for all (x, o) ∈ L do begin    (* traverse the patterns *)
      if (wx ≥ θ) then y := 1;     (* compute the output *)
                  else y := 0;     (* of the threshold logic unit *)
      if (y ≠ o) then begin        (* if the output is wrong *)
        θ := θ − η(o − y);         (* adapt the threshold *)
        w := w + η(o − y)x;        (* and the weights *)
        e := e + |o − y|;          (* sum the errors *)
      end;
    end;
  until (e = 0);                   (* repeat the computations *)
end;                               (* until the error vanishes *)
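A runnable Python version of the procedure above (my variable names; trained here on the negation task from the earlier slides — initial values θ = 1.5, w = 0 are one possible choice):

```python
def online_training(patterns, eta=1.0, max_epochs=100):
    """Delta-rule online training of a TLU; returns (weights, theta)."""
    n = len(patterns[0][0])
    w, theta = [0.0] * n, 1.5                 # some initial values
    for _ in range(max_epochs):
        error = 0
        for xs, o in patterns:
            y = 1 if sum(wi * xi for wi, xi in zip(w, xs)) >= theta else 0
            if y != o:
                theta -= eta * (o - y)        # delta rule: adapt the threshold ...
                w = [wi + eta * (o - y) * xi for wi, xi in zip(w, xs)]  # ... and weights
                error += abs(o - y)
        if error == 0:
            return w, theta
    raise RuntimeError("did not converge (problem may not be linearly separable)")

# Negation: x = 0 -> 1, x = 1 -> 0
w, theta = online_training([((0,), 1), ((1,), 0)])
print(w, theta)
```

For the non-separable biimplication the loop would cycle forever, which is why the sketch caps the number of epochs.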
Training Threshold Logic Units: Delta Rule
procedure batch_training (var w, var θ, L, η);
var y, e,                          (* output, sum of errors *)
    θ_c, w_c;                      (* summed changes *)
begin
  repeat
    e := 0; θ_c := 0; w_c := 0;    (* initializations *)
    for all (x, o) ∈ L do begin    (* traverse the patterns *)
      if (wx ≥ θ) then y := 1;     (* compute the output *)
                  else y := 0;     (* of the threshold logic unit *)
      if (y ≠ o) then begin        (* if the output is wrong *)
        θ_c := θ_c − η(o − y);     (* sum the changes of the *)
        w_c := w_c + η(o − y)x;    (* threshold and the weights *)
        e := e + |o − y|;          (* sum the errors *)
      end;
    end;
    θ := θ + θ_c;                  (* adapt the threshold *)
    w := w + w_c;                  (* and the weights *)
  until (e = 0);                   (* repeat the computations *)
end;                               (* until the error vanishes *)
Training Threshold Logic Units: Online
epoch | x o |  net  y  e |  Δθ  Δw |   θ    w     (net = wx − θ)
      |     |            |         |  1.5   2
  1   | 0 1 | −1.5  0  1 | −1   0  |  0.5   2
      | 1 0 |  1.5  1  1 |  1  −1  |  1.5   1
  2   | 0 1 | −1.5  0  1 | −1   0  |  0.5   1
      | 1 0 |  0.5  1  1 |  1  −1  |  1.5   0
  3   | 0 1 | −1.5  0  1 | −1   0  |  0.5   0
      | 1 0 | −0.5  0  0 |  0   0  |  0.5   0
  4   | 0 1 | −0.5  0  1 | −1   0  | −0.5   0
      | 1 0 |  0.5  1  1 |  1  −1  |  0.5  −1
  5   | 0 1 | −0.5  0  1 | −1   0  | −0.5  −1
      | 1 0 | −0.5  0  0 |  0   0  | −0.5  −1
  6   | 0 1 |  0.5  1  0 |  0   0  | −0.5  −1
      | 1 0 | −0.5  0  0 |  0   0  | −0.5  −1
Training Threshold Logic Units: Batch
epoch | x o |  net  y  e |  Δθ  Δw |   θ    w     (net = wx − θ; changes applied at epoch end)
      |     |            |         |  1.5   2
  1   | 0 1 | −1.5  0  1 | −1   0  |
      | 1 0 |  0.5  1  1 |  1  −1  |  1.5   1
  2   | 0 1 | −1.5  0  1 | −1   0  |
      | 1 0 | −0.5  0  0 |  0   0  |  0.5   1
  3   | 0 1 | −0.5  0  1 | −1   0  |
      | 1 0 |  0.5  1  1 |  1  −1  |  0.5   0
  4   | 0 1 | −0.5  0  1 | −1   0  |
      | 1 0 | −0.5  0  0 |  0   0  | −0.5   0
  5   | 0 1 |  0.5  1  0 |  0   0  |
      | 1 0 |  0.5  1  1 |  1  −1  |  0.5  −1
  6   | 0 1 | −0.5  0  1 | −1   0  |
      | 1 0 | −1.5  0  0 |  0   0  | −0.5  −1
  7   | 0 1 |  0.5  1  0 |  0   0  |
      | 1 0 | −0.5  0  0 |  0   0  | −0.5  −1
Training Threshold Logic Units: Conjunction
Threshold logic unit with two inputs for the conjunction.
[Diagram: inputs x_1, x_2 with weights w_1, w_2, threshold θ, output y.]
x_1 x_2 | y
 0   0  | 0
 1   0  | 0
 0   1  | 0
 1   1  | 1
[Result of training: weights w_1 = 2, w_2 = 1, threshold θ = 3, with the separating line in the unit square — only (1, 1) lies on the 1 side.]
Training Threshold Logic Units: Conjunction
epoch | x_1 x_2 o | net  y  e | Δθ Δw_1 Δw_2 |  θ  w_1 w_2    (net = w_1x_1 + w_2x_2 − θ)
      |           |           |              |  0   0   0
  1   |  0  0  0  |  0   1  1 |  1   0   0   |  1   0   0
      |  0  1  0  | −1   0  0 |  0   0   0   |  1   0   0
      |  1  0  0  | −1   0  0 |  0   0   0   |  1   0   0
      |  1  1  1  | −1   0  1 | −1   1   1   |  0   1   1
  2   |  0  0  0  |  0   1  1 |  1   0   0   |  1   1   1
      |  0  1  0  |  0   1  1 |  1   0  −1   |  2   1   0
      |  1  0  0  | −1   0  0 |  0   0   0   |  2   1   0
      |  1  1  1  | −1   0  1 | −1   1   1   |  1   2   1
  3   |  0  0  0  | −1   0  0 |  0   0   0   |  1   2   1
      |  0  1  0  |  0   1  1 |  1   0  −1   |  2   2   0
      |  1  0  0  |  0   1  1 |  1  −1   0   |  3   1   0
      |  1  1  1  | −2   0  1 | −1   1   1   |  2   2   1
  4   |  0  0  0  | −2   0  0 |  0   0   0   |  2   2   1
      |  0  1  0  | −1   0  0 |  0   0   0   |  2   2   1
      |  1  0  0  |  0   1  1 |  1  −1   0   |  3   1   1
      |  1  1  1  | −1   0  1 | −1   1   1   |  2   2   2
  5   |  0  0  0  | −2   0  0 |  0   0   0   |  2   2   2
      |  0  1  0  |  0   1  1 |  1   0  −1   |  3   2   1
      |  1  0  0  | −1   0  0 |  0   0   0   |  3   2   1
      |  1  1  1  |  0   1  0 |  0   0   0   |  3   2   1
  6   |  0  0  0  | −3   0  0 |  0   0   0   |  3   2   1
      |  0  1  0  | −2   0  0 |  0   0   0   |  3   2   1
      |  1  0  0  | −1   0  0 |  0   0   0   |  3   2   1
      |  1  1  1  |  0   1  0 |  0   0   0   |  3   2   1
Training Threshold Logic Units: Biimplication
epoch | x_1 x_2 o | net  y  e | Δθ Δw_1 Δw_2 |  θ  w_1 w_2    (net = w_1x_1 + w_2x_2 − θ)
      |           |           |              |  0   0   0
  1   |  0  0  1  |  0   1  0 |  0   0   0   |  0   0   0
      |  0  1  0  |  0   1  1 |  1   0  −1   |  1   0  −1
      |  1  0  0  | −1   0  0 |  0   0   0   |  1   0  −1
      |  1  1  1  | −2   0  1 | −1   1   1   |  0   1   0
  2   |  0  0  1  |  0   1  0 |  0   0   0   |  0   1   0
      |  0  1  0  |  0   1  1 |  1   0  −1   |  1   1  −1
      |  1  0  0  |  0   1  1 |  1  −1   0   |  2   0  −1
      |  1  1  1  | −3   0  1 | −1   1   1   |  1   1   0
  3   |  0  0  1  | −1   0  1 | −1   0   0   |  0   1   0
      |  0  1  0  |  0   1  1 |  1   0  −1   |  1   1  −1
      |  1  0  0  |  0   1  1 |  1  −1   0   |  2   0  −1
      |  1  1  1  | −3   0  1 | −1   1   1   |  1   1   0
Training Threshold Logic Units: Convergence
Convergence Theorem: Let L = {(x_1, o_1), …, (x_m, o_m)} be a set of training patterns, each consisting of an input vector x_i ∈ IR^n and a desired output o_i ∈ {0, 1}. Furthermore, let L_0 = {(x, o) ∈ L | o = 0} and L_1 = {(x, o) ∈ L | o = 1}. If L_0 and L_1 are linearly separable, i.e., if w ∈ IR^n and θ ∈ IR exist such that
∀(x, 0) ∈ L_0:  wx < θ   and
∀(x, 1) ∈ L_1:  wx ≥ θ,
then online as well as batch training terminate.
• The algorithms terminate only when the error vanishes.
• Therefore the resulting threshold and weights must solve the problem.
• For not linearly separable problems the algorithms do not terminate.
Training Networks of Threshold Logic Units
• Single threshold logic units have strong limitations: they can only compute linearly separable functions.
• Networks of threshold logic units can compute arbitrary Boolean functions.
• Training single threshold logic units with the delta rule is fast and guaranteed to find a solution if one exists.
• Networks of threshold logic units cannot be trained this way, because
  – there are no desired values for the neurons of the first layer,
  – the problem can usually be solved with several different functions computed by the neurons of the first layer.
• When this situation became clear, neural networks were seen as a research dead end.
General (Articial) Neural Networks
General Neural Networks
Basic graph theoretic notions
A (directed) graph is a pair G = (V, E) consisting of a (finite) set V of nodes or vertices and a (finite) set E ⊆ V × V of edges.
We call an edge e = (u, v) ∈ E directed from node u to node v.
Let G = (V, E) be a (directed) graph and u ∈ V a node. Then the nodes of the set
pred(u) = {v ∈ V | (v, u) ∈ E}
are called the predecessors of the node u
and the nodes of the set
succ(u) = {v ∈ V | (u, v) ∈ E}
are called the successors of the node u.
General Neural Networks
General definition of a neural network
An (artificial) neural network is a (directed) graph G = (U, C), whose nodes u ∈ U are called neurons or units and whose edges c ∈ C are called connections.
The set U of nodes is partitioned into
• the set U_in of input neurons,
• the set U_out of output neurons, and
• the set U_hidden of hidden neurons.
It is
U = U_in ∪ U_out ∪ U_hidden,
U_in ≠ ∅,   U_out ≠ ∅,   U_hidden ∩ (U_in ∪ U_out) = ∅.
General Neural Networks
Each connection (v, u) ∈ C possesses a weight w_{uv} and each neuron u ∈ U possesses three (real-valued) state variables:
• the network input net_u,
• the activation act_u, and
• the output out_u.
Each input neuron u ∈ U_in also possesses a fourth (real-valued) state variable, the external input ext_u.
Furthermore, each neuron u ∈ U possesses three functions:
• the network input function  f_net^(u): IR^(2|pred(u)|+κ1(u)) → IR,
• the activation function     f_act^(u): IR^(1+κ2(u)) → IR, and
• the output function         f_out^(u): IR → IR,
which are used to compute the values of the state variables.
General Neural Networks
Types of (artificial) neural networks:
• If the graph of a neural network is acyclic, it is called a feed-forward network.
• If the graph of a neural network contains cycles (backward connections), it is called a recurrent network.
Representation of the connection weights by a matrix:
            u_1         u_2        …   u_r
u_1   ⎛ w_{u_1u_1}  w_{u_1u_2}  …  w_{u_1u_r} ⎞
u_2   ⎜ w_{u_2u_1}  w_{u_2u_2}  …  w_{u_2u_r} ⎟
 ⋮    ⎜     ⋮           ⋮              ⋮      ⎟
u_r   ⎝ w_{u_ru_1}  w_{u_ru_2}  …  w_{u_ru_r} ⎠
General Neural Networks: Example
A simple recurrent neural network
[Network: external input x_1 enters u_1, x_2 enters u_2; u_3 emits the output y; connections u_1 → u_2 with weight 1, u_3 → u_1 with weight 4, u_1 → u_3 with weight −2, u_2 → u_3 with weight 3.]
Weight matrix of this network:
        u_1  u_2  u_3
u_1   (  0    0    4 )
u_2   (  1    0    0 )
u_3   ( −2    3    0 )
Structure of a Generalized Neuron
A generalized neuron is a simple numeric processor
[Diagram: the outputs out_{v_1}, …, out_{v_n} of the predecessor neurons arrive as inputs in_{uv_1}, …, in_{uv_n} with weights w_{uv_1}, …, w_{uv_n}; the network input function f_net^(u) (with parameters σ_1, …, σ_l) computes net_u; the activation function f_act^(u) (with parameters θ_1, …, θ_k, possibly using the external input ext_u) computes act_u; the output function f_out^(u) computes out_u.]
General Neural Networks: Example
[The example network above, with threshold θ = 1 for every neuron.]
f_net^(u)(w_u, in_u) = Σ_{v∈pred(u)} w_{uv} in_{uv} = Σ_{v∈pred(u)} w_{uv} out_v
f_act^(u)(net_u, θ) = 1, if net_u ≥ θ,
                      0, otherwise
f_out^(u)(act_u) = act_u
General Neural Networks: Example
Updating the activations of the neurons
              u_1  u_2  u_3
input phase    1    0    0
work phase     1    0    0    net_{u_3} = −2
               0    0    0    net_{u_1} =  0
               0    0    0    net_{u_2} =  0
               0    0    0    net_{u_3} =  0
               0    0    0    net_{u_1} =  0
• Order in which the neurons are updated: u_3, u_1, u_2, u_3, u_1, u_2, u_3, …
• A stable state with a unique output is reached.
General Neural Networks: Example
Updating the activations of the neurons
              u_1  u_2  u_3
input phase    1    0    0
work phase     1    0    0    net_{u_3} = −2
               1    1    0    net_{u_2} =  1
               0    1    0    net_{u_1} =  0
               0    1    1    net_{u_3} =  3
               0    0    1    net_{u_2} =  0
               1    0    1    net_{u_1} =  4
               1    0    0    net_{u_3} = −2
• Order in which the neurons are updated: u_3, u_2, u_1, u_3, u_2, u_1, u_3, …
• No stable state is reached (oscillation of output).
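The two work phases can be replayed in code (a sketch of the example network; the dictionary keys name the neurons, the output function is the identity as on the previous slides, and only the first external input is set to 1):

```python
W = {("u1", "u3"): 4, ("u2", "u1"): 1, ("u3", "u1"): -2, ("u3", "u2"): 3}
THETA = {"u1": 1, "u2": 1, "u3": 1}

def update(act, u):
    """Recompute one neuron's activation from the current outputs."""
    net = sum(w * act[v] for (tgt, v), w in W.items() if tgt == u)
    act[u] = 1 if net >= THETA[u] else 0

def run(order, steps=9):
    act = {"u1": 1, "u2": 0, "u3": 0}   # input phase: external input x1 = 1, x2 = 0
    states = []
    for i in range(steps):
        update(act, order[i % len(order)])
        states.append((act["u1"], act["u2"], act["u3"]))
    return states

print(run(["u3", "u1", "u2"]))  # settles into the stable state (0, 0, 0)
print(run(["u3", "u2", "u1"]))  # keeps cycling: the output of u3 oscillates
```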
General Neural Networks: Training
Definition of learning tasks for a neural network
A fixed learning task L_fixed for a neural network with
• n input neurons, i.e. U_in = {u_1, …, u_n}, and
• m output neurons, i.e. U_out = {v_1, …, v_m},
is a set of training patterns l = (i^(l), o^(l)), each consisting of
• an input vector  i^(l) = (ext^(l)_{u_1}, …, ext^(l)_{u_n}) and
• an output vector o^(l) = (o^(l)_{v_1}, …, o^(l)_{v_m}).
A fixed learning task is solved, if for all training patterns l ∈ L_fixed the neural network computes from the external inputs contained in the input vector i^(l) of a training pattern l the outputs contained in the corresponding output vector o^(l).
General Neural Networks: Training
Solving a fixed learning task: Error definition
• Measure how well a neural network solves a given fixed learning task.
• Compute differences between desired and actual outputs.
• Do not sum differences directly in order to avoid errors canceling each other.
• Square has favorable properties for deriving the adaptation rules.
e = Σ_{l∈L_fixed} e^(l) = Σ_{v∈U_out} e_v = Σ_{l∈L_fixed} Σ_{v∈U_out} e^(l)_v,
where  e^(l)_v = ( o^(l)_v − out^(l)_v )².
General Neural Networks: Training
Denition of learning tasks for a neural network
A free learning task L_free for a neural network with
• n input neurons, i.e. U_in = {u_1, …, u_n},
is a set of training patterns l = (i^(l)), each consisting of
• an input vector i^(l) = (ext^(l)_{u_1}, …, ext^(l)_{u_n}).
Properties:
• There is no desired output for the training patterns.
• Outputs can be chosen freely by the training method.
• Solution idea: Similar inputs should lead to similar outputs. (clustering of input vectors)
General Neural Networks: Preprocessing
Normalization of the input vectors
• Compute expected value and standard deviation for each input:
μ_k = (1/|L|) Σ_{l∈L} ext^(l)_{u_k}    and    σ_k = sqrt( (1/|L|) Σ_{l∈L} ( ext^(l)_{u_k} − μ_k )² ),
• Normalize the input vectors to expected value 0 and standard deviation 1:
ext^(l)(new)_{u_k} = ( ext^(l)(old)_{u_k} − μ_k ) / σ_k
• Avoids unit and scaling problems.
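The two formulas translate directly into code (a minimal sketch; `normalize` is my name for it):

```python
from math import sqrt

def normalize(vectors):
    """Z-score normalization per input dimension: mean 0, standard deviation 1."""
    n, dims = len(vectors), len(vectors[0])
    result = [list(v) for v in vectors]
    for k in range(dims):
        mu = sum(v[k] for v in vectors) / n
        sigma = sqrt(sum((v[k] - mu) ** 2 for v in vectors) / n)
        for v in result:
            v[k] = (v[k] - mu) / sigma
    return result

data = [(1.0, 100.0), (2.0, 200.0), (3.0, 300.0)]
print(normalize(data))  # each column now has mean 0 and standard deviation 1
```

This removes the dependence on measurement units, so inputs on very different scales contribute comparably.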
Multilayer Perceptrons (MLPs)
Multilayer Perceptrons
An r-layer perceptron is a neural network with a graph G = (U, C) that satisfies the following conditions:
(i)  U_in ∩ U_out = ∅,
(ii) U_hidden = U_hidden^(1) ∪ … ∪ U_hidden^(r−2),
     ∀ 1 ≤ i < j ≤ r−2:  U_hidden^(i) ∩ U_hidden^(j) = ∅,
(iii) C ⊆ ( U_in × U_hidden^(1) ) ∪ ( ∪_{i=1}^{r−3} U_hidden^(i) × U_hidden^(i+1) ) ∪ ( U_hidden^(r−2) × U_out )
or, if there are no hidden neurons (r = 2, U_hidden = ∅),
C ⊆ U_in × U_out.
• Feed-forward network with strictly layered structure.
Multilayer Perceptrons
General structure of a multilayer perceptron
[Figure: the inputs x_1, x_2, …, x_n enter the input layer U_in; connections lead through the hidden layers U_hidden^(1), U_hidden^(2), …, U_hidden^(r−2) to the output layer U_out, which produces the outputs y_1, y_2, …, y_m.]
Multilayer Perceptrons
• The network input function of each hidden neuron and of each output neuron is the weighted sum of its inputs, i.e.
∀u ∈ U_hidden ∪ U_out:  f_net^(u)(w_u, in_u) = w_u · in_u = Σ_{v∈pred(u)} w_{uv} out_v.
• The activation function of each hidden neuron is a so-called sigmoid function, i.e. a monotonically increasing function
f: IR → [0, 1]  with  lim_{x→−∞} f(x) = 0  and  lim_{x→∞} f(x) = 1.
• The activation function of each output neuron is either also a sigmoid function or a linear function, i.e.
f_act(net, θ) = α net − θ.
Sigmoid Activation Functions
step function:
f_act(net, θ) = 1, if net ≥ θ,
                0, otherwise.
semi-linear function:
f_act(net, θ) = 1, if net > θ + 1/2,
                0, if net < θ − 1/2,
                (net − θ) + 1/2, otherwise.
sine until saturation:
f_act(net, θ) = 1, if net > θ + π/2,
                0, if net < θ − π/2,
                ( sin(net − θ) + 1 ) / 2, otherwise.
logistic function:
f_act(net, θ) = 1 / ( 1 + e^(−(net−θ)) )
[Plots: each function rises from 0 to 1 around net = θ; the step function jumps at θ, the semi-linear and sine versions saturate at θ ± 1/2 and θ ± π/2, and the logistic function approaches 0 and 1 asymptotically over net ∈ [θ−8, θ+8].]
Sigmoid Activation Functions
• All sigmoid functions on the previous slide are unipolar, i.e., they range from 0 to 1.
• Sometimes bipolar sigmoid functions are used, like the tangens hyperbolicus.
tangens hyperbolicus:
f_act(net, θ) = tanh(net − θ) = 2 / ( 1 + e^(−2(net−θ)) ) − 1
[Plot: tanh(net − θ) rises from −1 to 1 over net ∈ [θ−4, θ+4].]
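The four unipolar activation functions and the bipolar tanh can be written down directly (a sketch; the function names are mine):

```python
from math import exp, sin, pi, tanh

def step(net, theta):
    return 1.0 if net >= theta else 0.0

def semilinear(net, theta):
    if net > theta + 0.5:
        return 1.0
    if net < theta - 0.5:
        return 0.0
    return (net - theta) + 0.5

def sine_sat(net, theta):                 # "sine until saturation"
    if net > theta + pi / 2:
        return 1.0
    if net < theta - pi / 2:
        return 0.0
    return (sin(net - theta) + 1.0) / 2.0

def logistic(net, theta):
    return 1.0 / (1.0 + exp(-(net - theta)))

def tanh_act(net, theta):                 # bipolar: ranges over (-1, 1)
    return tanh(net - theta)

for f in (step, semilinear, sine_sat, logistic):
    print(f.__name__, f(0.0, 0.0), f(10.0, 0.0))
```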
Multilayer Perceptrons: Weight Matrices
Let U_1 = {v_1, …, v_m} and U_2 = {u_1, …, u_n} be the neurons of two consecutive layers of a multilayer perceptron.
Their connection weights are represented by an n × m matrix
W = ⎛ w_{u_1v_1}  w_{u_1v_2}  …  w_{u_1v_m} ⎞
    ⎜ w_{u_2v_1}  w_{u_2v_2}  …  w_{u_2v_m} ⎟
    ⎜     ⋮           ⋮              ⋮      ⎟
    ⎝ w_{u_nv_1}  w_{u_nv_2}  …  w_{u_nv_m} ⎠ ,
where w_{u_iv_j} = 0 if there is no connection from neuron v_j to neuron u_i.
Advantage: The computation of the network input can be written as
net_{U_2} = W · in_{U_2} = W · out_{U_1}
where net_{U_2} = (net_{u_1}, …, net_{u_n})ᵀ and in_{U_2} = out_{U_1} = (out_{v_1}, …, out_{v_m})ᵀ.
Multilayer Perceptrons: Biimplication
Solving the biimplication problem with a multilayer perceptron.
[Network: input neurons x_1, x_2 in U_in; a hidden layer with thresholds −1 and −1 and weight vectors (−2, 2) and (2, −2); an output neuron with threshold 3 and weights 2, 2, emitting y.]
Note the additional input neurons compared to the TLU solution.
W_1 = ⎛ −2   2 ⎞       and       W_2 = ( 2  2 )
      ⎝  2  −2 ⎠
Multilayer Perceptrons: Fredkin Gate
[Fredkin gate: inputs s, x_1, x_2; outputs s, y_1, y_2. For s = 0 the inputs are passed through, (a, b) → (a, b); for s = 1 they are swapped, (a, b) → (b, a).]
s   | 0 0 0 0 1 1 1 1
x_1 | 0 0 1 1 0 0 1 1
x_2 | 0 1 0 1 0 1 0 1
y_1 | 0 0 1 1 0 1 0 1
y_2 | 0 1 0 1 0 0 1 1
[Figures: y_1 and y_2 as 3-dimensional Boolean functions on the unit cube spanned by x_1, x_2, s.]
Multilayer Perceptrons: Fredkin Gate
[Network for the Fredkin gate: inputs x_1, s, x_2 in U_in; a hidden layer of four units with thresholds 1, 3, 3, 1; two output units y_1, y_2 with threshold 1 each.]
W_1 = ⎛ 2  −2  0 ⎞       W_2 = ⎛ 2  0  2  0 ⎞
      ⎜ 2   2  0 ⎟             ⎝ 0  2  0  2 ⎠
      ⎜ 0   2  2 ⎟
      ⎝ 0  −2  2 ⎠
(columns of W_1: x_1, s, x_2; rows: the four hidden units)
Why Non-linear Activation Functions?
With weight matrices we have for two consecutive layers U_1 and U_2:
net_{U_2} = W · in_{U_2} = W · out_{U_1}.
If the activation functions are linear, i.e.,
f_act(net, θ) = α net − θ,
the activations of the neurons in the layer U_2 can be computed as
act_{U_2} = D_act · net_{U_2} − θ,
where
• act_{U_2} = (act_{u_1}, …, act_{u_n})ᵀ is the activation vector,
• D_act is an n × n diagonal matrix of the factors α_{u_i}, i = 1, …, n, and
• θ = (θ_{u_1}, …, θ_{u_n})ᵀ is a bias vector.
Why Non-linear Activation Functions?
If the output function is also linear, it is analogously
out_{U_2} = D_out · act_{U_2} − ξ,
where
• out_{U_2} = (out_{u_1}, …, out_{u_n})ᵀ is the output vector,
• D_out is again an n × n diagonal matrix of factors, and
• ξ = (ξ_{u_1}, …, ξ_{u_n})ᵀ a bias vector.
Combining these computations we get
out_{U_2} = D_out · ( D_act · ( W · out_{U_1} ) − θ ) − ξ
and thus
out_{U_2} = A_12 · out_{U_1} + b_12
with an n × m matrix A_12 and an n-dimensional vector b_12.
Why Non-linear Activation Functions?
Therefore we have
out_{U_2} = A_12 · out_{U_1} + b_12
and
out_{U_3} = A_23 · out_{U_2} + b_23
for the computations of two consecutive layers U_2 and U_3.
These two computations can be combined into
out_{U_3} = A_13 · out_{U_1} + b_13,
where A_13 = A_23 · A_12 and b_13 = A_23 · b_12 + b_23.
Result: With linear activation and output functions any multilayer perceptron can be reduced to a two-layer perceptron.
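The collapse can be checked numerically: composing the two affine layer maps gives exactly the combined map A_13, b_13 (a pure-Python sketch with made-up small matrices):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def affine(A, b, x):
    """One linear layer: A x + b."""
    return [v + bi for v, bi in zip(matvec(A, x), b)]

A12, b12 = [[1.0, 2.0], [0.0, 1.0]], [1.0, -1.0]   # layer 1 -> 2
A23, b23 = [[2.0, 0.0], [1.0, 1.0]], [0.5, 0.0]    # layer 2 -> 3
A13 = matmul(A23, A12)                             # combined map: A13 = A23 A12,
b13 = affine(A23, b23, b12)                        # b13 = A23 b12 + b23

x = [3.0, -2.0]
two_steps = affine(A23, b23, affine(A12, b12, x))
one_step = affine(A13, b13, x)
print(two_steps == one_step)  # True
```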
Multilayer Perceptrons: Function Approximation
General idea of function approximation:
• Approximate a given function by a step function.
• Construct a neural network that computes the step function.
[Figure: a curve y(x), split at the borders x_1, x_2, x_3, x_4 and approximated by a staircase with heights y_0, y_1, y_2, y_3, y_4.]
Multilayer Perceptrons: Function Approximation
[Network: the input x is compared with the borders x_1, …, x_4 by a first hidden layer of step neurons with thresholds x_1, …, x_4; pairs of consecutive neurons are combined with weights +2 and −2 by a second hidden layer with thresholds 1, so that each second-layer neuron marks one "stair"; a linear (identity) output neuron with weights y_1, y_2, y_3 computes the staircase value y.]
Multilayer Perceptrons: Function Approximation
Theorem: Any Riemann-integrable function can be approximated with arbitrary accuracy by a four-layer perceptron.
• But: Error is measured as the area between the functions.
• More sophisticated mathematical examination allows a stronger assertion: With a three-layer perceptron any continuous function can be approximated with arbitrary accuracy (error: maximum function value difference).
Multilayer Perceptrons: Function Approximation
[Figures: left, the staircase approximation of y(x) with levels y_0, …, y_4; right, its decomposition into unit step functions, one per border x_i, each 0 to the left and 1 to the right of x_i.]
Multilayer Perceptrons: Function Approximation
[Network: simplified version with a single hidden layer of step neurons with thresholds x_1, …, x_4; a linear (identity) output neuron weights the steps with the relative heights Δy_1, …, Δy_4 and computes y.]
Multilayer Perceptrons: Function Approximation
[Figures: the staircase approximation of y(x) and its decomposition into unit steps at the borders x_1, …, x_4, each scaled by the relative height Δy_i = y_i − y_{i−1}.]
Multilayer Perceptrons: Function Approximation
[Network: one hidden layer of step neurons with thresholds x_1, …, x_4; the linear (identity) output neuron with weights Δy_1, …, Δy_4 computes
y = Σ_i Δy_i · 1[x ≥ x_i],
i.e. the sum of the relative heights of all steps to the left of x.]
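The relative-height construction can be sketched as follows (my helper name `step_approximation`; the staircase value is the sum of the weighted unit steps, exactly as the linear output neuron computes it):

```python
import math

def step_approximation(f, x_min, x_max, n):
    """Approximate f on [x_min, x_max] by a staircase with n steps, written as a
    sum of unit steps weighted by the relative heights Delta y_i = y_i - y_(i-1)."""
    xs = [x_min + i * (x_max - x_min) / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    deltas = [ys[i] - ys[i - 1] for i in range(1, len(ys))]  # output-neuron weights

    def approx(x):
        # sum of all steps whose border lies to the left of x (plus the base level)
        return ys[0] + sum(d for x_i, d in zip(xs[1:], deltas) if x >= x_i)

    return approx

g = step_approximation(math.sin, 0.0, math.pi, 100)
print(abs(g(1.0) - math.sin(1.0)) < 0.05)  # True: coarse but close
```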
Mathematical Background: Regression
Mathematical Background: Linear Regression
Training neural networks is closely related to regression.
Given: A dataset ((x_1, y_1), …, (x_n, y_n)) of n data tuples and a hypothesis about the functional relationship, e.g. y = g(x) = a + bx.
Approach: Minimize the sum of squared errors, i.e.
F(a, b) = Σ_{i=1}^n ( g(x_i) − y_i )² = Σ_{i=1}^n ( a + bx_i − y_i )².
Necessary conditions for a minimum:
∂F/∂a = Σ_{i=1}^n 2( a + bx_i − y_i ) = 0    and    ∂F/∂b = Σ_{i=1}^n 2( a + bx_i − y_i ) x_i = 0
Mathematical Background: Linear Regression
Result of necessary conditions: System of so-called normal equations, i.e.
n a + ( Σ_{i=1}^n x_i ) b = Σ_{i=1}^n y_i,
( Σ_{i=1}^n x_i ) a + ( Σ_{i=1}^n x_i² ) b = Σ_{i=1}^n x_i y_i.
• Two linear equations for two unknowns a and b.
• System can be solved with standard methods from linear algebra.
• Solution is unique unless all x-values are identical.
• The resulting line is called a regression line.
Linear Regression: Example
x | 1 2 3 4 5 6 7 8
y | 1 3 2 3 4 3 5 6
y = 3/4 + (7/12) x.
[Plot: the eight data points and the regression line, x ∈ [0, 8], y ∈ [0, 6].]
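The normal equations for the simple linear case can be solved in closed form (a sketch; `linear_regression` is my name for it), reproducing the example above:

```python
def linear_regression(xs, ys):
    """Solve the normal equations for y = a + b x (least squares)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # from the second normal equation
    a = (sy - b * sx) / n                           # from the first normal equation
    return a, b

# Data from the example slide:
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 3, 2, 3, 4, 3, 5, 6]
a, b = linear_regression(xs, ys)
print(a, b)  # close to 3/4 and 7/12, i.e. y = 3/4 + (7/12) x
```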
Mathematical Background: Polynomial Regression
Generalization to polynomials
y = p(x) = a_0 + a_1 x + … + a_m x^m
Approach: Minimize the sum of squared errors, i.e.
F(a_0, a_1, …, a_m) = Σ_{i=1}^n ( p(x_i) − y_i )² = Σ_{i=1}^n ( a_0 + a_1 x_i + … + a_m x_i^m − y_i )²
Necessary conditions for a minimum: All partial derivatives vanish, i.e.
∂F/∂a_0 = 0,   ∂F/∂a_1 = 0,   …,   ∂F/∂a_m = 0.
Mathematical Background: Polynomial Regression
System of normal equations for polynomials:
n a_0 + ( Σ_{i=1}^n x_i ) a_1 + … + ( Σ_{i=1}^n x_i^m ) a_m = Σ_{i=1}^n y_i
( Σ_{i=1}^n x_i ) a_0 + ( Σ_{i=1}^n x_i² ) a_1 + … + ( Σ_{i=1}^n x_i^(m+1) ) a_m = Σ_{i=1}^n x_i y_i
    ⋮
( Σ_{i=1}^n x_i^m ) a_0 + ( Σ_{i=1}^n x_i^(m+1) ) a_1 + … + ( Σ_{i=1}^n x_i^(2m) ) a_m = Σ_{i=1}^n x_i^m y_i,
• m + 1 linear equations for m + 1 unknowns a_0, …, a_m.
• System can be solved with standard methods from linear algebra.
• Solution is unique unless all x-values are identical.
Mathematical Background: Multilinear Regression
Generalization to more than one argument
z = f(x, y) = a + bx + cy
Approach: Minimize the sum of squared errors, i.e.
F(a, b, c) = Σ_{i=1}^n ( f(x_i, y_i) − z_i )² = Σ_{i=1}^n ( a + bx_i + cy_i − z_i )²
Necessary conditions for a minimum: All partial derivatives vanish, i.e.
∂F/∂a = Σ_{i=1}^n 2( a + bx_i + cy_i − z_i ) = 0,
∂F/∂b = Σ_{i=1}^n 2( a + bx_i + cy_i − z_i ) x_i = 0,
∂F/∂c = Σ_{i=1}^n 2( a + bx_i + cy_i − z_i ) y_i = 0.
Mathematical Background: Multilinear Regression
System of normal equations for several arguments:
n a + ( Σ_{i=1}^n x_i ) b + ( Σ_{i=1}^n y_i ) c = Σ_{i=1}^n z_i
( Σ_{i=1}^n x_i ) a + ( Σ_{i=1}^n x_i² ) b + ( Σ_{i=1}^n x_i y_i ) c = Σ_{i=1}^n z_i x_i
( Σ_{i=1}^n y_i ) a + ( Σ_{i=1}^n x_i y_i ) b + ( Σ_{i=1}^n y_i² ) c = Σ_{i=1}^n z_i y_i
• 3 linear equations for 3 unknowns a, b, and c.
• System can be solved with standard methods from linear algebra.
• Solution is unique unless all x- or all y-values are identical.
Multilinear Regression
General multilinear case:
y = f(x_1, …, x_m) = a_0 + Σ_{k=1}^m a_k x_k
Approach: Minimize the sum of squared errors, i.e.
F(a) = (Xa − y)ᵀ (Xa − y),
where
X = ⎛ 1  x_11  …  x_m1 ⎞         y = ⎛ y_1 ⎞         a = ⎛ a_0 ⎞
    ⎜ ⋮   ⋮          ⋮ ⎟             ⎜  ⋮  ⎟             ⎜ a_1 ⎟
    ⎝ 1  x_1n  …  x_mn ⎠ ,           ⎝ y_n ⎠ ,  and      ⎜  ⋮  ⎟
                                                         ⎝ a_m ⎠
Necessary conditions for a minimum:
∇_a F(a) = ∇_a (Xa − y)ᵀ (Xa − y) = 0
Multilinear Regression

∇_a F(a) may easily be computed by remembering that the differential operator

∇_a = ( ∂/∂a₀, ..., ∂/∂a_m )

behaves formally like a vector that is "multiplied" with the sum of squared errors.
Alternatively, one may write out the differentiation componentwise.
With the former method we obtain for the derivative:

∇_a (Xa − y)ᵀ(Xa − y)
  = ( ∇_a (Xa − y) )ᵀ (Xa − y) + ( (Xa − y)ᵀ ( ∇_a (Xa − y) ) )ᵀ
  = ( ∇_a (Xa − y) )ᵀ (Xa − y) + ( ∇_a (Xa − y) )ᵀ (Xa − y)
  = 2 Xᵀ (Xa − y)
  = 2 XᵀXa − 2 Xᵀy  ≟  0
Christian Borgelt Introduction to Neural Networks 81
Multilinear Regression
Necessary condition for a minimum therefore:

∇_a F(a) = ∇_a (Xa − y)ᵀ(Xa − y) = 2 XᵀXa − 2 Xᵀy  ≟  0

As a consequence we get the system of normal equations:

XᵀXa = Xᵀy

This system has a solution if XᵀX is not singular. Then we have

a = (XᵀX)⁻¹ Xᵀ y.

(XᵀX)⁻¹Xᵀ is called the (Moore–Penrose) pseudoinverse of the matrix X.

With the matrix-vector representation of the regression problem an extension to
multipolynomial regression is straightforward:
Simply add the desired products of powers to the matrix X.
Christian Borgelt Introduction to Neural Networks 82
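The normal-equation route above can be sketched in a few lines of Python. This is an illustrative sketch only (the data, the `solve` helper and all names are my own, not from the slides): it recovers the coefficients of an exact quadratic by solving the (m+1)×(m+1) normal equations with Gaussian elimination.

```python
# Polynomial regression via the normal equations (illustrative sketch).

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def poly_regression(xs, ys, m):
    """Fit a degree-m polynomial: entry (j,k) of the system matrix is sum_i x_i^(j+k)."""
    M = [[sum(x ** (j + k) for x in xs) for k in range(m + 1)] for j in range(m + 1)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m + 1)]
    return solve(M, b)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 - 1.0 * x + 0.5 * x * x for x in xs]   # exact quadratic, no noise
a = poly_regression(xs, ys, 2)                   # recovers [2.0, -1.0, 0.5]
```

Since the x-values are distinct, the system is uniquely solvable, as stated on the slide. The same `solve` helper also covers the multilinear case: build X with a leading 1-column and solve (XᵀX)a = Xᵀy.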
Mathematical Background: Logistic Regression
Generalization to non-polynomial functions

Simple example: y = a x^b.

Idea: Find a transformation to the linear/polynomial case.

Transformation for the example: ln y = ln a + b · ln x.

Special case: logistic function

y = Y / (1 + e^{a+bx})   ⇔   1/y = (1 + e^{a+bx}) / Y   ⇔   (Y − y)/y = e^{a+bx}.

Result: Apply the so-called Logit-Transformation:

ln( (Y − y)/y ) = a + bx.
Christian Borgelt Introduction to Neural Networks 83
Logistic Regression: Example
x | 1    2    3    4    5
y | 0.4  1.0  3.0  5.0  5.6

Transform the data with

z = ln( (Y − y)/y ),   Y = 6.

The transformed data points are:

x | 1     2     3     4      5
z | 2.64  1.61  0.00  −1.61  −2.64

The resulting regression line is

z ≈ −1.3775 x + 4.133.
Christian Borgelt Introduction to Neural Networks 84
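The worked example above is easy to reproduce. The following Python sketch (variable names are mine) applies the logit transformation to the five data points and fits the regression line by the standard least-squares formulas, arriving at the slope and intercept quoted on the slide.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.4, 1.0, 3.0, 5.0, 5.6]
Y = 6.0

# Logit transformation: z = ln((Y - y)/y) should be (roughly) linear in x.
zs = [math.log((Y - y) / y) for y in ys]

n = len(xs)
xm = sum(xs) / n
zm = sum(zs) / n
b = sum((x - xm) * (z - zm) for x, z in zip(xs, zs)) / sum((x - xm) ** 2 for x in xs)
a = zm - b * xm
# b ~ -1.3775, a ~ 4.133, i.e. z = -1.3775 x + 4.133 as on the slide

def logistic_fit(x):
    """Transform the fitted line back: y = Y / (1 + e^(a + b x))."""
    return Y / (1.0 + math.exp(a + b * x))
```

Note that the least-squares fit is optimal for the transformed data points, not for the original ones; the back-transformed logistic curve is therefore only an approximate least-squares solution in the original space.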
Logistic Regression: Example
[Figure: left, the transformed data points and the regression line in the (x, z)-plane; right, the original data points and the resulting logistic curve y = 6/(1 + e^{4.133−1.3775x}) with Y = 6 as upper asymptote.]

The logistic regression function can be computed by a single neuron with

• network input function f_net(x) = wx with w ≈ 1.3775,
• activation function f_act(net, θ) = (1 + e^{−(net−θ)})⁻¹ with θ ≈ 4.133 and
• output function f_out(act) = 6 · act.
Christian Borgelt Introduction to Neural Networks 85
Training Multilayer Perceptrons
Christian Borgelt Introduction to Neural Networks 86
Training Multilayer Perceptrons: Gradient Descent
Problem of logistic regression: Works only for two-layer perceptrons.

More general approach: gradient descent.

Necessary condition: differentiable activation and output functions.

[Figure: illustration of the gradient of a real-valued function z = f(x, y) at a point (x₀, y₀); the gradient ∇z|₍x₀,y₀₎ points in the direction of steepest ascent, with components ∂z/∂x|ₓ₀ and ∂z/∂y|ᵧ₀.]

It is

∇z|₍x₀,y₀₎ = ( ∂z/∂x|ₓ₀ , ∂z/∂y|ᵧ₀ ).
Christian Borgelt Introduction to Neural Networks 87


Gradient Descent: Formal Approach
General Idea: Approach the minimum of the error function in small steps.

Error function:

e = Σ_{l∈L_fixed} e^(l) = Σ_{v∈U_out} e_v = Σ_{l∈L_fixed} Σ_{v∈U_out} e_v^(l),

Form the gradient to determine the direction of the step:

∇_{w_u} e = ∂e/∂w_u = ( −∂e/∂θ_u, ∂e/∂w_{up₁}, ..., ∂e/∂w_{upₙ} ).

Exploit the sum over the training patterns:

∇_{w_u} e = ∂e/∂w_u = ∂/∂w_u Σ_{l∈L_fixed} e^(l) = Σ_{l∈L_fixed} ∂e^(l)/∂w_u.
Christian Borgelt Introduction to Neural Networks 88
Gradient Descent: Formal Approach
Since the pattern error depends on the weights only through the network input:

∇_{w_u} e^(l) = ∂e^(l)/∂w_u = ( ∂e^(l)/∂net_u^(l) ) · ( ∂net_u^(l)/∂w_u ).

Since net_u^(l) = w_u · in_u^(l), we have for the second factor

∂net_u^(l)/∂w_u = in_u^(l).

For the first factor we consider the error e^(l) for the training pattern l = (i^(l), o^(l)):

e^(l) = Σ_{v∈U_out} e_v^(l) = Σ_{v∈U_out} ( o_v^(l) − out_v^(l) )²,

i.e. the sum of the errors over all output neurons.
Christian Borgelt Introduction to Neural Networks 89
Gradient Descent: Formal Approach
Therefore we have

∂e^(l)/∂net_u^(l) = ∂( Σ_{v∈U_out} ( o_v^(l) − out_v^(l) )² )/∂net_u^(l)
                  = Σ_{v∈U_out} ∂( o_v^(l) − out_v^(l) )²/∂net_u^(l).

Since only the actual output out_v^(l) of an output neuron v depends on the network input net_u^(l) of the neuron u we are considering, it is

∂e^(l)/∂net_u^(l) = −2 Σ_{v∈U_out} ( o_v^(l) − out_v^(l) ) · ∂out_v^(l)/∂net_u^(l),

which also introduces the abbreviation

δ_u^(l) = Σ_{v∈U_out} ( o_v^(l) − out_v^(l) ) · ∂out_v^(l)/∂net_u^(l)

for the important sum appearing here, so that ∂e^(l)/∂net_u^(l) = −2 δ_u^(l).
Christian Borgelt Introduction to Neural Networks 90
Gradient Descent: Formal Approach
Distinguish two cases:
• The neuron u is an output neuron.
• The neuron u is a hidden neuron.

In the first case we have

∀u ∈ U_out:   δ_u^(l) = ( o_u^(l) − out_u^(l) ) · ∂out_u^(l)/∂net_u^(l).

Therefore we have for the gradient

∀u ∈ U_out:   ∇_{w_u} e_u^(l) = ∂e_u^(l)/∂w_u = −2 ( o_u^(l) − out_u^(l) ) · ( ∂out_u^(l)/∂net_u^(l) ) · in_u^(l)

and thus for the weight change

∀u ∈ U_out:   Δw_u^(l) = −(η/2) ∇_{w_u} e_u^(l) = η ( o_u^(l) − out_u^(l) ) · ( ∂out_u^(l)/∂net_u^(l) ) · in_u^(l).
Christian Borgelt Introduction to Neural Networks 91
Gradient Descent: Formal Approach
Exact formulae depend on the choice of the activation and the output function, since it is

out_u^(l) = f_out( act_u^(l) ) = f_out( f_act( net_u^(l) ) ).

Consider the special case with:
• the output function is the identity,
• the activation function is logistic, i.e. f_act(x) = 1/(1 + e^{−x}).

The first assumption yields

∂out_u^(l)/∂net_u^(l) = ∂act_u^(l)/∂net_u^(l) = f′_act( net_u^(l) ).
Christian Borgelt Introduction to Neural Networks 92
Gradient Descent: Formal Approach
For a logistic activation function we have

f′_act(x) = d/dx ( 1 + e^{−x} )⁻¹ = −( 1 + e^{−x} )⁻² ( −e^{−x} )
          = ( 1 + e^{−x} − 1 ) / ( 1 + e^{−x} )²
          = ( 1/(1 + e^{−x}) ) · ( 1 − 1/(1 + e^{−x}) )
          = f_act(x) · ( 1 − f_act(x) ),

and therefore

f′_act( net_u^(l) ) = f_act( net_u^(l) ) · ( 1 − f_act( net_u^(l) ) ) = out_u^(l) ( 1 − out_u^(l) ).

The resulting weight change is therefore

Δw_u^(l) = η ( o_u^(l) − out_u^(l) ) · out_u^(l) ( 1 − out_u^(l) ) · in_u^(l),

which makes the computations very simple.
Christian Borgelt Introduction to Neural Networks 93
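The derivative identity just derived, f′_act(x) = f_act(x)(1 − f_act(x)), is what makes the weight-change formula so cheap: the derivative is obtained from the already-computed output. A quick numerical check in Python (my own sketch, not from the slides) confirms the identity against central finite differences:

```python
import math

def f_act(x):
    """Logistic activation function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Verify f'(x) = f(x) * (1 - f(x)) numerically at a few points.
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    eps = 1e-6
    numeric = (f_act(x + eps) - f_act(x - eps)) / (2 * eps)
    analytic = f_act(x) * (1.0 - f_act(x))
    assert abs(numeric - analytic) < 1e-8
```

The maximum of the derivative is 1/4 (at x = 0); in the saturation regions it is nearly 0, which is the source of the "flat spot" problem discussed later.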
Error Backpropagation
Consider now: The neuron u is a hidden neuron, i.e. u ∈ U_k, 0 < k < r − 1.

The output out_v^(l) of an output neuron v depends on the network input net_u^(l) only indirectly through the successor neurons succ(u) = { s ∈ U | (u, s) ∈ C } = { s₁, ..., s_m } ⊆ U_{k+1}, namely through their network inputs net_s^(l).

We apply the chain rule to obtain

δ_u^(l) = Σ_{v∈U_out} Σ_{s∈succ(u)} ( o_v^(l) − out_v^(l) ) · ( ∂out_v^(l)/∂net_s^(l) ) · ( ∂net_s^(l)/∂net_u^(l) ).

Exchanging the sums yields

δ_u^(l) = Σ_{s∈succ(u)} ( Σ_{v∈U_out} ( o_v^(l) − out_v^(l) ) · ∂out_v^(l)/∂net_s^(l) ) · ∂net_s^(l)/∂net_u^(l)
        = Σ_{s∈succ(u)} δ_s^(l) · ∂net_s^(l)/∂net_u^(l).
Christian Borgelt Introduction to Neural Networks 94
Error Backpropagation
Consider the network input

net_s^(l) = w_s · in_s^(l) = ( Σ_{p∈pred(s)} w_sp out_p^(l) ) − θ_s,

where one element of in_s^(l) is the output out_u^(l) of the neuron u. Therefore it is

∂net_s^(l)/∂net_u^(l) = ( Σ_{p∈pred(s)} w_sp · ∂out_p^(l)/∂net_u^(l) ) − ∂θ_s/∂net_u^(l)
                      = w_su · ∂out_u^(l)/∂net_u^(l).

The result is the recursive equation (error backpropagation):

δ_u^(l) = ( Σ_{s∈succ(u)} δ_s^(l) w_su ) · ∂out_u^(l)/∂net_u^(l).
Christian Borgelt Introduction to Neural Networks 95
Error Backpropagation
The resulting formula for the weight change is

Δw_u^(l) = −(η/2) ∇_{w_u} e^(l) = η δ_u^(l) in_u^(l)
         = η ( Σ_{s∈succ(u)} δ_s^(l) w_su ) · ( ∂out_u^(l)/∂net_u^(l) ) · in_u^(l).

Consider again the special case with:
• the output function is the identity,
• the activation function is logistic.

The resulting formula for the weight change is then

Δw_u^(l) = η ( Σ_{s∈succ(u)} δ_s^(l) w_su ) · out_u^(l) ( 1 − out_u^(l) ) · in_u^(l).
Christian Borgelt Introduction to Neural Networks 96
Error Backpropagation: Cookbook Recipe
Forward propagation:

∀u ∈ U_in:              out_u^(l) = ext_u^(l)

∀u ∈ U_hidden ∪ U_out:  out_u^(l) = ( 1 + exp( − Σ_{p∈pred(u)} w_up out_p^(l) ) )⁻¹
                        (logistic activation function, implicit bias value)

[Figure: a layered network with inputs x₁, x₂, ..., xₙ on the left and outputs y₁, y₂, ..., y_m on the right; the outputs flow left to right in the forward propagation, the error factors flow right to left in the backward propagation.]

Backward propagation:

∀u ∈ U_hidden:  δ_u^(l) = ( Σ_{s∈succ(u)} δ_s^(l) w_su ) · λ_u^(l)

Error factor:

∀u ∈ U_out:     δ_u^(l) = ( o_u^(l) − out_u^(l) ) · λ_u^(l)

Activation derivative:

λ_u^(l) = out_u^(l) ( 1 − out_u^(l) )

Weight change:

Δw_up^(l) = η δ_u^(l) out_p^(l)
Christian Borgelt Introduction to Neural Networks 97
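The cookbook recipe above can be sketched directly in Python. This is a minimal illustration under my own naming conventions (`forward`, `gradients`, `W1`, `W2` are not from the slides): one hidden layer, logistic units, and an implicit bias realized as a weight to a constant input 1. The computed gradients are validated against a finite-difference approximation of the error.

```python
import math, random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W1, W2, x):
    """Forward propagation; each weight row starts with the bias weight."""
    h = [logistic(sum(w * xi for w, xi in zip(row, [1.0] + x))) for row in W1]
    o = [logistic(sum(w * hi for w, hi in zip(row, [1.0] + h))) for row in W2]
    return h, o

def gradients(W1, W2, x, target):
    """Backward propagation following the cookbook recipe."""
    h, o = forward(W1, W2, x)
    # output neurons: delta = (o_u - out_u) * out_u * (1 - out_u)
    d2 = [(t - ou) * ou * (1 - ou) for t, ou in zip(target, o)]
    # hidden neurons: delta = (sum_s delta_s w_su) * out_u * (1 - out_u)
    d1 = [sum(d2[s] * W2[s][1 + u] for s in range(len(W2))) * h[u] * (1 - h[u])
          for u in range(len(h))]
    g1 = [[d1[u] * p for p in [1.0] + x] for u in range(len(W1))]
    g2 = [[d2[s] * p for p in [1.0] + h] for s in range(len(W2))]
    return g1, g2   # weight change would be Delta w = eta * g

random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 2 inputs + bias
W2 = [[random.uniform(-1, 1) for _ in range(4)][:3] for _ in range(1)]  # 1 output

x, t = [0.7, -0.3], [1.0]

def error(W1, W2):
    _, o = forward(W1, W2, x)
    return sum((ti - oi) ** 2 for ti, oi in zip(t, o))

# Gradient check: since Delta w = -(eta/2) de/dw = eta * g, we expect de/dw = -2 g.
g1, g2 = gradients(W1, W2, x, t)
eps = 1e-6
W1[0][1] += eps; e_plus = error(W1, W2)
W1[0][1] -= 2 * eps; e_minus = error(W1, W2)
W1[0][1] += eps
numeric = (e_plus - e_minus) / (2 * eps)
assert abs(numeric - (-2.0 * g1[0][1])) < 1e-6
```

The finite-difference check makes the sign convention explicit: the recipe's δ-quantities accumulate −(1/2) of the error derivative, so the update Δw = η δ out is indeed a descent step.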
Gradient Descent: Examples
Gradient descent training for the negation ¬x

[Figure: a single neuron with input x, weight w and threshold θ computing the output y.]

x | y
0 | 1
1 | 0

[Figure: three surface plots over the (w, θ)-plane (both in [−4, 4]): the error for x = 0, the error for x = 1, and the sum of these errors.]
Christian Borgelt Introduction to Neural Networks 98
Gradient Descent: Examples
Online Training:

epoch |   θ   |   w   | error
    0 |  3.00 |  3.50 | 1.307
   20 |  3.77 |  2.19 | 0.986
   40 |  3.71 |  1.81 | 0.970
   60 |  3.50 |  1.53 | 0.958
   80 |  3.15 |  1.24 | 0.937
  100 |  2.57 |  0.88 | 0.890
  120 |  1.48 |  0.25 | 0.725
  140 | −0.06 | −0.98 | 0.331
  160 | −0.80 | −2.07 | 0.149
  180 | −1.19 | −2.74 | 0.087
  200 | −1.44 | −3.20 | 0.059
  220 | −1.62 | −3.54 | 0.044

Batch Training:

epoch |   θ   |   w   | error
    0 |  3.00 |  3.50 | 1.295
   20 |  3.76 |  2.20 | 0.985
   40 |  3.70 |  1.82 | 0.970
   60 |  3.48 |  1.53 | 0.957
   80 |  3.11 |  1.25 | 0.934
  100 |  2.49 |  0.88 | 0.880
  120 |  1.27 |  0.22 | 0.676
  140 | −0.21 | −1.04 | 0.292
  160 | −0.86 | −2.08 | 0.140
  180 | −1.21 | −2.74 | 0.084
  200 | −1.45 | −3.19 | 0.058
  220 | −1.63 | −3.53 | 0.044
Christian Borgelt Introduction to Neural Networks 99
Gradient Descent: Examples
Visualization of gradient descent for the negation ¬x

[Figure: left, the trajectory of online training in the (θ, w)-plane; middle, the trajectory of batch training; right, the batch training trajectory on the error surface.]

• Training is obviously successful.
• The error cannot vanish completely due to the properties of the logistic function.
Christian Borgelt Introduction to Neural Networks 100
Gradient Descent: Examples
Example function:   f(x) = (5/6) x⁴ − 7 x³ + (115/6) x² − 18 x + 6

 i |  xᵢ   | f(xᵢ) | f′(xᵢ)  |  Δxᵢ
 0 | 0.200 | 3.112 | −11.147 | 0.011
 1 | 0.211 | 2.990 | −10.811 | 0.011
 2 | 0.222 | 2.874 | −10.490 | 0.010
 3 | 0.232 | 2.766 | −10.182 | 0.010
 4 | 0.243 | 2.664 |  −9.888 | 0.010
 5 | 0.253 | 2.568 |  −9.606 | 0.010
 6 | 0.262 | 2.477 |  −9.335 | 0.009
 7 | 0.271 | 2.391 |  −9.075 | 0.009
 8 | 0.281 | 2.309 |  −8.825 | 0.009
 9 | 0.289 | 2.233 |  −8.585 | 0.009
10 | 0.298 | 2.160 |         |

[Figure: the function f on [0, 4] with the slowly advancing iteration points.]

Gradient descent with initial value 0.2 and learning rate 0.001.
Christian Borgelt Introduction to Neural Networks 101
Gradient Descent: Examples
Example function:   f(x) = (5/6) x⁴ − 7 x³ + (115/6) x² − 18 x + 6

 i |  xᵢ   | f(xᵢ) | f′(xᵢ) |  Δxᵢ
 0 | 1.500 | 2.719 |  3.500 | −0.875
 1 | 0.625 | 0.655 | −1.431 |  0.358
 2 | 0.983 | 0.955 |  2.554 | −0.639
 3 | 0.344 | 1.801 | −7.157 |  1.789
 4 | 2.134 | 4.127 |  0.567 | −0.142
 5 | 1.992 | 3.989 |  1.380 | −0.345
 6 | 1.647 | 3.203 |  3.063 | −0.766
 7 | 0.881 | 0.734 |  1.753 | −0.438
 8 | 0.443 | 1.211 | −4.851 |  1.213
 9 | 1.656 | 3.231 |  3.029 | −0.757
10 | 0.898 | 0.766 |        |

[Figure: the function f on [0, 4]; starting at x = 1.5 ("start"), the iteration points jump back and forth chaotically between the two valleys.]

Gradient descent with initial value 1.5 and learning rate 0.25.
Christian Borgelt Introduction to Neural Networks 102
Gradient Descent: Examples
Example function:   f(x) = (5/6) x⁴ − 7 x³ + (115/6) x² − 18 x + 6

 i |  xᵢ   | f(xᵢ) | f′(xᵢ) |  Δxᵢ
 0 | 2.600 | 3.816 | −1.707 | 0.085
 1 | 2.685 | 3.660 | −1.947 | 0.097
 2 | 2.783 | 3.461 | −2.116 | 0.106
 3 | 2.888 | 3.233 | −2.153 | 0.108
 4 | 2.996 | 3.008 | −2.009 | 0.100
 5 | 3.097 | 2.820 | −1.688 | 0.084
 6 | 3.181 | 2.695 | −1.263 | 0.063
 7 | 3.244 | 2.628 | −0.845 | 0.042
 8 | 3.286 | 2.599 | −0.515 | 0.026
 9 | 3.312 | 2.589 | −0.293 | 0.015
10 | 3.327 | 2.585 |        |

[Figure: the function f on [0, 4]; the iteration points converge smoothly to the local minimum near x ≈ 3.3, missing the global minimum near x ≈ 0.7.]

Gradient descent with initial value 2.6 and learning rate 0.05.
Christian Borgelt Introduction to Neural Networks 103
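The three behaviors shown in the last three tables (slow creep, chaotic jumping, convergence to a local minimum) follow from one and the same update rule. A small Python sketch reproducing them (my own naming; the function and parameter values are taken from the slides):

```python
# Gradient descent on the example function of the preceding slides:
# f(x) = 5/6 x^4 - 7 x^3 + 115/6 x^2 - 18 x + 6.

def f(x):
    return 5/6 * x**4 - 7 * x**3 + 115/6 * x**2 - 18 * x + 6

def df(x):
    return 10/3 * x**3 - 21 * x**2 + 115/3 * x - 18

def descend(x, eta, steps):
    """Plain gradient descent: x <- x - eta * f'(x)."""
    for _ in range(steps):
        x -= eta * df(x)
    return x

# eta = 0.001, start 0.2: tiny steps (first step: 0.200 -> 0.211, as in the table)
x1 = descend(0.2, 0.001, 1)
# eta = 0.05, start 2.6: smooth convergence toward the local minimum near x = 3.3
x_local = descend(2.6, 0.05, 200)
# eta = 0.25, start 1.5: the iterates jump around chaotically instead of converging
```

The learning rate thus faces a dilemma: too small means painfully slow progress, too large means oscillation or divergence; this motivates the adaptive variants on the next slides.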
Gradient Descent: Variants
Weight update rule:

w(t + 1) = w(t) + Δw(t)

Standard backpropagation:

Δw(t) = −(η/2) ∇_w e(t)

Manhattan training:

Δw(t) = −η sgn( ∇_w e(t) ).

Momentum term:

Δw(t) = −(η/2) ∇_w e(t) + β Δw(t − 1),
Christian Borgelt Introduction to Neural Networks 104
Gradient Descent: Variants
Self-adaptive error backpropagation:

η_w(t) = c⁻ · η_w(t−1),  if ∇_w e(t) · ∇_w e(t−1) < 0,
         c⁺ · η_w(t−1),  if ∇_w e(t) · ∇_w e(t−1) > 0
                          and ∇_w e(t−1) · ∇_w e(t−2) ≥ 0,
         η_w(t−1),       otherwise.

Resilient error backpropagation:

Δw(t) = c⁻ · Δw(t−1),  if ∇_w e(t) · ∇_w e(t−1) < 0,
        c⁺ · Δw(t−1),  if ∇_w e(t) · ∇_w e(t−1) > 0
                        and ∇_w e(t−1) · ∇_w e(t−2) ≥ 0,
        Δw(t−1),       otherwise.

Typical values: c⁻ ∈ [0.5, 0.7] and c⁺ ∈ [1.05, 1.2].
Christian Borgelt Introduction to Neural Networks 105
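The sign-based idea behind resilient backpropagation can be sketched in one dimension. This is a simplified sketch under my own assumptions (it only compares the current and previous gradient sign, without the additional look-back and weight-backtracking of full Rprop), using the typical factors c⁻ = 0.5 and c⁺ = 1.2 mentioned on the slide:

```python
# Simplified sign-based (Rprop-style) minimization in one dimension.

def rprop_minimize(grad, x, step=0.1, steps=60, c_minus=0.5, c_plus=1.2):
    g_prev = 0.0
    for _ in range(steps):
        g = grad(x)
        if g * g_prev < 0:
            step *= c_minus          # sign change: we overshot, shrink the step
        elif g * g_prev > 0:
            step *= c_plus           # same direction twice: accelerate
        # move by the step size in the direction opposite to the gradient sign
        x -= step * (1 if g > 0 else -1 if g < 0 else 0)
        g_prev = g
    return x

# Minimize (x - 3)^2 starting far away: the step grows while the sign is stable,
# then shrinks geometrically once the minimum is overshot.
x_min = rprop_minimize(lambda x: 2 * (x - 3.0), 10.0)
```

Because only the sign of the gradient is used, the method is insensitive to the gradient's magnitude, which is exactly what helps in the flat saturation regions of the logistic function.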
Gradient Descent: Variants
Quickpropagation

[Figure: the error function e over a weight w is locally approximated by a parabola; the apex of the parabola through the points (w(t), e(t)) and (w(t−1), e(t−1)) predicts the location of the minimum, i.e. the next weight value w(t+1). A second panel shows the corresponding secant construction on the derivative ∇_w e.]

The weight update rule can be derived from similar triangles:

Δw(t) = ( ∇_w e(t) / ( ∇_w e(t−1) − ∇_w e(t) ) ) · Δw(t − 1).
Christian Borgelt Introduction to Neural Networks 106
Gradient Descent: Examples
Without momentum term:

epoch |   θ   |   w   | error
    0 |  3.00 |  3.50 | 1.295
   20 |  3.76 |  2.20 | 0.985
   40 |  3.70 |  1.82 | 0.970
   60 |  3.48 |  1.53 | 0.957
   80 |  3.11 |  1.25 | 0.934
  100 |  2.49 |  0.88 | 0.880
  120 |  1.27 |  0.22 | 0.676
  140 | −0.21 | −1.04 | 0.292
  160 | −0.86 | −2.08 | 0.140
  180 | −1.21 | −2.74 | 0.084
  200 | −1.45 | −3.19 | 0.058
  220 | −1.63 | −3.53 | 0.044

With momentum term:

epoch |   θ   |   w   | error
    0 |  3.00 |  3.50 | 1.295
   10 |  3.80 |  2.19 | 0.984
   20 |  3.75 |  1.84 | 0.971
   30 |  3.56 |  1.58 | 0.960
   40 |  3.26 |  1.33 | 0.943
   50 |  2.79 |  1.04 | 0.910
   60 |  1.99 |  0.60 | 0.814
   70 |  0.54 | −0.25 | 0.497
   80 | −0.53 | −1.51 | 0.211
   90 | −1.02 | −2.36 | 0.113
  100 | −1.31 | −2.92 | 0.073
  110 | −1.52 | −3.31 | 0.053
  120 | −1.67 | −3.61 | 0.041
Christian Borgelt Introduction to Neural Networks 107
Gradient Descent: Examples
[Figure: left, the trajectory in the (θ, w)-plane without momentum term; middle, with momentum term; right, the trajectory with momentum term on the error surface.]

• The dots show the position every 20 epochs (without momentum term) or every 10 epochs (with momentum term).
• Learning with a momentum term is about twice as fast.
Christian Borgelt Introduction to Neural Networks 108
Gradient Descent: Examples
Example function:   f(x) = (5/6) x⁴ − 7 x³ + (115/6) x² − 18 x + 6

 i |  xᵢ   | f(xᵢ) | f′(xᵢ)  |  Δxᵢ
 0 | 0.200 | 3.112 | −11.147 | 0.011
 1 | 0.211 | 2.990 | −10.811 | 0.021
 2 | 0.232 | 2.771 | −10.196 | 0.029
 3 | 0.261 | 2.488 |  −9.368 | 0.035
 4 | 0.296 | 2.173 |  −8.397 | 0.040
 5 | 0.337 | 1.856 |  −7.348 | 0.044
 6 | 0.380 | 1.559 |  −6.277 | 0.046
 7 | 0.426 | 1.298 |  −5.228 | 0.046
 8 | 0.472 | 1.079 |  −4.235 | 0.046
 9 | 0.518 | 0.907 |  −3.319 | 0.045
10 | 0.562 | 0.777 |         |

[Figure: the function f on [0, 4]; with the momentum term the iteration points advance noticeably faster than with plain gradient descent.]

Gradient descent with momentum term (β = 0.9).
Christian Borgelt Introduction to Neural Networks 109
Gradient Descent: Examples
Example function:   f(x) = (5/6) x⁴ − 7 x³ + (115/6) x² − 18 x + 6

 i |  xᵢ   | f(xᵢ) | f′(xᵢ) |  Δxᵢ
 0 | 1.500 | 2.719 |  3.500 | −1.050
 1 | 0.450 | 1.178 | −4.699 |  0.705
 2 | 1.155 | 1.476 |  3.396 | −0.509
 3 | 0.645 | 0.629 | −1.117 |  0.083
 4 | 0.729 | 0.587 |  0.077 | −0.005
 5 | 0.723 | 0.587 |  0.001 |  0.000
 6 | 0.723 | 0.587 |  0.000 |  0.000
 7 | 0.723 | 0.587 |  0.000 |  0.000
 8 | 0.723 | 0.587 |  0.000 |  0.000
 9 | 0.723 | 0.587 |  0.000 |  0.000
10 | 0.723 | 0.587 |        |

[Figure: the function f on [0, 4]; after a few large jumps the iteration settles quickly into the (global) minimum near x ≈ 0.72.]

Gradient descent with self-adapting learning rate (c⁺ = 1.2, c⁻ = 0.5).
Christian Borgelt Introduction to Neural Networks 110
Other Extensions of Error Backpropagation
Flat Spot Elimination:

Δw(t) = −(η/2) ∇_w e(t) + ζ

• Eliminates slow learning in the saturation region of the logistic function.
• Counteracts the decay of the error signals over the layers.

Weight Decay:

Δw(t) = −(η/2) ∇_w e(t) − ξ w(t),

• Helps to improve the robustness of the training results.
• Can be derived from an extended error function penalizing large weights:

e* = e + (ξ/2) Σ_{u∈U_out∪U_hidden} ( θ_u² + Σ_{p∈pred(u)} w_up² ).
Christian Borgelt Introduction to Neural Networks 111
Sensitivity Analysis
Christian Borgelt Introduction to Neural Networks 112
Sensitivity Analysis
Question: How important are the different inputs to the network?

Idea: Determine the change of the output relative to a change of the input.

∀u ∈ U_in:   s(u) = (1/|L_fixed|) Σ_{l∈L_fixed} Σ_{v∈U_out} ∂out_v^(l) / ∂ext_u^(l).

Formal derivation: Apply the chain rule:

∂out_v/∂ext_u = (∂out_v/∂out_u) · (∂out_u/∂ext_u)
              = (∂out_v/∂net_v) · (∂net_v/∂out_u) · (∂out_u/∂ext_u).

Simplification: Assume that the output function is the identity:

∂out_u/∂ext_u = 1.
Christian Borgelt Introduction to Neural Networks 113
Sensitivity Analysis
For the second factor we get the general result

∂net_v/∂out_u = ∂/∂out_u Σ_{p∈pred(v)} w_vp out_p = Σ_{p∈pred(v)} w_vp · ∂out_p/∂out_u.

This leads to the recursion formula

∂out_v/∂out_u = (∂out_v/∂net_v) · (∂net_v/∂out_u)
              = (∂out_v/∂net_v) · Σ_{p∈pred(v)} w_vp · ∂out_p/∂out_u.

However, for the first hidden layer we get

∂net_v/∂out_u = w_vu,   therefore   ∂out_v/∂out_u = (∂out_v/∂net_v) · w_vu.

This formula marks the start of the recursion.
Christian Borgelt Introduction to Neural Networks 114
Sensitivity Analysis
Consider as usual the special case with:
• the output function is the identity,
• the activation function is logistic.

The recursion formula is in this case

∂out_v/∂out_u = out_v ( 1 − out_v ) Σ_{p∈pred(v)} w_vp · ∂out_p/∂out_u

and the anchor of the recursion is

∂out_v/∂out_u = out_v ( 1 − out_v ) w_vu.
Christian Borgelt Introduction to Neural Networks 115
Demonstration Software: xmlp/wmlp
• Demonstration of multilayer perceptron training
• Visualization of the training process
• Biimplication and Exclusive Or, two continuous functions
• http://www.borgelt.net/mlpd.html
Christian Borgelt Introduction to Neural Networks 116
Multilayer Perceptron Software: mlp/mlpgui
• Software for training general multilayer perceptrons
• Command line version written in C, fast training
• Graphical user interface in Java, easy to use
• http://www.borgelt.net/mlp.html, http://www.borgelt.net/mlpgui.html
Christian Borgelt Introduction to Neural Networks 117
Radial Basis Function Networks
Christian Borgelt Introduction to Neural Networks 118
Radial Basis Function Networks
A radial basis function network (RBFN) is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i)  U_in ∩ U_out = ∅,
(ii) C = (U_in × U_hidden) ∪ C′,   C′ ⊆ (U_hidden × U_out)

The network input function of each hidden neuron is a distance function of the input vector and the weight vector, i.e.

∀u ∈ U_hidden:   f_net^(u)( w_u, in_u ) = d( w_u, in_u ),

where d : ℝⁿ × ℝⁿ → ℝ₀⁺ is a function satisfying ∀x, y, z ∈ ℝⁿ:

(i)   d(x, y) = 0  ⇔  x = y,
(ii)  d(x, y) = d(y, x)               (symmetry),
(iii) d(x, z) ≤ d(x, y) + d(y, z)     (triangle inequality).
Christian Borgelt Introduction to Neural Networks 119
Distance Functions
Illustration of distance functions (Minkowski family):

d_k(x, y) = ( Σ_{i=1}^n |xᵢ − yᵢ|^k )^{1/k}

Well-known special cases from this family are:

k = 1:   Manhattan or city block distance,
k = 2:   Euclidean distance,
k → ∞:  maximum distance, i.e. d_∞(x, y) = max_{i=1}^n |xᵢ − yᵢ|.

[Figure: the "circles" (sets of points with distance 1 from a center) for k = 1 (diamond), k = 2 (circle), and k → ∞ (square).]
Christian Borgelt Introduction to Neural Networks 120
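The three special cases of the Minkowski family can be illustrated with a few lines of Python (function names are my own):

```python
# Minkowski distance family and its limiting case.

def minkowski(x, y, k):
    """d_k(x, y) = (sum_i |x_i - y_i|^k)^(1/k)."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)

def maximum_distance(x, y):
    """Limit k -> infinity: the maximum coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (0.0, 0.0), (3.0, 4.0)
d1 = minkowski(x, y, 1)        # 7.0 (Manhattan / city block)
d2 = minkowski(x, y, 2)        # 5.0 (Euclidean)
dinf = maximum_distance(x, y)  # 4.0
```

Note that d_k decreases monotonically in k for fixed points, approaching the maximum distance from above.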
Radial Basis Function Networks
The network input function of the output neurons is the weighted sum of their inputs, i.e.

∀u ∈ U_out:   f_net^(u)( w_u, in_u ) = w_u · in_u = Σ_{v∈pred(u)} w_uv out_v.

The activation function of each hidden neuron is a so-called radial function, i.e. a monotonically decreasing function

f : ℝ₀⁺ → [0, 1]   with   f(0) = 1   and   lim_{x→∞} f(x) = 0.

The activation function of each output neuron is a linear function, namely

f_act^(u)( net_u, θ_u ) = net_u − θ_u.

(The linear activation function is important for the initialization.)
Christian Borgelt Introduction to Neural Networks 121
Radial Activation Functions
rectangle function:
f_act(net, σ) = 0, if net > σ;  1, otherwise.

triangle function:
f_act(net, σ) = 0, if net > σ;  1 − net/σ, otherwise.

cosine until zero:
f_act(net, σ) = 0, if net > 2σ;  ( cos( (π/(2σ)) · net ) + 1 ) / 2, otherwise.

Gaussian function:
f_act(net, σ) = e^{ −net² / (2σ²) }
(value e^{−1/2} at net = σ and e^{−2} at net = 2σ).

[Figure: graphs of the four radial activation functions over the network input net, each starting at 1 for net = 0.]
Christian Borgelt Introduction to Neural Networks 122
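The four radial activation functions above are easy to write down directly; here is a Python sketch (names are mine) that also checks the characteristic values of the Gaussian at σ and 2σ:

```python
import math

# The four radial activation functions, as functions of the network input
# (a distance) and the reference radius sigma.

def rectangle(net, sigma):
    return 0.0 if net > sigma else 1.0

def triangle(net, sigma):
    return 0.0 if net > sigma else 1.0 - net / sigma

def cosine_until_zero(net, sigma):
    if net > 2 * sigma:
        return 0.0
    return (math.cos(math.pi / (2 * sigma) * net) + 1.0) / 2.0

def gaussian(net, sigma):
    return math.exp(-net ** 2 / (2 * sigma ** 2))

# All are radial functions: value 1 at distance 0, falling toward 0 with distance.
assert rectangle(0.0, 1.0) == triangle(0.0, 1.0) == gaussian(0.0, 1.0) == 1.0
assert abs(gaussian(1.0, 1.0) - math.exp(-0.5)) < 1e-12   # value at net = sigma
assert abs(gaussian(2.0, 1.0) - math.exp(-2.0)) < 1e-12   # value at net = 2 sigma
```

Note that only the Gaussian is strictly positive everywhere; the other three have compact support, which matters for which training patterns a hidden neuron can "see".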
Radial Basis Function Networks: Examples
Radial basis function networks for the conjunction x₁ ∧ x₂

[Figure: left, a network with one hidden neuron with center (1, 1) and radius σ = 1/2, output weight 1 and output threshold 0; the circle of radius 1/2 around (1, 1) separates the point (1, 1) from the other corners of the unit square. Right, an alternative network with center (0, 0) and radius σ = 6/5, output weight −1 and output threshold −1, which covers the complement instead.]
Christian Borgelt Introduction to Neural Networks 123
Radial Basis Function Networks: Examples
Radial basis function network for the biimplication x₁ ↔ x₂

Idea: logical decomposition

x₁ ↔ x₂  ≡  (x₁ ∧ x₂) ∨ ¬(x₁ ∨ x₂)

[Figure: a network with two hidden neurons, one with center (1, 1) and one with center (0, 0), both with radius σ = 1/2; both are connected to the output neuron with weight 1, output threshold 0. The two circles around (0, 0) and (1, 1) cover exactly the points where the biimplication is true.]
Christian Borgelt Introduction to Neural Networks 124
Radial Basis Function Networks: Function Approximation
[Figure: approximation of a function given by four sample points (x₁, y₁), ..., (x₄, y₄) by a step function; each sample point contributes one rectangular pulse of height yᵢ, and the pulses add up to the staircase approximation.]
Christian Borgelt Introduction to Neural Networks 125
Radial Basis Function Networks: Function Approximation

[Figure: a radial basis function network computing the step function of the previous slide: four hidden neurons with centers x₁, ..., x₄ and rectangular activation functions, output weights y₁, ..., y₄, output threshold 0.]

The radii are chosen as

σ = (1/2) Δx = (1/2) ( x_{i+1} − xᵢ ).
Christian Borgelt Introduction to Neural Networks 126
Radial Basis Function Networks: Function Approximation

[Figure: the same four sample points approximated with triangular basis functions of width σ = Δx; the weighted sum of the triangles yields a piecewise linear interpolation between the sample points.]
Christian Borgelt Introduction to Neural Networks 127
Radial Basis Function Networks: Function Approximation

[Figure: approximation of a sample function with Gaussian basis functions (σ = 1): the individual weighted Gaussians w₁, w₂, w₃ and their sum, which smoothly approximates the function on the interval [2, 8].]
Christian Borgelt Introduction to Neural Networks 128
Radial Basis Function Networks: Function Approximation
Radial basis function network for a sum of three Gaussian functions
[Figure: a network with three hidden neurons with Gaussian activation functions, centers 2, 5 and 6 and radius σ = 1 each, output weights 1, 3 and −2 and output threshold 0, together with a plot of the resulting sum of three Gaussian functions.]
Christian Borgelt Introduction to Neural Networks 129
Training Radial Basis Function Networks
Christian Borgelt Introduction to Neural Networks 130
Radial Basis Function Networks: Initialization
Let L_fixed = { l₁, ..., l_m } be a fixed learning task, consisting of m training patterns l = ( i^(l), o^(l) ).

Simple radial basis function network:
One hidden neuron v_k, k = 1, ..., m, for each training pattern:

∀k ∈ {1, ..., m}:   w_{v_k} = i^{(l_k)}.

If the activation function is the Gaussian function, the radii σ_k are chosen heuristically as

∀k ∈ {1, ..., m}:   σ_k = d_max / √(2m),

where

d_max = max_{l_j, l_k ∈ L_fixed} d( i^{(l_j)}, i^{(l_k)} ).
Christian Borgelt Introduction to Neural Networks 131
Radial Basis Function Networks: Initialization
Initializing the connections from the hidden to the output neurons:

∀u (and every training pattern l):   Σ_{k=1}^m w_{u v_k} out_{v_k}^{(l)} − θ_u = o_u^{(l)},
or abbreviated:   A · w_u = o_u,

where o_u = ( o_u^{(l₁)}, ..., o_u^{(l_m)} )ᵀ is the vector of desired outputs, θ_u = 0, and

A = [ out_{v₁}^{(l₁)}   out_{v₂}^{(l₁)}   ...  out_{v_m}^{(l₁)}
      out_{v₁}^{(l₂)}   out_{v₂}^{(l₂)}   ...  out_{v_m}^{(l₂)}
           ⋮                  ⋮                      ⋮
      out_{v₁}^{(l_m)}  out_{v₂}^{(l_m)}  ...  out_{v_m}^{(l_m)} ].

This is a linear equation system that can be solved by inverting the matrix A:

w_u = A⁻¹ o_u.
Christian Borgelt Introduction to Neural Networks 132
RBFN Initialization: Example
Simple radial basis function network for the biimplication x₁ ↔ x₂

x₁ | x₂ | y
 0 |  0 | 1
 1 |  0 | 0
 0 |  1 | 0
 1 |  1 | 1

[Figure: a network with four hidden neurons, one per training pattern, with centers (0,0), (1,0), (0,1) and (1,1) and radius σ = 1/2 each; they are connected to the output neuron with weights w₁, ..., w₄, output threshold 0.]
Christian Borgelt Introduction to Neural Networks 133
RBFN Initialization: Example
Simple radial basis function network for the biimplication x₁ ↔ x₂

A = [ 1     e⁻²   e⁻²   e⁻⁴
      e⁻²   1     e⁻⁴   e⁻²
      e⁻²   e⁻⁴   1     e⁻²
      e⁻⁴   e⁻²   e⁻²   1   ]

A⁻¹ = (1/D) [ a  b  b  c
              b  a  c  b
              b  c  a  b
              c  b  b  a ]

where

D = 1 − 4e⁻⁴ + 6e⁻⁸ − 4e⁻¹² + e⁻¹⁶  ≈  0.9287
a = 1 − 2e⁻⁴ + e⁻⁸                  ≈  0.9637
b = −e⁻² + 2e⁻⁶ − e⁻¹⁰              ≈ −0.1304
c = e⁻⁴ − 2e⁻⁸ + e⁻¹²               ≈  0.0177

w_u = A⁻¹ o_u = (1/D) ( a + c, 2b, 2b, a + c )ᵀ ≈ ( 1.0567, −0.2809, −0.2809, 1.0567 )ᵀ
Christian Borgelt Introduction to Neural Networks 134
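The numbers on this slide can be reproduced directly. The Python sketch below (my own helper names) builds the matrix A from the Gaussian activations — with σ = 1/2, the activation at squared distance d² is e^{−d²/(2σ²)} = e^{−2d²} — and solves A w = o by Gaussian elimination:

```python
import math

# Exact initialization of the simple RBFN for the biimplication.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [1.0, 0.0, 0.0, 1.0]           # desired outputs o_u

def act(p, c):
    """Gaussian activation with sigma = 1/2: e^(-2 d^2)."""
    d2 = (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
    return math.exp(-2.0 * d2)

A = [[act(p, c) for c in centers] for p in centers]

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

w = solve(A, targets)   # approx [1.0567, -0.2809, -0.2809, 1.0567]
```

Because A is the matrix of hidden activations evaluated at the training inputs themselves, solving this system makes the network output the desired values exactly at all four corners.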
RBFN Initialization: Example
Simple radial basis function network for the biimplication x₁ ↔ x₂

[Figure: left, a single Gaussian basis function over the (x₁, x₂)-plane; middle, all four weighted basis functions; right, the resulting output y, which equals 1 at (0, 0) and (1, 1) and 0 at (1, 0) and (0, 1).]

• The initialization already leads to a perfect solution of the learning task.
• Subsequent training is not necessary.
Christian Borgelt Introduction to Neural Networks 135
Radial Basis Function Networks: Initialization
Normal radial basis function networks:
Select a subset of k training patterns as centers.

A = [ 1  out_{v₁}^{(l₁)}   out_{v₂}^{(l₁)}   ...  out_{v_k}^{(l₁)}
      1  out_{v₁}^{(l₂)}   out_{v₂}^{(l₂)}   ...  out_{v_k}^{(l₂)}
      ⋮        ⋮                 ⋮                      ⋮
      1  out_{v₁}^{(l_m)}  out_{v₂}^{(l_m)}  ...  out_{v_k}^{(l_m)} ]

A · w_u = o_u

Compute the (Moore–Penrose) pseudoinverse:

A⁺ = (AᵀA)⁻¹ Aᵀ.

The weights can then be computed by

w_u = A⁺ · o_u = (AᵀA)⁻¹ Aᵀ · o_u.
Christian Borgelt Introduction to Neural Networks 136
RBFN Initialization: Example
Normal radial basis function network for the biimplication x₁ ↔ x₂

Select two training patterns:

• l₁ = ( i^{(l₁)}, o^{(l₁)} ) = ( (0, 0), (1) )
• l₄ = ( i^{(l₄)}, o^{(l₄)} ) = ( (1, 1), (1) )

[Figure: a network with two hidden neurons with centers (0, 0) and (1, 1) and radius σ = 1/2 each, output weights w₁, w₂ and output threshold θ.]
Christian Borgelt Introduction to Neural Networks 137
RBFN Initialization: Example
Normal radial basis function network for the biimplication x₁ ↔ x₂

A = [ 1  1    e⁻⁴
      1  e⁻²  e⁻²
      1  e⁻²  e⁻²
      1  e⁻⁴  1   ]

A⁺ = (AᵀA)⁻¹ Aᵀ = [ a  b  b  a
                    c  d  d  e
                    e  d  d  c ]

where

a ≈ −0.1810,   b ≈ 0.6810,
c ≈ 1.1781,    d ≈ −0.6688,   e ≈ 0.1594.

Resulting weights:

w_u = ( −θ, w₁, w₂ )ᵀ = A⁺ · o_u ≈ ( −0.3620, 1.3375, 1.3375 )ᵀ.
Christian Borgelt Introduction to Neural Networks 138
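This pseudoinverse computation can also be checked numerically. The sketch below (helper names are mine) builds the 4×3 matrix A with its bias column, forms the normal equations (AᵀA)w = Aᵀo, and solves them, reproducing the weight vector on the slide:

```python
import math

# "Normal" RBFN initialization via the pseudoinverse (normal equations form).
patterns = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
o = [1.0, 0.0, 0.0, 1.0]
centers = [(0.0, 0.0), (1.0, 1.0)]

def act(p, c):
    """Gaussian activation with sigma = 1/2: e^(-2 d^2)."""
    d2 = (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
    return math.exp(-2.0 * d2)

# A has a leading column of ones for the bias (-theta).
A = [[1.0] + [act(p, c) for c in centers] for p in patterns]
AtA = [[sum(A[i][j] * A[i][k] for i in range(4)) for k in range(3)] for j in range(3)]
Ato = [sum(A[i][j] * o[i] for i in range(4)) for j in range(3)]

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

w = solve(AtA, Ato)   # approx [-0.3620, 1.3375, 1.3375] = (-theta, w1, w2)
```

With a library such as NumPy the same result is obtained more directly as `np.linalg.pinv(A) @ o`, but the normal-equations form above mirrors the formula A⁺ = (AᵀA)⁻¹Aᵀ from the slide.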
RBFN Initialization: Example
Normal radial basis function network for the biimplication x₁ ↔ x₂

[Figure: left, the basis function centered at (0, 0); middle, the basis function centered at (1, 1); right, the resulting output y over the (x₁, x₂)-plane, which equals 1 at (0, 0) and (1, 1), 0 at (1, 0) and (0, 1), and approaches −0.36 far away from both centers.]

• The initialization already leads to a perfect solution of the learning task.
• This is an accident, because the linear equation system is not over-determined, due to linearly dependent equations.
Christian Borgelt Introduction to Neural Networks 139
Radial Basis Function Networks: Initialization
Finding appropriate centers for the radial basis functions

One approach: k-means clustering

• Select randomly k training patterns as centers.
• Assign to each center those training patterns that are closest to it.
• Compute new centers as the center of gravity of the assigned training patterns.
• Repeat the previous two steps until convergence, i.e., until the centers do not change anymore.
• Use the resulting centers for the weight vectors of the hidden neurons.

Alternative approach: learning vector quantization
Christian Borgelt Introduction to Neural Networks 140
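The k-means steps listed above translate almost line by line into code. The following is a minimal 2D sketch (all names and the toy data are my own), using squared Euclidean distance for the assignment step:

```python
import random

# Minimal k-means sketch following the steps above (2D points).

def kmeans(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)                 # k random patterns as centers
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                            # assignment step
            j = min(range(k),
                    key=lambda i: (p[0] - centers[i][0]) ** 2
                                + (p[1] - centers[i][1]) ** 2)
            clusters[j].append(p)
        new_centers = [                             # center-of-gravity step
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)]
        if new_centers == centers:                  # centers no longer change
            break
        centers = new_centers
    return centers

pts = [(0.1, 0.0), (0.0, 0.2), (0.2, 0.1),         # one group near the origin
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]         # one group near (5, 5)
centers = kmeans(pts, 2)
```

For well-separated groups like these, the procedure converges in a few iterations to one center per group; in general k-means only finds a local optimum, which is one reason learning vector quantization is mentioned as an alternative.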
Radial Basis Function Networks: Training
Training radial basis function networks:
Derivation of the update rules is analogous to that of multilayer perceptrons.

Weights from the hidden to the output neurons:

Gradient:

∇_{w_u} e_u^(l) = ∂e_u^(l)/∂w_u = −2 ( o_u^(l) − out_u^(l) ) in_u^(l),

Weight update rule:

Δw_u^(l) = −(η₃/2) ∇_{w_u} e_u^(l) = η₃ ( o_u^(l) − out_u^(l) ) in_u^(l)

(Two more learning rates are needed for the center coordinates and the radii.)
Christian Borgelt Introduction to Neural Networks 141
Radial Basis Function Networks: Training
Training radial basis function networks:
Center coordinates (weights from the input to the hidden neurons).

Gradient:

∇_{w_v} e^(l) = ∂e^(l)/∂w_v = −2 Σ_{s∈succ(v)} ( o_s^(l) − out_s^(l) ) w_sv · ( ∂out_v^(l)/∂net_v^(l) ) · ( ∂net_v^(l)/∂w_v )

Weight update rule:

Δw_v^(l) = −(η₁/2) ∇_{w_v} e^(l) = η₁ Σ_{s∈succ(v)} ( o_s^(l) − out_s^(l) ) w_sv · ( ∂out_v^(l)/∂net_v^(l) ) · ( ∂net_v^(l)/∂w_v )
Christian Borgelt Introduction to Neural Networks 142
Radial Basis Function Networks: Training
Training radial basis function networks:
Center coordinates (weights from the input to the hidden neurons).

Special case: Euclidean distance

∂net_v^(l)/∂w_v = ( Σ_{i=1}^n ( w_{vpᵢ} − out_{pᵢ}^{(l)} )² )^{−1/2} ( w_v − in_v^(l) ).

Special case: Gaussian activation function

∂out_v^(l)/∂net_v^(l) = ∂f_act( net_v^(l), σ_v )/∂net_v^(l)
                      = ∂/∂net_v^(l) e^{ −(net_v^(l))² / (2σ_v²) }
                      = −( net_v^(l) / σ_v² ) e^{ −(net_v^(l))² / (2σ_v²) }.
Christian Borgelt Introduction to Neural Networks 143
Radial Basis Function Networks: Training
Training radial basis function networks:
Radii of the radial basis functions.

Gradient:

∂e^(l)/∂σ_v = −2 Σ_{s∈succ(v)} ( o_s^(l) − out_s^(l) ) w_sv · ∂out_v^(l)/∂σ_v.

Weight update rule:

Δσ_v^(l) = −(η₂/2) ∂e^(l)/∂σ_v = η₂ Σ_{s∈succ(v)} ( o_s^(l) − out_s^(l) ) w_sv · ∂out_v^(l)/∂σ_v.

Special case: Gaussian activation function

∂out_v^(l)/∂σ_v = ∂/∂σ_v e^{ −(net_v^(l))² / (2σ_v²) } = ( (net_v^(l))² / σ_v³ ) e^{ −(net_v^(l))² / (2σ_v²) }.
Christian Borgelt Introduction to Neural Networks 144
Radial Basis Function Networks: Generalization
Generalization of the distance function

Idea: Use an anisotropic distance function.

Example: Mahalanobis distance

d(x, y) = √( (x − y)ᵀ Σ⁻¹ (x − y) ).

Example: biimplication

[Figure: a radial basis function network solving the biimplication with a single hidden neuron: center (1/2, 1/2), covariance matrix Σ = (9 8; 8 9), radius 1/3 and output weight 1, threshold 0. The Mahalanobis "circle" is an ellipse stretched along the diagonal x₁ = x₂, so it covers exactly the corners (0, 0) and (1, 1).]
Christian Borgelt Introduction to Neural Networks 145
Learning Vector Quantization
Christian Borgelt Introduction to Neural Networks 146
Vector Quantization
Voronoi diagram of a vector quantization

[Figure: a Voronoi diagram.]

• Dots represent vectors that are used for quantizing the area.
• Lines are the boundaries of the regions of points that are closest to the enclosed vector.
Christian Borgelt Introduction to Neural Networks 147
Learning Vector Quantization
Finding clusters in a given set of data points

[Figure: a scatter plot with several groups of points.]

• Data points are represented by empty circles (○).
• Cluster centers are represented by full circles (●).
Christian Borgelt Introduction to Neural Networks 148
Learning Vector Quantization Networks
A learning vector quantization network (LVQ) is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i)  U_in ∩ U_out = ∅,   U_hidden = ∅,
(ii) C = U_in × U_out

The network input function of each output neuron is a distance function of the input vector and the weight vector, i.e.

∀u ∈ U_out:   f_net^(u)( w_u, in_u ) = d( w_u, in_u ),

where d : ℝⁿ × ℝⁿ → ℝ₀⁺ is a function satisfying ∀x, y, z ∈ ℝⁿ:

(i)   d(x, y) = 0  ⇔  x = y,
(ii)  d(x, y) = d(y, x)               (symmetry),
(iii) d(x, z) ≤ d(x, y) + d(y, z)     (triangle inequality).
Christian Borgelt Introduction to Neural Networks 149
Distance Functions
Illustration of distance functions (Minkowski family):

d_k(x, y) = ( Σ_{i=1}^n |xᵢ − yᵢ|^k )^{1/k}

Well-known special cases from this family are:

k = 1:   Manhattan or city block distance,
k = 2:   Euclidean distance,
k → ∞:  maximum distance, i.e. d_∞(x, y) = max_{i=1}^n |xᵢ − yᵢ|.

[Figure: the "circles" (sets of points with distance 1 from a center) for k = 1 (diamond), k = 2 (circle), and k → ∞ (square).]
Christian Borgelt Introduction to Neural Networks 150
Learning Vector Quantization
The activation function of each output neuron is a so-called radial function, i.e. a monotonically decreasing function

f : ℝ₀⁺ → [0, ∞)   with   f(0) = 1   and   lim_{x→∞} f(x) = 0.

Sometimes the range of values is restricted to the interval [0, 1]. However, due to the special output function this restriction is irrelevant.

The output function of each output neuron is not a simple function of the activation of the neuron. Rather it takes into account the activations of all output neurons:

f_out^(u)( act_u ) = 1, if act_u = max_{v∈U_out} act_v;   0, otherwise.

If more than one unit has the maximal activation, one is selected at random to have an output of 1, all others are set to output 0: winner-takes-all principle.
Christian Borgelt Introduction to Neural Networks 151
Radial Activation Functions
rectangle function:
f_act(net, σ) = 0, if net > σ;  1, otherwise.

triangle function:
f_act(net, σ) = 0, if net > σ;  1 − net/σ, otherwise.

cosine until zero:
f_act(net, σ) = 0, if net > 2σ;  ( cos( (π/(2σ)) · net ) + 1 ) / 2, otherwise.

Gaussian function:
f_act(net, σ) = e^{ −net² / (2σ²) }
(value e^{−1/2} at net = σ and e^{−2} at net = 2σ).

[Figure: graphs of the four radial activation functions over the network input net, each starting at 1 for net = 0.]
Christian Borgelt Introduction to Neural Networks 152
Learning Vector Quantization
Adaptation of reference vectors / codebook vectors

• For each training pattern find the closest reference vector.
• Adapt only this reference vector (winner neuron).
• For classified data the class may be taken into account: each reference vector is assigned to a class.

Attraction rule (data point and reference vector have the same class):

r^(new) = r^(old) + η ( x − r^(old) ),

Repulsion rule (data point and reference vector have different classes):

r^(new) = r^(old) − η ( x − r^(old) ).
Christian Borgelt Introduction to Neural Networks 153
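The attraction and repulsion rules are a one-liner each; here is a Python sketch (function and variable names are mine) using the learning rate η = 0.4 from the illustration on the next slide:

```python
# Attraction/repulsion update for a single reference vector.

def lvq_update(r, x, same_class, eta):
    """Move r toward x (same class) or away from x (different class)."""
    sign = 1.0 if same_class else -1.0
    return [ri + sign * eta * (xi - ri) for ri, xi in zip(r, x)]

r = [0.0, 0.0]
r_attracted = lvq_update(r, [1.0, 1.0], True, 0.4)    # moves 40% of the way to x
r_repelled = lvq_update(r, [1.0, 1.0], False, 0.4)    # moves the same amount away
```

Note that only the winner (the reference vector closest to x) receives this update; all other reference vectors stay unchanged.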
Learning Vector Quantization
Adaptation of reference vectors / codebook vectors

[Figure: left, the attraction rule moves the closest reference vector toward the data point; right, the repulsion rule moves it away from the data point. x: data point, rᵢ: reference vectors, learning rate η = 0.4.]
Christian Borgelt Introduction to Neural Networks 154
Learning Vector Quantization: Example
Adaptation of reference vectors / codebook vectors

[Figure: left, online training with learning rate η = 0.1; right, batch training with learning rate η = 0.05.]
Christian Borgelt Introduction to Neural Networks 155
Learning Vector Quantization: Learning Rate Decay
Problem: A fixed learning rate can lead to oscillations.

Solution: time-dependent learning rate, e.g.

η(t) = η₀ α^t,  0 < α < 1,    or    η(t) = η₀ t^{−κ},  κ > 0.
Christian Borgelt Introduction to Neural Networks 156
Learning Vector Quantization: Classied Data
Improved update rule for classified data

Idea: Update not only the one reference vector that is closest to the data point (the winner neuron), but update the two closest reference vectors.

Let x be the currently processed data point and c its class. Let r_j and r_k be the two closest reference vectors and z_j and z_k their classes.

Reference vectors are updated only if z_j ≠ z_k and either c = z_j or c = z_k. (Without loss of generality we assume c = z_j.)

The update rules for the two closest reference vectors are:

r_j^(new) = r_j^(old) + η ( x − r_j^(old) )    and
r_k^(new) = r_k^(old) − η ( x − r_k^(old) ),

while all other reference vectors remain unchanged.
Christian Borgelt Introduction to Neural Networks 157
Learning Vector Quantization: Window Rule
It was observed in practical tests that standard learning vector quantization may drive the reference vectors further and further apart.

To counteract this undesired behavior a window rule was introduced: update only if the data point x is close to the classification boundary.

"Close to the boundary" is made formally precise by requiring

min( d(x, r_j)/d(x, r_k), d(x, r_k)/d(x, r_j) ) > θ,   where   θ = (1 − ξ)/(1 + ξ).

ξ is a parameter that has to be specified by a user.

Intuitively, ξ describes the "width" of the window around the classification boundary, in which the data point has to lie in order to lead to an update.

Using it prevents divergence, because the update ceases for a data point once the classification boundary has been moved far enough away.
Christian Borgelt Introduction to Neural Networks 158
Soft Learning Vector Quantization
Idea: Use soft assignments instead of winner-takes-all.

Assumption: Given data was sampled from a mixture of normal distributions. Each reference vector describes one normal distribution.

Objective: Maximize the log-likelihood ratio of the data, that is, maximize

ln L_ratio = Σ_{j=1}^n ln Σ_{r∈R(c_j)} exp( −(x_j − r)ᵀ(x_j − r) / (2σ²) )
           − Σ_{j=1}^n ln Σ_{r∈Q(c_j)} exp( −(x_j − r)ᵀ(x_j − r) / (2σ²) ).

Here σ is a parameter specifying the "size" of each normal distribution.
R(c) is the set of reference vectors assigned to class c and Q(c) its complement.

Intuitively: at each data point the probability density for its class should be as large as possible, while the density for all other classes should be as small as possible.
Christian Borgelt Introduction to Neural Networks 159
Soft Learning Vector Quantization
Update rule derived from a maximum log-likelihood approach:

r_i^(new) = r_i^(old) + η ·  u_ij⁺ ( x_j − r_i^(old) ),   if c_j = z_i,
            r_i^(old) − η ·  u_ij⁻ ( x_j − r_i^(old) ),   if c_j ≠ z_i,

where z_i is the class associated with the reference vector r_i and

u_ij⁺ = exp( −(1/(2σ²)) (x_j − r_i^(old))ᵀ(x_j − r_i^(old)) )
        / Σ_{r∈R(c_j)} exp( −(1/(2σ²)) (x_j − r^(old))ᵀ(x_j − r^(old)) )

and

u_ij⁻ = exp( −(1/(2σ²)) (x_j − r_i^(old))ᵀ(x_j − r_i^(old)) )
        / Σ_{r∈Q(c_j)} exp( −(1/(2σ²)) (x_j − r^(old))ᵀ(x_j − r^(old)) ).

R(c) is the set of reference vectors assigned to class c and Q(c) its complement.
Christian Borgelt Introduction to Neural Networks 160
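The assignments u_ij can be computed as normalized Gaussian weights, i.e. as responsibilities within one set of reference vectors (a sketch; sigma and the toy vectors are illustrative, not from the slides):

```python
# Sketch of the soft assignments: normalized Gaussian weights of the
# reference vectors in one set (e.g. R(c_j) or Q(c_j)) for a data point x.
import math

def responsibilities(x, refs, sigma):
    """Normalized Gaussian weights of the reference vectors in refs for point x."""
    ws = [math.exp(-sum((xi - ri) ** 2 for xi, ri in zip(x, r)) / (2 * sigma ** 2))
          for r in refs]
    total = sum(ws)
    return [w / total for w in ws]

u = responsibilities([0.0, 0.0], [[0.0, 0.1], [1.0, 1.0]], sigma=0.5)
```

The closer a reference vector is to x, the larger its share of the update.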
Hard Learning Vector Quantization
Idea: Derive a scheme with hard assignments from the soft version.

Approach: Let the size parameter \sigma of the Gaussian function go to zero.

The resulting update rule is in this case:

  r_i^{(new)} = r_i^{(old)} + \eta \cdot
    \begin{cases}
      u_{ij}^{\oplus} \, (x_j - r_i^{(old)}), & \text{if } c_j = z_i, \\
      -u_{ij}^{\ominus} \, (x_j - r_i^{(old)}), & \text{if } c_j \neq z_i,
    \end{cases}

where

  u_{ij}^{\oplus} = \begin{cases} 1, & \text{if } r_i = \arg\min_{r \in R(c_j)} d(x_j, r), \\ 0, & \text{otherwise}, \end{cases}
  (r_i is the closest vector of the same class)

  u_{ij}^{\ominus} = \begin{cases} 1, & \text{if } r_i = \arg\min_{r \in Q(c_j)} d(x_j, r), \\ 0, & \text{otherwise}. \end{cases}
  (r_i is the closest vector of a different class)

This update rule is stable without a window rule restricting the update.
Christian Borgelt Introduction to Neural Networks 161
Learning Vector Quantization: Extensions
Frequency Sensitive Competitive Learning
  The distance to a reference vector is modified according to
  the number of data points that are assigned to this reference vector.

Fuzzy Learning Vector Quantization
  Exploits the close relationship to fuzzy clustering.
  Can be seen as an online version of fuzzy clustering.
  Leads to faster clustering.

Size and Shape Parameters
  Associate each reference vector with a cluster radius.
  Update this radius depending on how close the data points are.
  Associate each reference vector with a covariance matrix.
  Update this matrix depending on the distribution of the data points.
Christian Borgelt Introduction to Neural Networks 162
Demonstration Software: xlvq/wlvq
Demonstration of learning vector quantization:
Visualization of the training process
Arbitrary data sets, but training only in two dimensions
http://www.borgelt.net/lvqd.html
Christian Borgelt Introduction to Neural Networks 163
Self-Organizing Maps
Christian Borgelt Introduction to Neural Networks 164
Self-Organizing Maps
A self-organizing map or Kohonen feature map is a neural network with
a graph G = (U, C) that satisfies the following conditions:

  (i)  U_{hidden} = \emptyset,  U_{in} \cap U_{out} = \emptyset,
  (ii) C = U_{in} \times U_{out}.

The network input function of each output neuron is a distance function of
input and weight vector. The activation function of each output neuron is a radial
function, i.e. a monotonically decreasing function

  f : \mathbb{R}_0^+ \to [0, 1]   with   f(0) = 1   and   \lim_{x \to \infty} f(x) = 0.

The output function of each output neuron is the identity.
The output is often discretized according to the winner takes all principle.

On the output neurons a neighborhood relationship is defined:

  d_{neurons} : U_{out} \times U_{out} \to \mathbb{R}_0^+.
Christian Borgelt Introduction to Neural Networks 165
Self-Organizing Maps: Neighborhood
Neighborhood of the output neurons: neurons form a grid
(figure: quadratic grid and hexagonal grid)

Thin black lines: indicate nearest neighbors of a neuron.
Thick gray lines: indicate regions assigned to a neuron for visualization.
Christian Borgelt Introduction to Neural Networks 166
Topology Preserving Mapping
Images of points close to each other in the original space
should be close to each other in the image space.
Example: Robinson projection of the surface of a sphere

(figure: Robinson projection of a sphere)

Robinson projection is frequently used for world maps.
Christian Borgelt Introduction to Neural Networks 167
Self-Organizing Maps: Neighborhood
Find topology preserving mapping by respecting the neighborhood.

Reference vector update rule:

  r_u^{(new)} = r_u^{(old)} + \eta(t) \cdot f_{nb}(d_{neurons}(u, u_*), \varrho(t)) \cdot (x - r_u^{(old)}),

u_* is the winner neuron (reference vector closest to the data point).

The function f_{nb} is a radial function.

Time dependent learning rate:

  \eta(t) = \eta_0 \alpha_\eta^t, \; 0 < \alpha_\eta < 1,   or   \eta(t) = \eta_0 t^{-\kappa_\eta}, \; \kappa_\eta > 0.

Time dependent neighborhood radius:

  \varrho(t) = \varrho_0 \alpha_\varrho^t, \; 0 < \alpha_\varrho < 1,   or   \varrho(t) = \varrho_0 t^{-\kappa_\varrho}, \; \kappa_\varrho > 0.
Christian Borgelt Introduction to Neural Networks 168
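One training step of this rule can be sketched for a one-dimensional chain of output neurons (a minimal illustration; the grid layout, learning rate and radius are mine, not from the slides):

```python
# Sketch of one SOM training step: every reference vector is moved toward x,
# weighted by a Gaussian neighborhood of the winner on a 1D grid
# (grid distance between neurons = difference of their indices).
import math

def som_step(refs, x, eta, rho):
    dists = [sum((xi - ri) ** 2 for xi, ri in zip(x, r)) for r in refs]
    winner = dists.index(min(dists))
    new_refs = []
    for u, r in enumerate(refs):
        h = math.exp(-(u - winner) ** 2 / (2 * rho ** 2))  # radial neighborhood
        new_refs.append([ri + eta * h * (xi - ri) for ri, xi in zip(r, x)])
    return new_refs, winner

refs, winner = som_step([[0.0], [0.5], [1.0]], [0.9], eta=0.5, rho=1.0)
```

The winner moves most; its neighbors follow with decreasing strength, which is what produces the topology preserving behavior.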
Self-Organizing Maps: Examples
Example: Unfolding of a two-dimensional self-organizing map.
Christian Borgelt Introduction to Neural Networks 169
Self-Organizing Maps: Examples
Example: Unfolding of a two-dimensional self-organizing map.
Christian Borgelt Introduction to Neural Networks 170
Self-Organizing Maps: Examples
Example: Unfolding of a two-dimensional self-organizing map.

Training a self-organizing map may fail if
  the (initial) learning rate is chosen too small or
  the (initial) neighborhood is chosen too small.
Christian Borgelt Introduction to Neural Networks 171
Self-Organizing Maps: Examples
Example: Unfolding of a two-dimensional self-organizing map.

(figures (a), (b), (c))

Self-organizing maps that have been trained with random points from
(a) a rotation parabola, (b) a simple cubic function, (c) the surface of a sphere.

In this case original space and image space have different dimensionality.

Self-organizing maps can be used for dimensionality reduction.
Christian Borgelt Introduction to Neural Networks 172
Demonstration Software: xsom/wsom
Demonstration of self-organizing map training:
Visualization of the training process
Two-dimensional areas and three-dimensional surfaces
http://www.borgelt.net/somd.html
Christian Borgelt Introduction to Neural Networks 173
Hopfield Networks
Christian Borgelt Introduction to Neural Networks 174
Hopfield Networks

A Hopfield network is a neural network with a graph G = (U, C) that satisfies
the following conditions:

  (i)  U_{hidden} = \emptyset,  U_{in} = U_{out} = U,
  (ii) C = U \times U - \{(u, u) \mid u \in U\}.

In a Hopfield network all neurons are input as well as output neurons.

There are no hidden neurons.

Each neuron receives input from all other neurons.

A neuron is not connected to itself.

The connection weights are symmetric, i.e.

  \forall u, v \in U, u \neq v : \quad w_{uv} = w_{vu}.
Christian Borgelt Introduction to Neural Networks 175
Hopfield Networks

The network input function of each neuron is the weighted sum of the outputs of
all other neurons, i.e.

  \forall u \in U : \quad f_{net}^{(u)}(w_u, in_u) = w_u^\top in_u = \sum_{v \in U - \{u\}} w_{uv} \, out_v.

The activation function of each neuron is a threshold function, i.e.

  \forall u \in U : \quad f_{act}^{(u)}(net_u, \theta_u) = \begin{cases} 1, & \text{if } net_u \geq \theta_u, \\ -1, & \text{otherwise}. \end{cases}

The output function of each neuron is the identity, i.e.

  \forall u \in U : \quad f_{out}^{(u)}(act_u) = act_u.
Christian Borgelt Introduction to Neural Networks 176
Hopfield Networks

Alternative activation function:

  \forall u \in U : \quad f_{act}^{(u)}(net_u, \theta_u, act_u) =
    \begin{cases} 1, & \text{if } net_u > \theta_u, \\
                  -1, & \text{if } net_u < \theta_u, \\
                  act_u, & \text{if } net_u = \theta_u. \end{cases}

This activation function has advantages w.r.t. the physical interpretation
of a Hopfield network.

General weight matrix of a Hopfield network:

  W = \begin{pmatrix}
        0 & w_{u_1 u_2} & \ldots & w_{u_1 u_n} \\
        w_{u_1 u_2} & 0 & \ldots & w_{u_2 u_n} \\
        \vdots & \vdots & & \vdots \\
        w_{u_1 u_n} & w_{u_2 u_n} & \ldots & 0
      \end{pmatrix}
Christian Borgelt Introduction to Neural Networks 177
Hopfield Networks: Examples

Very simple Hopfield network:

(figure: two neurons u_1 and u_2 with thresholds 0, mutual connection weight 1,
inputs x_1, x_2 and outputs y_1, y_2)

  W = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

The behavior of a Hopfield network can depend on the update order.

  Computations can oscillate if neurons are updated in parallel.

  Computations always converge if neurons are updated sequentially.
Christian Borgelt Introduction to Neural Networks 178
Hopfield Networks: Examples

Parallel update of neuron activations:

                 u_1   u_2
  input phase    -1     1
  work phase      1    -1
                 -1     1
                  1    -1
                 -1     1
                  1    -1
                 -1     1

The computations oscillate, no stable state is reached.

The output depends on when the computations are terminated.
Christian Borgelt Introduction to Neural Networks 179
Hopfield Networks: Examples

Sequential update of neuron activations (left: u_1 first, right: u_2 first):

                 u_1   u_2                      u_1   u_2
  input phase    -1     1      input phase      -1     1
  work phase      1     1      work phase       -1    -1
                  1     1                       -1    -1
                  1     1                       -1    -1
                  1     1                       -1    -1

Regardless of the update order a stable state is reached.

Which state is reached depends on the update order.
Christian Borgelt Introduction to Neural Networks 180
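The two sequential runs can be reproduced directly (a sketch, assuming the start state (-1, 1) of the example; the function name is mine):

```python
# Sequential (asynchronous) updates for the two-neuron example
# W = [[0, 1], [1, 0]] with thresholds 0; the update order is a parameter.

def run_sequential(act, order, steps=10):
    W = [[0, 1], [1, 0]]
    theta = [0, 0]
    act = list(act)
    for _ in range(steps):
        for u in order:
            net = sum(W[u][v] * act[v] for v in range(2) if v != u)
            act[u] = 1 if net >= theta[u] else -1
    return act

a = run_sequential([-1, 1], order=[0, 1])   # update u_1 first -> (1, 1)
b = run_sequential([-1, 1], order=[1, 0])   # update u_2 first -> (-1, -1)
```

Both orders converge, but to different stable states, which is exactly the behavior shown in the tables above.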
Hopfield Networks: Examples

Simplified representation of a Hopfield network:

(figure: left, full network with inputs x_1, x_2, x_3, outputs y_1, y_2, y_3,
thresholds 0 and pairwise weights 1, 1, 2; right, simplified network with
neurons u_1, u_2, u_3 and combined edges)

  W = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix}

Symmetric connections between neurons are combined.

Inputs and outputs are not explicitly represented.
Christian Borgelt Introduction to Neural Networks 181
Hopfield Networks: State Graph

Graph of activation states and transitions:

(figure: state graph of the eight activation states ---, --+, \ldots, +++;
each arrow is labeled with the neurons u_1, u_2, u_3 whose update causes the
corresponding transition)
Christian Borgelt Introduction to Neural Networks 182
Hopfield Networks: Convergence

Convergence Theorem: If the activations of the neurons of a Hopfield network
are updated sequentially (asynchronously), then a stable state is reached in a finite
number of steps.

If the neurons are traversed cyclically in an arbitrary, but fixed order, at most n \cdot 2^n
steps (updates of individual neurons) are needed, where n is the number of neurons
of the Hopfield network.

The proof is carried out with the help of an energy function.
The energy function of a Hopfield network with n neurons u_1, \ldots, u_n is:

  E = -\frac{1}{2} \vec{act}^\top W \vec{act} + \vec{\theta}^\top \vec{act}
    = -\frac{1}{2} \sum_{u,v \in U, u \neq v} w_{uv} \, act_u \, act_v + \sum_{u \in U} \theta_u \, act_u.
Christian Borgelt Introduction to Neural Networks 183
Hopfield Networks: Convergence

Consider the energy change resulting from an update that changes an activation:

  \Delta E = E^{(new)} - E^{(old)}
    = \Big( -\sum_{v \in U - \{u\}} w_{uv} \, act_u^{(new)} act_v + \theta_u \, act_u^{(new)} \Big)
    - \Big( -\sum_{v \in U - \{u\}} w_{uv} \, act_u^{(old)} act_v + \theta_u \, act_u^{(old)} \Big)
    = \big( act_u^{(old)} - act_u^{(new)} \big)
      \Big( \underbrace{\sum_{v \in U - \{u\}} w_{uv} \, act_v}_{= \, net_u} - \; \theta_u \Big).

net_u < \theta_u: Second factor is less than 0.
  act_u^{(new)} = -1 and act_u^{(old)} = 1, therefore the first factor is greater than 0.
  Result: \Delta E < 0.

net_u \geq \theta_u: Second factor is greater than or equal to 0.
  act_u^{(new)} = 1 and act_u^{(old)} = -1, therefore the first factor is less than 0.
  Result: \Delta E \leq 0.
Christian Borgelt Introduction to Neural Networks 184
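The convergence argument can be checked numerically: under asynchronous updates the energy never increases (a sketch on a small random symmetric network with a fixed seed; all names are mine):

```python
# Check: E = -1/2 sum_{u != v} w_uv act_u act_v + sum_u theta_u act_u
# never increases under an asynchronous threshold update.
import random

def energy(W, theta, act):
    n = len(act)
    e = sum(theta[u] * act[u] for u in range(n))
    e -= 0.5 * sum(W[u][v] * act[u] * act[v]
                   for u in range(n) for v in range(n) if u != v)
    return e

random.seed(0)
n = 6
W = [[0.0] * n for _ in range(n)]
for u in range(n):
    for v in range(u + 1, n):
        W[u][v] = W[v][u] = random.uniform(-1, 1)   # symmetric weights
theta = [random.uniform(-1, 1) for _ in range(n)]
act = [random.choice([-1, 1]) for _ in range(n)]

deltas = []
for _ in range(50):
    u = random.randrange(n)
    e_old = energy(W, theta, act)
    net = sum(W[u][v] * act[v] for v in range(n) if v != u)
    act[u] = 1 if net >= theta[u] else -1
    deltas.append(energy(W, theta, act) - e_old)
```

Every recorded energy change is non-positive, matching the case analysis above.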
Hopfield Networks: Examples

Arrange states in state graph according to their energy:

(figure: states arranged by energy level; the states +++ and --- lie at the
minimum energy E = -4, the remaining states at E = 0 and E = 2)

Energy function for the example Hopfield network:

  E = -act_{u_1} act_{u_2} - 2 \, act_{u_1} act_{u_3} - act_{u_2} act_{u_3}.
Christian Borgelt Introduction to Neural Networks 185
Hopfield Networks: Examples

The state graph need not be symmetric:

(figure: Hopfield network with three neurons u_1, u_2, u_3, thresholds -1 and
pairwise weights -2, together with its state graph, states arranged by energy)
Hopfield Networks: Physical Interpretation

Physical interpretation: Magnetism

A Hopfield network can be seen as a (microscopic) model of magnetism
(so-called Ising model, [Ising 1925]).

  physical                                   neural
  atom                                       neuron
  magnetic moment (spin)                     activation state
  strength of outer magnetic field           threshold value
  magnetic coupling of the atoms             connection weights
  Hamilton operator of the magnetic field    energy function
Hopfield Networks: Associative Memory

Idea: Use stable states to store patterns.

First: Store only one pattern x = (act_{u_1}^{(l)}, \ldots, act_{u_n}^{(l)})^\top \in \{-1, 1\}^n, n \geq 2,
i.e., find weights, so that the pattern is a stable state.

Necessary and sufficient condition:

  S(Wx - \vec{\theta}) = x,

where

  S : \mathbb{R}^n \to \{-1, 1\}^n, \quad x \mapsto y

with

  \forall i \in \{1, \ldots, n\} : \quad y_i = \begin{cases} 1, & \text{if } x_i \geq 0, \\ -1, & \text{otherwise}. \end{cases}
Christian Borgelt Introduction to Neural Networks 188
Hopfield Networks: Associative Memory

If \vec{\theta} = \vec{0}, an appropriate matrix W can easily be found. It suffices

  Wx = cx   with   c \in \mathbb{R}^+.

Algebraically: Find a matrix W that has a positive eigenvalue w.r.t. x.

Choose

  W = x x^\top - E,

where x x^\top is the so-called outer product and E is the unit matrix.

With this matrix we have

  Wx = (x x^\top) x - \underbrace{E x}_{= \, x}
     = x \underbrace{(x^\top x)}_{= \, |x|^2 \, = \, n} - \; x
     = n x - x = (n - 1) x.
Christian Borgelt Introduction to Neural Networks 189
Hopfield Networks: Associative Memory

Hebbian learning rule [Hebb 1949]

Written in individual weights the computation of the weight matrix reads:

  w_{uv} = \begin{cases}
             0, & \text{if } u = v, \\
             1, & \text{if } u \neq v, \; act_u^{(p)} = act_v^{(p)}, \\
             -1, & \text{otherwise}.
           \end{cases}

Originally derived from a biological analogy:
Strengthen connection between neurons that are active at the same time.

Note that this learning rule also stores the complement of the pattern:
With Wx = (n-1)x it is also W(-x) = (n-1)(-x).
Christian Borgelt Introduction to Neural Networks 190
Hopfield Networks: Associative Memory

Storing several patterns

Choose

  W = \sum_{i=1}^m W_i = \Big( \sum_{i=1}^m x_i x_i^\top \Big) - m E.

Then

  W x_j = \Big( \sum_{i=1}^m x_i x_i^\top \Big) x_j - \underbrace{m E x_j}_{= \, m x_j}
        = \Big( \sum_{i=1}^m x_i (x_i^\top x_j) \Big) - m x_j.

If the patterns are orthogonal, we have

  x_i^\top x_j = \begin{cases} 0, & \text{if } i \neq j, \\ n, & \text{if } i = j, \end{cases}

and therefore

  W x_j = (n - m) x_j.
Christian Borgelt Introduction to Neural Networks 191
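The outer-product construction and the stability condition can be checked directly (a sketch; the orthogonal patterns below are my own toy choice, not the slides' example):

```python
# Store patterns with the outer-product (Hebbian) rule W = sum_i x_i x_i^T - mE
# and check W x_j = (n - m) x_j for orthogonal patterns.

def hebb_matrix(patterns):
    """Element-wise form of W = sum_i x_i x_i^T - m E (zero diagonal)."""
    n = len(patterns[0])
    return [[sum(x[u] * x[v] for x in patterns) if u != v else 0
             for v in range(n)] for u in range(n)]

x1 = [+1, +1, +1, +1]
x2 = [+1, -1, +1, -1]          # orthogonal to x1
W = hebb_matrix([x1, x2])
y1 = [sum(W[u][v] * x1[v] for v in range(4)) for u in range(4)]   # W x1
```

With n = 4 and m = 2, the result y1 should be (n - m) x1 = 2 x1, so x1 is a stable state.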
Hopfield Networks: Associative Memory

Storing several patterns

Result: As long as m < n, x_j is a stable state of the Hopfield network.

Note that the complements of the patterns are also stored:
With W x_j = (n - m) x_j it is also W(-x_j) = (n - m)(-x_j).

But: Capacity is very small compared to the number of possible states (2^n).

Non-orthogonal patterns:

  W x_j = (n - m) x_j + \underbrace{\sum_{i=1, \, i \neq j}^m x_i (x_i^\top x_j)}_{\text{"disturbance term"}}.
Christian Borgelt Introduction to Neural Networks 192
Associative Memory: Example
Example: Store patterns x_1 = (+1, +1, -1, -1)^\top and x_2 = (-1, +1, -1, +1)^\top:

  W = W_1 + W_2 = x_1 x_1^\top + x_2 x_2^\top - 2E,

where

  W_1 = \begin{pmatrix} 0 & 1 & -1 & -1 \\ 1 & 0 & -1 & -1 \\ -1 & -1 & 0 & 1 \\ -1 & -1 & 1 & 0 \end{pmatrix},
  \quad
  W_2 = \begin{pmatrix} 0 & -1 & 1 & -1 \\ -1 & 0 & -1 & 1 \\ 1 & -1 & 0 & -1 \\ -1 & 1 & -1 & 0 \end{pmatrix}.

The full weight matrix is:

  W = \begin{pmatrix} 0 & 0 & 0 & -2 \\ 0 & 0 & -2 & 0 \\ 0 & -2 & 0 & 0 \\ -2 & 0 & 0 & 0 \end{pmatrix}.

Therefore it is

  W x_1 = (+2, +2, -2, -2)^\top   and   W x_2 = (-2, +2, -2, +2)^\top.
Christian Borgelt Introduction to Neural Networks 193
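The numbers of this example can be verified in a few lines:

```python
# Numerical check of the example: W = x1 x1^T + x2 x2^T - 2E,
# and W x1 = 2 x1, W x2 = 2 x2 (i.e. both patterns are stable).
x1 = [+1, +1, -1, -1]
x2 = [-1, +1, -1, +1]
n = 4
W = [[x1[u] * x1[v] + x2[u] * x2[v] - (2 if u == v else 0)
      for v in range(n)] for u in range(n)]
Wx1 = [sum(W[u][v] * x1[v] for v in range(n)) for u in range(n)]
Wx2 = [sum(W[u][v] * x2[v] for v in range(n)) for u in range(n)]
```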
Associative Memory: Examples
Example: Storing bit maps of numbers

Left: Bit maps stored in a Hopfield network.
Right: Reconstruction of a pattern from a random input.
Christian Borgelt Introduction to Neural Networks 194
Hopfield Networks: Associative Memory

Training a Hopfield network with the Delta rule

Necessary condition for pattern x being a stable state:

  s(0 + w_{u_1 u_2} act_{u_2}^{(p)} + \ldots + w_{u_1 u_n} act_{u_n}^{(p)} - \theta_{u_1}) = act_{u_1}^{(p)},
  s(w_{u_2 u_1} act_{u_1}^{(p)} + 0 + \ldots + w_{u_2 u_n} act_{u_n}^{(p)} - \theta_{u_2}) = act_{u_2}^{(p)},
  \vdots
  s(w_{u_n u_1} act_{u_1}^{(p)} + w_{u_n u_2} act_{u_2}^{(p)} + \ldots + 0 - \theta_{u_n}) = act_{u_n}^{(p)},

with the standard threshold function

  s(x) = \begin{cases} 1, & \text{if } x \geq 0, \\ -1, & \text{otherwise}. \end{cases}
Christian Borgelt Introduction to Neural Networks 195
Hopfield Networks: Associative Memory

Training a Hopfield network with the Delta rule

Turn the weight matrix into a weight vector:

  w = ( w_{u_1 u_2}, w_{u_1 u_3}, \ldots, w_{u_1 u_n},
                     w_{u_2 u_3}, \ldots, w_{u_2 u_n},
                                  \ddots
                                          w_{u_{n-1} u_n},
        -\theta_{u_1}, -\theta_{u_2}, \ldots, -\theta_{u_n} ).

Construct input vectors for a threshold logic unit:

  z_2 = ( act_{u_1}^{(p)}, \underbrace{0, \ldots, 0}_{n-2 \text{ zeros}},
          act_{u_3}^{(p)}, \ldots, act_{u_n}^{(p)}, \ldots,
          0, 1, \underbrace{0, \ldots, 0}_{n-2 \text{ zeros}} ).

Apply Delta rule training until convergence.
Christian Borgelt Introduction to Neural Networks 196
Demonstration Software: xhfn/whfn
Demonstration of Hopfield networks as associative memory:
Visualization of the association/recognition process
Two-dimensional networks of arbitrary size
http://www.borgelt.net/hfnd.html
Christian Borgelt Introduction to Neural Networks 197
Hopfield Networks: Solving Optimization Problems

Use energy minimization to solve optimization problems.

General procedure:
  Transform the function to optimize into a function to minimize.
  Transform the function into the form of an energy function of a Hopfield network.
  Read the weights and threshold values from the energy function.
  Construct the corresponding Hopfield network.
  Initialize the Hopfield network randomly and update until convergence.
  Read the solution from the stable state reached.
  Repeat several times and use the best solution found.
Christian Borgelt Introduction to Neural Networks 198
Hopfield Networks: Activation Transformation

A Hopfield network may be defined either with activations -1 and 1 or with
activations 0 and 1. The networks can be transformed into each other.

From act_u \in \{-1, 1\} to act_u \in \{0, 1\}:

  w_{uv}^0 = 2 w_{uv}^-   and   \theta_u^0 = \theta_u^- + \sum_{v \in U - \{u\}} w_{uv}^-.

From act_u \in \{0, 1\} to act_u \in \{-1, 1\}:

  w_{uv}^- = \frac{1}{2} w_{uv}^0   and   \theta_u^- = \theta_u^0 - \frac{1}{2} \sum_{v \in U - \{u\}} w_{uv}^0.
Christian Borgelt Introduction to Neural Networks 199
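That the transformation preserves the update behavior can be checked numerically: in both encodings the quantity net_u - theta_u is identical, so every neuron makes the same threshold decision (a sketch on a small random network with a fixed seed; all names are mine):

```python
# Check: the {-1,1} -> {0,1} transformation w0 = 2 w, theta0 = theta + sum_v w_uv
# leaves net_u - theta_u unchanged when act0 = (act + 1) / 2.
import random

random.seed(1)
n = 5
w_pm = [[0.0] * n for _ in range(n)]
for u in range(n):
    for v in range(u + 1, n):
        w_pm[u][v] = w_pm[v][u] = random.uniform(-1, 1)
th_pm = [random.uniform(-1, 1) for _ in range(n)]

# transformed network on {0,1} activations
w_01 = [[2 * w_pm[u][v] for v in range(n)] for u in range(n)]
th_01 = [th_pm[u] + sum(w_pm[u][v] for v in range(n) if v != u) for u in range(n)]

act_pm = [random.choice([-1, 1]) for _ in range(n)]
act_01 = [(a + 1) // 2 for a in act_pm]

ok = []
for u in range(n):
    net_pm = sum(w_pm[u][v] * act_pm[v] for v in range(n) if v != u)
    net_01 = sum(w_01[u][v] * act_01[v] for v in range(n) if v != u)
    ok.append(abs((net_pm - th_pm[u]) - (net_01 - th_01[u])) < 1e-9)
```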
Hopfield Networks: Solving Optimization Problems

Combination lemma: Let two Hopfield networks on the same set U of neurons
with weights w_{uv}^{(i)}, threshold values \theta_u^{(i)} and energy functions

  E_i = -\frac{1}{2} \sum_{u \in U} \sum_{v \in U - \{u\}} w_{uv}^{(i)} \, act_u \, act_v + \sum_{u \in U} \theta_u^{(i)} \, act_u,

i = 1, 2, be given. Furthermore let a, b \in \mathbb{R}^+. Then E = a E_1 + b E_2 is the energy
function of the Hopfield network on the neurons in U that has the weights
w_{uv} = a w_{uv}^{(1)} + b w_{uv}^{(2)} and the threshold values \theta_u = a \theta_u^{(1)} + b \theta_u^{(2)}.

Proof: Just do the computations.

Idea: Additional conditions can be formalized separately and incorporated later.
Christian Borgelt Introduction to Neural Networks 200
Hopfield Networks: Solving Optimization Problems

Example: Traveling salesman problem

Idea: Represent a tour by a matrix.

(figure: tour through four cities, 1 -> 3 -> 4 -> 2)

           city
          1  2  3  4
        ( 1  0  0  0 )  1.
        ( 0  0  1  0 )  2.
        ( 0  0  0  1 )  3.
        ( 0  1  0  0 )  4.
                        step

An element m_{ij} of the matrix is 1 if the j-th city is visited in the i-th step
and 0 otherwise.

Each matrix element will be represented by a neuron.
Christian Borgelt Introduction to Neural Networks 201
Hopfield Networks: Solving Optimization Problems

Minimization of the tour length:

  E_1 = \sum_{j_1=1}^n \sum_{j_2=1}^n \sum_{i=1}^n d_{j_1 j_2} \cdot m_{i j_1} \cdot m_{(i \bmod n) + 1, \, j_2}.

Double summation over steps (index i) needed:

  E_1 = \sum_{(i_1, j_1) \in \{1, \ldots, n\}^2} \sum_{(i_2, j_2) \in \{1, \ldots, n\}^2}
        d_{j_1 j_2} \cdot \delta_{(i_1 \bmod n) + 1, \, i_2} \cdot m_{i_1 j_1} \cdot m_{i_2 j_2},

where

  \delta_{ab} = \begin{cases} 1, & \text{if } a = b, \\ 0, & \text{otherwise}. \end{cases}

Symmetric version of the energy function:

  E_1 = \frac{1}{2} \sum_{(i_1, j_1), (i_2, j_2) \in \{1, \ldots, n\}^2}
        d_{j_1 j_2} \cdot \big( \delta_{(i_1 \bmod n) + 1, \, i_2} + \delta_{i_1, \, (i_2 \bmod n) + 1} \big)
        \cdot m_{i_1 j_1} \cdot m_{i_2 j_2}.
Christian Borgelt Introduction to Neural Networks 202
Hopfield Networks: Solving Optimization Problems

Additional conditions that have to be satisfied:

Each city is visited on exactly one step of the tour:

  \forall j \in \{1, \ldots, n\} : \quad \sum_{i=1}^n m_{ij} = 1,

i.e., each column of the matrix contains exactly one 1.

On each step of the tour exactly one city is visited:

  \forall i \in \{1, \ldots, n\} : \quad \sum_{j=1}^n m_{ij} = 1,

i.e., each row of the matrix contains exactly one 1.

These conditions are incorporated by finding additional functions to optimize.
Christian Borgelt Introduction to Neural Networks 203
Hopfield Networks: Solving Optimization Problems

Formalization of the first condition as a minimization problem:

  E_2^* = \sum_{j=1}^n \Big( \sum_{i=1}^n m_{ij} - 1 \Big)^2
        = \sum_{j=1}^n \Big( \Big( \sum_{i=1}^n m_{ij} \Big)^2 - 2 \sum_{i=1}^n m_{ij} + 1 \Big)
        = \sum_{j=1}^n \Big( \Big( \sum_{i_1=1}^n m_{i_1 j} \Big) \Big( \sum_{i_2=1}^n m_{i_2 j} \Big) - 2 \sum_{i=1}^n m_{ij} + 1 \Big)
        = \sum_{j=1}^n \sum_{i_1=1}^n \sum_{i_2=1}^n m_{i_1 j} \, m_{i_2 j} - 2 \sum_{j=1}^n \sum_{i=1}^n m_{ij} + n.

Double summation over steps (index i) needed (the constant n can be dropped):

  E_2^* = \sum_{(i_1, j_1) \in \{1, \ldots, n\}^2} \sum_{(i_2, j_2) \in \{1, \ldots, n\}^2}
          \delta_{j_1 j_2} \cdot m_{i_1 j_1} \cdot m_{i_2 j_2}
        - 2 \sum_{(i, j) \in \{1, \ldots, n\}^2} m_{ij}.
Christian Borgelt Introduction to Neural Networks 204
Hopfield Networks: Solving Optimization Problems

Resulting energy function:

  E_2 = -\frac{1}{2} \sum_{(i_1, j_1), (i_2, j_2) \in \{1, \ldots, n\}^2}
        -2 \delta_{j_1 j_2} \cdot m_{i_1 j_1} \cdot m_{i_2 j_2}
        + \sum_{(i, j) \in \{1, \ldots, n\}^2} -2 m_{ij}.

The second additional condition is handled in a completely analogous way:

  E_3 = -\frac{1}{2} \sum_{(i_1, j_1), (i_2, j_2) \in \{1, \ldots, n\}^2}
        -2 \delta_{i_1 i_2} \cdot m_{i_1 j_1} \cdot m_{i_2 j_2}
        + \sum_{(i, j) \in \{1, \ldots, n\}^2} -2 m_{ij}.

Combining the energy functions:

  E = a E_1 + b E_2 + c E_3,   where   \frac{b}{a} = \frac{c}{a} > 2 \max_{(j_1, j_2) \in \{1, \ldots, n\}^2} d_{j_1 j_2}.
Christian Borgelt Introduction to Neural Networks 205
Hopfield Networks: Solving Optimization Problems

From the resulting energy function we can read the weights

  w_{(i_1, j_1)(i_2, j_2)} =
    \underbrace{-a \, d_{j_1 j_2} \big( \delta_{(i_1 \bmod n) + 1, \, i_2} + \delta_{i_1, \, (i_2 \bmod n) + 1} \big)}_{\text{from } E_1}
    \underbrace{- \, 2b \, \delta_{j_1 j_2}}_{\text{from } E_2}
    \underbrace{- \, 2c \, \delta_{i_1 i_2}}_{\text{from } E_3}

and the threshold values:

  \theta_{(i,j)} = \underbrace{0a}_{\text{from } E_1} \underbrace{- \, 2b}_{\text{from } E_2} \underbrace{- \, 2c}_{\text{from } E_3} = -2(b + c).

Problem: Random initialization and update until convergence does not always lead to
a matrix that represents a tour, let alone an optimal one.
Christian Borgelt Introduction to Neural Networks 206
Recurrent Neural Networks
Christian Borgelt Introduction to Neural Networks 207
Recurrent Networks: Cooling Law
A body of temperature \vartheta_0 is placed into an environment with temperature \vartheta_A.

The cooling/heating of the body can be described by Newton's cooling law:

  \frac{d\vartheta}{dt} = \dot{\vartheta} = -k(\vartheta - \vartheta_A).

Exact analytical solution:

  \vartheta(t) = \vartheta_A + (\vartheta_0 - \vartheta_A) e^{-k(t - t_0)}

Approximate solution with Euler-Cauchy polygon courses:

  \vartheta_1 = \vartheta(t_1) = \vartheta(t_0) + \dot{\vartheta}(t_0) \Delta t = \vartheta_0 - k(\vartheta_0 - \vartheta_A) \Delta t,
  \vartheta_2 = \vartheta(t_2) = \vartheta(t_1) + \dot{\vartheta}(t_1) \Delta t = \vartheta_1 - k(\vartheta_1 - \vartheta_A) \Delta t.

General recursive formula:

  \vartheta_i = \vartheta(t_i) = \vartheta(t_{i-1}) + \dot{\vartheta}(t_{i-1}) \Delta t = \vartheta_{i-1} - k(\vartheta_{i-1} - \vartheta_A) \Delta t.
Christian Borgelt Introduction to Neural Networks 208
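The recursive formula is a one-line loop; comparing it with the exact solution shows the quality of the Euler-Cauchy approximation (a sketch; k, the temperatures and the step width are illustrative, not from the slides):

```python
# Euler-Cauchy approximation of Newton's cooling law vs. the exact solution.
import math

k, theta_A, theta0 = 0.2, 20.0, 100.0
dt, steps = 0.1, 100          # integrate up to t = 10

theta = theta0
for _ in range(steps):
    # theta_i = theta_{i-1} - k (theta_{i-1} - theta_A) dt
    theta = theta - k * (theta - theta_A) * dt

exact = theta_A + (theta0 - theta_A) * math.exp(-k * 10.0)
```

With a smaller step width dt the approximation error shrinks further.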
Recurrent Networks: Cooling Law
Euler-Cauchy polygon courses for different step widths:

(figure: three plots of \vartheta over t \in [0, 20] for \Delta t = 4, \Delta t = 2 and \Delta t = 1;
the thin curve is the exact analytical solution)

Recurrent neural network:

(figure: a single neuron with self-feedback weight -k\Delta t and threshold -k\vartheta_A \Delta t,
input \vartheta(t_0), output \vartheta(t))
Christian Borgelt Introduction to Neural Networks 209
Recurrent Networks: Cooling Law
More formal derivation of the recursive formula:

Replace the differential quotient by a forward difference:

  \frac{d\vartheta(t)}{dt} \approx \frac{\Delta \vartheta(t)}{\Delta t} = \frac{\vartheta(t + \Delta t) - \vartheta(t)}{\Delta t}

with sufficiently small \Delta t. Then it is

  \vartheta(t + \Delta t) - \vartheta(t) = \Delta \vartheta(t) \approx -k(\vartheta(t) - \vartheta_A) \Delta t,
  \vartheta(t + \Delta t) - \vartheta(t) = \Delta \vartheta(t) \approx -k \Delta t \, \vartheta(t) + k \vartheta_A \Delta t

and therefore

  \vartheta_i \approx \vartheta_{i-1} - k \Delta t \, \vartheta_{i-1} + k \vartheta_A \Delta t.
Christian Borgelt Introduction to Neural Networks 210
Recurrent Networks: Mass on a Spring
(figure: mass m hanging from a spring, displacement x, equilibrium position 0)

Governing physical laws:

  Hooke's law: F = c \Delta l = -c x (c is a spring dependent constant),
  Newton's second law: F = m a = m \ddot{x} (a force causes an acceleration).

Resulting differential equation:

  m \ddot{x} = -c x   or   \ddot{x} = -\frac{c}{m} x.
Christian Borgelt Introduction to Neural Networks 211
Recurrent Networks: Mass on a Spring
General analytical solution of the differential equation:

  x(t) = a \sin(\omega t) + b \cos(\omega t)

with the parameters

  \omega = \sqrt{\frac{c}{m}},
  a = x(t_0) \sin(\omega t_0) + v(t_0) \cos(\omega t_0),
  b = x(t_0) \cos(\omega t_0) - v(t_0) \sin(\omega t_0).

With given initial values x(t_0) = x_0 and v(t_0) = 0 and
the additional assumption t_0 = 0 we get the simple expression

  x(t) = x_0 \cos\left( \sqrt{\frac{c}{m}} \, t \right).
Christian Borgelt Introduction to Neural Networks 212
Recurrent Networks: Mass on a Spring
Turn the differential equation into two coupled equations:

  \dot{x} = v   and   \dot{v} = -\frac{c}{m} x.

Approximate the differential quotients by forward differences:

  \frac{\Delta x}{\Delta t} = \frac{x(t + \Delta t) - x(t)}{\Delta t} = v   and
  \frac{\Delta v}{\Delta t} = \frac{v(t + \Delta t) - v(t)}{\Delta t} = -\frac{c}{m} x.

Resulting recursive equations:

  x(t_i) = x(t_{i-1}) + \Delta x(t_{i-1}) = x(t_{i-1}) + \Delta t \cdot v(t_{i-1})   and
  v(t_i) = v(t_{i-1}) + \Delta v(t_{i-1}) = v(t_{i-1}) - \frac{c}{m} \Delta t \cdot x(t_{i-1}).
Christian Borgelt Introduction to Neural Networks 213
Recurrent Networks: Mass on a Spring
(figure: recurrent network of two neurons with thresholds 0; neuron u_1 outputs
v(t) and receives x with weight -\frac{c}{m} \Delta t, neuron u_2 outputs x(t) and
receives v with weight \Delta t; inputs x(t_0) and v(t_0))

Neuron u_1:

  f_{net}^{(u_1)}(x, w_{u_1 u_2}) = w_{u_1 u_2} \, x = -\frac{c}{m} \Delta t \cdot x   and
  f_{act}^{(u_1)}(act_{u_1}, net_{u_1}, \theta_{u_1}) = act_{u_1} + net_{u_1} - \theta_{u_1},

Neuron u_2:

  f_{net}^{(u_2)}(v, w_{u_2 u_1}) = w_{u_2 u_1} \, v = \Delta t \cdot v   and
  f_{act}^{(u_2)}(act_{u_2}, net_{u_2}, \theta_{u_2}) = act_{u_2} + net_{u_2} - \theta_{u_2}.
Christian Borgelt Introduction to Neural Networks 214
Recurrent Networks: Mass on a Spring
Some computation steps of the neural network:

   t      v        x
  0.0   0.0000   1.0000
  0.1  -0.5000   0.9500
  0.2  -0.9750   0.8525
  0.3  -1.4012   0.7124
  0.4  -1.7574   0.5366
  0.5  -2.0258   0.3341
  0.6  -2.1928   0.1148

(figure: resulting oscillation x over t \in [0, 4])

The resulting curve is close to the analytical solution.
The approximation gets better with a smaller step width.
Christian Borgelt Introduction to Neural Networks 215
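The table can be reproduced in a few lines. The listed values come out if dt = 0.1 and c/m = 5 are assumed, with v updated first and x then computed from the already updated v (as in the network, where u_2 uses the fresh output of u_1):

```python
# Reproduce the computation steps of the mass-on-a-spring network
# (dt = 0.1 and c/m = 5 are inferred from the listed values).
dt, c_over_m = 0.1, 5.0
x, v = 1.0, 0.0
rows = []
for _ in range(6):
    v = v - c_over_m * dt * x   # neuron u1: uses the current x
    x = x + dt * v              # neuron u2: uses the already updated v
    rows.append((round(v, 4), round(x, 4)))
```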
Recurrent Networks: Differential Equations

General representation of an explicit n-th order differential equation:

  x^{(n)} = f(t, x, \dot{x}, \ddot{x}, \ldots, x^{(n-1)})

Introduce n - 1 intermediary quantities

  y_1 = \dot{x}, \quad y_2 = \ddot{x}, \quad \ldots, \quad y_{n-1} = x^{(n-1)}

to obtain the system

  \dot{x} = y_1,
  \dot{y}_1 = y_2,
  \vdots
  \dot{y}_{n-2} = y_{n-1},
  \dot{y}_{n-1} = f(t, x, y_1, y_2, \ldots, y_{n-1})

of n coupled first order differential equations.
Christian Borgelt Introduction to Neural Networks 216
Recurrent Networks: Differential Equations

Replace the differential quotients by forward differences to obtain the recursive equations:

  x(t_i) = x(t_{i-1}) + \Delta t \cdot y_1(t_{i-1}),
  y_1(t_i) = y_1(t_{i-1}) + \Delta t \cdot y_2(t_{i-1}),
  \vdots
  y_{n-2}(t_i) = y_{n-2}(t_{i-1}) + \Delta t \cdot y_{n-1}(t_{i-1}),
  y_{n-1}(t_i) = y_{n-1}(t_{i-1}) + \Delta t \cdot f(t_{i-1}, x(t_{i-1}), y_1(t_{i-1}), \ldots, y_{n-1}(t_{i-1})).

Each of these equations describes the update of one neuron.

The last neuron needs a special activation function.
Christian Borgelt Introduction to Neural Networks 217
Recurrent Networks: Differential Equations

(figure: chain of neurons computing x(t) and the intermediary quantities,
initialized with x_0, \dot{x}_0, \ldots, x_0^{(n-1)} and t_0, connected with
weights \Delta t; the time input t is advanced by \Delta t in each step)
Christian Borgelt Introduction to Neural Networks 218
Recurrent Networks: Diagonal Throw
(figure: diagonal throw of a body, launched at (x_0, y_0) with initial velocity v_0
and elevation angle \varphi; velocity components v_0 \cos\varphi and v_0 \sin\varphi)

Diagonal throw of a body.

Two differential equations (one for each coordinate):

  \ddot{x} = 0   and   \ddot{y} = -g,

where g = 9.81 \, \text{m/s}^2.

Initial conditions x(t_0) = x_0, y(t_0) = y_0, \dot{x}(t_0) = v_0 \cos\varphi and \dot{y}(t_0) = v_0 \sin\varphi.
Christian Borgelt Introduction to Neural Networks 219
Recurrent Networks: Diagonal Throw
Introduce the intermediary quantities

  v_x = \dot{x}   and   v_y = \dot{y}

to reach the system of differential equations:

  \dot{x} = v_x,   \dot{v}_x = 0,
  \dot{y} = v_y,   \dot{v}_y = -g,

from which we get the system of recursive update formulae:

  x(t_i) = x(t_{i-1}) + \Delta t \cdot v_x(t_{i-1}),   v_x(t_i) = v_x(t_{i-1}),
  y(t_i) = y(t_{i-1}) + \Delta t \cdot v_y(t_{i-1}),   v_y(t_i) = v_y(t_{i-1}) - \Delta t \cdot g.
Christian Borgelt Introduction to Neural Networks 220
Recurrent Networks: Diagonal Throw
Better description: Use vectors as inputs and outputs:

  \ddot{\vec{r}} = -g \vec{e}_y,

where \vec{e}_y = (0, 1).

Initial conditions are \vec{r}(t_0) = \vec{r}_0 = (x_0, y_0) and \dot{\vec{r}}(t_0) = \vec{v}_0 = (v_0 \cos\varphi, v_0 \sin\varphi).

Introduce one vector-valued intermediary quantity \vec{v} = \dot{\vec{r}} to obtain

  \dot{\vec{r}} = \vec{v},   \dot{\vec{v}} = -g \vec{e}_y.

This leads to the recursive update rules:

  \vec{r}(t_i) = \vec{r}(t_{i-1}) + \Delta t \cdot \vec{v}(t_{i-1}),
  \vec{v}(t_i) = \vec{v}(t_{i-1}) - \Delta t \cdot g \vec{e}_y.
Christian Borgelt Introduction to Neural Networks 221
Recurrent Networks: Diagonal Throw
The advantage of vector networks becomes obvious if friction is taken into account:

  \vec{a} = -\beta \vec{v} = -\beta \dot{\vec{r}},

\beta is a constant that depends on the size and the shape of the body.

This leads to the differential equation

  \ddot{\vec{r}} = -\beta \dot{\vec{r}} - g \vec{e}_y.

Introduce the intermediary quantity \vec{v} = \dot{\vec{r}} to obtain

  \dot{\vec{r}} = \vec{v},   \dot{\vec{v}} = -\beta \vec{v} - g \vec{e}_y,

from which we obtain the recursive update formulae:

  \vec{r}(t_i) = \vec{r}(t_{i-1}) + \Delta t \cdot \vec{v}(t_{i-1}),
  \vec{v}(t_i) = \vec{v}(t_{i-1}) - \Delta t \cdot \beta \vec{v}(t_{i-1}) - \Delta t \cdot g \vec{e}_y.
Christian Borgelt Introduction to Neural Networks 222
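The vector-valued update formulae translate directly into a loop (a sketch; beta, g, dt and the launch parameters are illustrative, not from the slides):

```python
# Euler updates for the diagonal throw with friction:
# r += dt * v;  v += -dt * beta * v - dt * g * e_y.
import math

g, beta, dt = 9.81, 0.5, 0.01
phi = math.radians(45.0)
r = [0.0, 0.0]
v = [6.0 * math.cos(phi), 6.0 * math.sin(phi)]

ys = []
while r[1] >= 0.0:                # simulate until the body lands
    r = [r[0] + dt * v[0], r[1] + dt * v[1]]
    v = [v[0] - dt * beta * v[0], v[1] - dt * beta * v[1] - dt * g]
    ys.append(r[1])
```

The friction term shrinks both velocity components in every step, which is what bends the trajectory away from a parabola.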
Recurrent Networks: Diagonal Throw
Resulting recurrent neural network:

(figure: two vector-valued neurons with inputs \vec{r}_0 and \vec{v}_0, weights \Delta t and
-\Delta t \beta, threshold \Delta t \, g \vec{e}_y, output \vec{r}(t); plot of the resulting trajectory)

There are no strange couplings as there would be in a non-vector network.

Note the deviation from a parabola that is due to the friction.
Christian Borgelt Introduction to Neural Networks 223
Recurrent Networks: Planet Orbit

  \ddot{\vec{r}} = -\gamma m \frac{\vec{r}}{|\vec{r}|^3},
  i.e.   \dot{\vec{r}} = \vec{v},   \dot{\vec{v}} = -\gamma m \frac{\vec{r}}{|\vec{r}|^3}.

Recursive update rules:

  \vec{r}(t_i) = \vec{r}(t_{i-1}) + \Delta t \cdot \vec{v}(t_{i-1}),
  \vec{v}(t_i) = \vec{v}(t_{i-1}) - \Delta t \cdot \gamma m \frac{\vec{r}(t_{i-1})}{|\vec{r}(t_{i-1})|^3}.

(figure: two vector-valued neurons with inputs \vec{r}_0 and \vec{v}_0 and weights \Delta t
and -\gamma m \Delta t; plot of the resulting orbit)
Christian Borgelt Introduction to Neural Networks 224
Recurrent Networks: Backpropagation through Time
Idea: Unfold the network between training patterns,
i.e., create one neuron for each point in time.

Example: Newton's cooling law

(figure: chain of neurons \vartheta(t_0) \to \ldots \to \vartheta(t), each connection
with weight 1 - k\Delta t)

Unfolding into four steps. It is \theta = -k \vartheta_A \Delta t.

Training is standard backpropagation on the unfolded network.

All updates refer to the same weight.
Updates are carried out after the first neuron is reached.
Christian Borgelt Introduction to Neural Networks 225