
17 Radial Basis Networks

17 Radial Basis Network
[Figure: two-layer network architecture. Inputs p (R x 1) feed a Radial Basis Layer with weights W1 (S1 x R), a distance block ||dist||, an element-wise product .* with the bias b1 (S1 x 1), and output a1 (S1 x 1); this feeds a Linear Layer with weights W2 (S2 x S1), bias b2 (S2 x 1), and output a2 (S2 x 1).]

$a^1_i = \mathrm{radbas}(\| {}_i\mathbf{w}^1 - \mathbf{p} \| \, b^1_i) \qquad\qquad \mathbf{a}^2 = \mathbf{W}^2 \mathbf{a}^1 + \mathbf{b}^2$

$n^1_i = \| \mathbf{p} - {}_i\mathbf{w}^1 \| \, b^1_i \qquad\qquad a = f(n) = e^{-n^2}$
The first-layer weight vectors ${}_i\mathbf{w}^1$ are called the centers of the basis functions.
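
To make the layer computations concrete, here is a minimal NumPy sketch of this forward pass; the helper names radbas and rbf_forward are illustrative, not from the text.

```python
import numpy as np

def radbas(n):
    """Gaussian transfer function a = exp(-n^2)."""
    return np.exp(-n**2)

def rbf_forward(p, W1, b1, W2, b2):
    """Forward pass of the two-layer radial basis network.

    p  : (R,)      input vector
    W1 : (S1, R)   rows are the centers _iw1
    b1 : (S1,)     first-layer biases
    W2 : (S2, S1)  second-layer weights
    b2 : (S2,)     second-layer biases
    """
    # n1_i = ||p - _iw1|| * b1_i : distance to each center, scaled by the bias
    n1 = np.linalg.norm(W1 - p, axis=1) * b1
    a1 = radbas(n1)          # radial basis layer output
    a2 = W2 @ a1 + b2        # linear second layer
    return a1, a2
```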
17 Gaussian Transfer Function (Local)

$a = f(n) = e^{-n^2}$

[Plot: the Gaussian transfer function a versus n for -3 <= n <= 3; a peaks at 1 for n = 0 and decays toward 0 away from the center, which makes the response local.]
17 Example Network Function

$w^1_{1,1} = -1, \; w^1_{2,1} = 1, \; b^1_1 = 2, \; b^1_2 = 2$
$w^2_{1,1} = 1, \; w^2_{1,2} = 1, \; b^2 = 0$

[Plot: the network response a2 versus the input p for -2 <= p <= 2 with these nominal parameter values; the response has one bump centered at each of the two first-layer centers.]
17 Parameter Variations

[Figure: four panels showing the network response over -2 <= p <= 2 as one parameter at a time is varied from its nominal value: $b^1_2$ varied from 0.5 to 8, $w^2_{1,1}$ varied from 0 to 2, $b^2$ varied from -1 to 1, and $w^1_{2,1}$ varied from -1 to 1.]
17 Pattern Recognition Problem
[Figure: the four input vectors in the p1-p2 plane; p2 and p4 lie in the upper half-plane, p1 and p3 in the lower half-plane.]

Category 1: $\left\{ \mathbf{p}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \; \mathbf{p}_3 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}$
Category 2: $\left\{ \mathbf{p}_1 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \; \mathbf{p}_4 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}$
17 Radial Basis Solution
Choose the centers at p2 and p3, and choose the biases to be 1:

$\mathbf{W}^1 = \begin{bmatrix} \mathbf{p}_2^T \\ \mathbf{p}_3^T \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \mathbf{b}^1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$

This will cause the following reduction in the basis functions at the point where they meet (the midpoint between the two centers, a distance of $\sqrt{2}$ from each):

$a = e^{-n^2} = e^{-(1 \cdot \sqrt{2})^2} = e^{-2} = 0.1353$

[Figure: plot of the basis function response over the input plane.]

Choose the second-layer bias to produce negative outputs unless the input is near p2 or p3, and choose the second-layer weights so that the output moves above 0 near p2 and p3:

$\mathbf{W}^2 = \begin{bmatrix} 2 & 2 \end{bmatrix}, \qquad b^2 = -1$
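
As a quick check, the following self-contained sketch evaluates this hand-designed network at the four patterns; the output is positive only for p2 and p3 (variable names are illustrative).

```python
import numpy as np

# Hand-designed RBF solution to the pattern recognition problem
W1 = np.array([[-1.0,  1.0],   # center at p2
               [ 1.0, -1.0]])  # center at p3
b1 = np.array([1.0, 1.0])
W2 = np.array([[2.0, 2.0]])
b2 = np.array([-1.0])

patterns = {"p1": [-1, -1], "p2": [-1, 1], "p3": [1, -1], "p4": [1, 1]}

for name, p in patterns.items():
    p = np.asarray(p, dtype=float)
    a1 = np.exp(-(np.linalg.norm(W1 - p, axis=1) * b1) ** 2)
    a2 = W2 @ a1 + b2
    # a2 > 0 for p2 and p3 (Category 1), a2 < 0 for p1 and p4 (Category 2)
    print(name, float(a2[0]))
```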

17 Final Decision Regions

[Figure: contour plot of the final decision regions in the input plane over -3 <= p1, p2 <= 3; the region where a2 > 0 consists of two pockets surrounding p2 and p3.]
17 Global Versus Local

Multilayer networks create a distributed (global) representation: all of the sigmoid or linear transfer functions overlap in their activity.
Radial basis networks create local representations: each basis function is active only over a small region of the input space.
The global approach generally requires fewer neurons; the local approach is susceptible to the curse of dimensionality.
The local approach leads to faster training and is well suited to adaptive methods.
17 Radial Basis Training

Radial basis network training generally consists of two stages.
During the first stage, the weights and biases in the first layer are set. This can involve unsupervised training (e.g., clustering) or even random selection of the weights.
The weights and biases in the second layer are found during the second stage. This usually involves linear least squares, or LMS for adaptive training.
Backpropagation (gradient-based) algorithms can also be used to train all of the weights in a radial basis network.
17 Assume Fixed First Layer
We begin with the case where the first-layer weights (centers) are fixed. Assume they are set on a grid, or randomly selected. For randomly selected centers, the bias can be set to

$b^1_i = \dfrac{\sqrt{S^1}}{d_{max}}$

where $d_{max}$ is the maximum distance between the centers.

The training data are given by

$\{\mathbf{p}_1, t_1\}, \{\mathbf{p}_2, t_2\}, \ldots, \{\mathbf{p}_Q, t_Q\}$

With the first-layer weights and biases fixed, the first-layer outputs can be computed:

$n^1_{i,q} = \| \mathbf{p}_q - {}_i\mathbf{w}^1 \| \, b^1_i, \qquad \mathbf{a}^1_q = \mathrm{radbas}(\mathbf{n}^1_q)$

This provides a training set for the second layer:

$\{\mathbf{a}^1_1, t_1\}, \{\mathbf{a}^1_2, t_2\}, \ldots, \{\mathbf{a}^1_Q, t_Q\}$
17 Linear Least Squares (2nd Layer)
$\mathbf{a}^2 = \mathbf{W}^2 \mathbf{a}^1 + \mathbf{b}^2, \qquad F(\mathbf{x}) = \sum_{q=1}^{Q} (t_q - a^2_q)^2$

$\mathbf{x} = \begin{bmatrix} {}_1\mathbf{w}^2 \\ b^2 \end{bmatrix}, \qquad \mathbf{z}_q = \begin{bmatrix} \mathbf{a}^1_q \\ 1 \end{bmatrix}$

$a^2_q = ({}_1\mathbf{w}^2)^T \mathbf{a}^1_q + b^2 = \mathbf{x}^T \mathbf{z}_q$

$F(\mathbf{x}) = \sum_{q=1}^{Q} (t_q - a^2_q)^2 = \sum_{q=1}^{Q} (t_q - \mathbf{x}^T \mathbf{z}_q)^2$
17 Matrix Form
$\mathbf{t} = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_Q \end{bmatrix}, \qquad \mathbf{U} = \begin{bmatrix} \mathbf{u}^T_1 \\ \mathbf{u}^T_2 \\ \vdots \\ \mathbf{u}^T_Q \end{bmatrix} = \begin{bmatrix} \mathbf{z}^T_1 \\ \mathbf{z}^T_2 \\ \vdots \\ \mathbf{z}^T_Q \end{bmatrix}, \qquad \mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_Q \end{bmatrix}$

$F(\mathbf{x}) = (\mathbf{t} - \mathbf{U}\mathbf{x})^T (\mathbf{t} - \mathbf{U}\mathbf{x}), \qquad \mathbf{e} = \mathbf{t} - \mathbf{U}\mathbf{x}$

Adding a regularization term gives

$F(\mathbf{x}) = (\mathbf{t} - \mathbf{U}\mathbf{x})^T (\mathbf{t} - \mathbf{U}\mathbf{x}) + \rho \sum_{i=1}^{n} x_i^2 = (\mathbf{t} - \mathbf{U}\mathbf{x})^T (\mathbf{t} - \mathbf{U}\mathbf{x}) + \rho \, \mathbf{x}^T \mathbf{x}$

$= \mathbf{t}^T \mathbf{t} - 2 \mathbf{t}^T \mathbf{U} \mathbf{x} + \mathbf{x}^T \mathbf{U}^T \mathbf{U} \mathbf{x} + \rho \, \mathbf{x}^T \mathbf{x}$

$= \mathbf{t}^T \mathbf{t} - 2 \mathbf{t}^T \mathbf{U} \mathbf{x} + \mathbf{x}^T [\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}] \mathbf{x}$
17 Linear Least Squares Solution


$F(\mathbf{x}) = \mathbf{t}^T \mathbf{t} - 2 \mathbf{t}^T \mathbf{U} \mathbf{x} + \mathbf{x}^T [\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}] \mathbf{x}$

$F(\mathbf{x}) = c + \mathbf{d}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x}$   (quadratic function, with $c = \mathbf{t}^T\mathbf{t}$, $\mathbf{d} = -2\mathbf{U}^T\mathbf{t}$, $\mathbf{A} = 2[\mathbf{U}^T\mathbf{U} + \rho\mathbf{I}]$)

$\nabla F(\mathbf{x}) = \nabla \left( c + \mathbf{d}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} \right) = \mathbf{d} + \mathbf{A}\mathbf{x} = -2 \mathbf{U}^T \mathbf{t} + 2 [\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}] \mathbf{x} = \mathbf{0}$

$[\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}] \mathbf{x}^* = \mathbf{U}^T \mathbf{t}$
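
A minimal sketch of this regularized linear least squares solve for the second layer (the names ls_second_layer and rho are illustrative):

```python
import numpy as np

def ls_second_layer(A1, t, rho=0.0):
    """Solve [U^T U + rho*I] x = U^T t for the second-layer weights and bias.

    A1  : (S1, Q) matrix whose columns are the first-layer outputs a1_q
    t   : (Q,)    target values
    rho : regularization parameter
    Returns (W2, b2) with W2 of shape (1, S1).
    """
    Q = A1.shape[1]
    U = np.hstack([A1.T, np.ones((Q, 1))])   # rows are z_q^T = [a1_q^T  1]
    x = np.linalg.solve(U.T @ U + rho * np.eye(U.shape[1]), U.T @ t)
    W2, b2 = x[:-1].reshape(1, -1), x[-1]
    return W2, b2
```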

17 Example (1)

$g(p) = 1 + \sin\left( \dfrac{\pi}{4} p \right) \quad \text{for } -2 \le p \le 2$

$p = \{-2, \; -1.2, \; -0.4, \; 0.4, \; 1.2, \; 2\}$
$t = \{0, \; 0.19, \; 0.69, \; 1.3, \; 1.8, \; 2\}$

$\mathbf{W}^1 = \begin{bmatrix} -2 \\ 0 \\ 2 \end{bmatrix}, \qquad \mathbf{b}^1 = \begin{bmatrix} 0.5 \\ 0.5 \\ 0.5 \end{bmatrix}$
17 Example (2)
$n^1_{i,q} = \| \mathbf{p}_q - {}_i\mathbf{w}^1 \| \, b^1_i, \qquad \mathbf{a}^1_q = \mathrm{radbas}(\mathbf{n}^1_q)$

$\mathbf{a}^1 = \left\{ \begin{bmatrix} 1 \\ 0.368 \\ 0.018 \end{bmatrix}, \begin{bmatrix} 0.852 \\ 0.698 \\ 0.077 \end{bmatrix}, \begin{bmatrix} 0.527 \\ 0.961 \\ 0.237 \end{bmatrix}, \begin{bmatrix} 0.237 \\ 0.961 \\ 0.527 \end{bmatrix}, \begin{bmatrix} 0.077 \\ 0.698 \\ 0.852 \end{bmatrix}, \begin{bmatrix} 0.018 \\ 0.368 \\ 1 \end{bmatrix} \right\}$

$\mathbf{U}^T = \begin{bmatrix} 1 & 0.852 & 0.527 & 0.237 & 0.077 & 0.018 \\ 0.368 & 0.698 & 0.961 & 0.961 & 0.698 & 0.368 \\ 0.018 & 0.077 & 0.237 & 0.527 & 0.852 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}$

$\mathbf{t}^T = \begin{bmatrix} 0 & 0.19 & 0.69 & 1.3 & 1.8 & 2 \end{bmatrix}$
17 Example (3)

$\mathbf{x}^* = [\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}]^{-1} \mathbf{U}^T \mathbf{t}$

With $\rho = 0$:

$\mathbf{x}^* = \begin{bmatrix} 2.07 & 1.76 & 0.42 & 2.71 \\ 1.76 & 3.09 & 1.76 & 4.05 \\ 0.42 & 1.76 & 2.07 & 2.71 \\ 2.71 & 4.05 & 2.71 & 6 \end{bmatrix}^{-1} \begin{bmatrix} 1.01 \\ 4.05 \\ 4.41 \\ 6 \end{bmatrix} = \begin{bmatrix} -1.03 \\ 0 \\ 1.03 \\ 1 \end{bmatrix}$

$\mathbf{W}^2 = \begin{bmatrix} -1.03 & 0 & 1.03 \end{bmatrix}, \qquad b^2 = 1$
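
The example can be reproduced numerically; the sketch below (illustrative names) rebuilds U from the training data and solves the normal equations, recovering approximately W2 = [-1.03 0 1.03] and b2 = 1.

```python
import numpy as np

# Training data for g(p) = 1 + sin(pi*p/4) on [-2, 2]
p = np.array([-2.0, -1.2, -0.4, 0.4, 1.2, 2.0])
t = 1.0 + np.sin(np.pi * p / 4.0)

# Fixed first layer: centers at -2, 0, 2 and biases of 0.5
centers = np.array([-2.0, 0.0, 2.0])
b1 = 0.5

# First-layer outputs for every training input (S1 x Q matrix)
N1 = np.abs(p[None, :] - centers[:, None]) * b1
A1 = np.exp(-N1**2)

# Regression matrix U: one row [a1_q^T  1] per training pair
U = np.hstack([A1.T, np.ones((p.size, 1))])

# Linear least squares for the second layer (rho = 0 here)
rho = 0.0
x_star = np.linalg.solve(U.T @ U + rho * np.eye(U.shape[1]), U.T @ t)
W2, b2 = x_star[:-1], x_star[-1]
print(W2, b2)   # approximately [-1.03  0.  1.03] and 1.0
```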

17 Example (4)
[Figure: top plot shows the network response a2 versus p together with the training points; bottom plot shows the three basis-function outputs a1 versus p.]
17 Bias Too Large

[Figure: network response a2 (top) and basis-function outputs a1 (bottom) versus p when the first-layer biases are too large, b1 = 8; each basis function is so narrow that the response is poor between the training points.]
17 Subset Selection

Given a set of potential first-layer weights (centers), which combination should we use?
An exhaustive search over all subsets is too expensive.
Forward selection begins with an empty set and adds centers one at a time.
Backward elimination begins with all of the potential centers and then removes them one at a time.
There are also methods that combine forward and backward steps.
We will concentrate on one forward selection method, called Orthogonal Least Squares (OLS).
17 Forward Selection
$\mathbf{t} = \mathbf{U}\mathbf{x} + \mathbf{e}$

$\mathbf{U} = \begin{bmatrix} \mathbf{u}^T_1 \\ \mathbf{u}^T_2 \\ \vdots \\ \mathbf{u}^T_Q \end{bmatrix} = \begin{bmatrix} \mathbf{z}^T_1 \\ \mathbf{z}^T_2 \\ \vdots \\ \mathbf{z}^T_Q \end{bmatrix} = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_n \end{bmatrix}, \qquad n = S^1 + 1$

There will be one row of U for each input/target pair.
If we consider all input vectors as potential centers, there will be one first-layer neuron for each input vector: n = Q + 1.
In this case, the columns of U represent the potential centers.
We will start with zero centers selected, and at each step we will add the center (or column of U) which produces the largest reduction in squared error.
17 Orthogonalize the Columns
$\mathbf{U} = \mathbf{M}\mathbf{R}$

$\mathbf{R} = \begin{bmatrix} 1 & r_{1,2} & r_{1,3} & \cdots & r_{1,n} \\ 0 & 1 & r_{2,3} & \cdots & r_{2,n} \\ \vdots & \vdots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & r_{n-1,n} \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}$

$\mathbf{M}^T\mathbf{M} = \mathbf{V} = \begin{bmatrix} v_{1,1} & 0 & \cdots & 0 \\ 0 & v_{2,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_{n,n} \end{bmatrix} = \begin{bmatrix} \mathbf{m}^T_1\mathbf{m}_1 & 0 & \cdots & 0 \\ 0 & \mathbf{m}^T_2\mathbf{m}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{m}^T_n\mathbf{m}_n \end{bmatrix}$
17 Orthogonalized Least Squares
$\mathbf{t} = \mathbf{M}\mathbf{R}\mathbf{x} + \mathbf{e} = \mathbf{M}\mathbf{h} + \mathbf{e}, \qquad \mathbf{h} = \mathbf{R}\mathbf{x}$

$\mathbf{h}^* = [\mathbf{M}^T\mathbf{M}]^{-1}\mathbf{M}^T\mathbf{t} = \mathbf{V}^{-1}\mathbf{M}^T\mathbf{t}$

$h^*_i = \dfrac{\mathbf{m}^T_i \mathbf{t}}{v_{i,i}} = \dfrac{\mathbf{m}^T_i \mathbf{t}}{\mathbf{m}^T_i \mathbf{m}_i}$
17 Gram-Schmidt Orthogonalization

$\mathbf{m}_1 = \mathbf{u}_1$

$\mathbf{m}_k = \mathbf{u}_k - \sum_{i=1}^{k-1} r_{i,k} \, \mathbf{m}_i$

$r_{i,k} = \dfrac{\mathbf{m}^T_i \mathbf{u}_k}{\mathbf{m}^T_i \mathbf{m}_i}, \qquad i = 1, \ldots, k-1$
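
A compact sketch of this Gram-Schmidt factorization of the columns of U (illustrative implementation, classical rather than modified Gram-Schmidt):

```python
import numpy as np

def gram_schmidt(U):
    """Classical Gram-Schmidt: factor U = M R with orthogonal columns in M
    and unit-upper-triangular R."""
    Q_, n = U.shape
    M = np.zeros((Q_, n))
    R = np.eye(n)
    for k in range(n):
        m_k = U[:, k].astype(float)
        for i in range(k):
            # r_{i,k} = m_i^T u_k / (m_i^T m_i)
            R[i, k] = (M[:, i] @ U[:, k]) / (M[:, i] @ M[:, i])
            m_k -= R[i, k] * M[:, i]
        M[:, k] = m_k
    return M, R
```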

17 Incremental Error
The total squared value of the targets is:

$\mathbf{t}^T\mathbf{t} = [\mathbf{M}\mathbf{h} + \mathbf{e}]^T[\mathbf{M}\mathbf{h} + \mathbf{e}] = \mathbf{h}^T\mathbf{M}^T\mathbf{M}\mathbf{h} + \mathbf{e}^T\mathbf{M}\mathbf{h} + \mathbf{h}^T\mathbf{M}^T\mathbf{e} + \mathbf{e}^T\mathbf{e}$

$\mathbf{e}^T\mathbf{M}\mathbf{h} = [\mathbf{t} - \mathbf{M}\mathbf{h}]^T\mathbf{M}\mathbf{h} = \mathbf{t}^T\mathbf{M}\mathbf{h} - \mathbf{h}^T\mathbf{M}^T\mathbf{M}\mathbf{h}$

If $\mathbf{h}^* = \mathbf{V}^{-1}\mathbf{M}^T\mathbf{t}$, then $\mathbf{e}^T\mathbf{M}\mathbf{h}^* = \mathbf{t}^T\mathbf{M}\mathbf{h}^* - \mathbf{t}^T\mathbf{M}\mathbf{V}^{-1}\mathbf{M}^T\mathbf{M}\mathbf{h}^* = 0$

$\mathbf{t}^T\mathbf{t} = \mathbf{h}^T\mathbf{M}^T\mathbf{M}\mathbf{h} + \mathbf{e}^T\mathbf{e} = \sum_{i=1}^{n} h_i^2 \, \mathbf{m}^T_i\mathbf{m}_i + \mathbf{e}^T\mathbf{e}$

Therefore, basis function i contributes

$h_i^2 \, \mathbf{m}^T_i\mathbf{m}_i$

to the squared value, and its normalized error contribution is

$o_i = \dfrac{h_i^2 \, \mathbf{m}^T_i\mathbf{m}_i}{\mathbf{t}^T\mathbf{t}}$
17 OLS Algorithm
First step (k = 1). For each candidate $i = 1, \ldots, Q$:

$\mathbf{m}_1^{(i)} = \mathbf{u}_i, \qquad h_1^{(i)} = \dfrac{(\mathbf{m}_1^{(i)})^T \mathbf{t}}{(\mathbf{m}_1^{(i)})^T \mathbf{m}_1^{(i)}}, \qquad o_1^{(i)} = \dfrac{(h_1^{(i)})^2 (\mathbf{m}_1^{(i)})^T \mathbf{m}_1^{(i)}}{\mathbf{t}^T \mathbf{t}}$

Select the candidate with the largest error reduction:

$o_1 = o_1^{(i_1)} = \max_i \{ o_1^{(i)} \}, \qquad \mathbf{m}_1 = \mathbf{m}_1^{(i_1)} = \mathbf{u}_{i_1}$

Step k. For each remaining candidate $i = 1, \ldots, Q$, with $i \ne i_1, i \ne i_2, \ldots, i \ne i_{k-1}$:

$r_{j,k}^{(i)} = \dfrac{\mathbf{m}_j^T \mathbf{u}_i}{\mathbf{m}_j^T \mathbf{m}_j}, \quad j = 1, \ldots, k-1, \qquad \mathbf{m}_k^{(i)} = \mathbf{u}_i - \sum_{j=1}^{k-1} r_{j,k}^{(i)} \mathbf{m}_j$

$h_k^{(i)} = \dfrac{(\mathbf{m}_k^{(i)})^T \mathbf{t}}{(\mathbf{m}_k^{(i)})^T \mathbf{m}_k^{(i)}}, \qquad o_k^{(i)} = \dfrac{(h_k^{(i)})^2 (\mathbf{m}_k^{(i)})^T \mathbf{m}_k^{(i)}}{\mathbf{t}^T \mathbf{t}}$

$o_k = o_k^{(i_k)} = \max_i \{ o_k^{(i)} \}, \qquad r_{j,k} = r_{j,k}^{(i_k)}, \; j = 1, \ldots, k-1, \qquad \mathbf{m}_k = \mathbf{m}_k^{(i_k)}$
17 Stopping Criteria

The selection stops when the remaining (unexplained) fraction of the squared target value falls below a small tolerance $\delta$:

$1 - \sum_{j=1}^{k} o_j < \delta$

To convert back to the original weights, use back-substitution:

$x_n = h_n, \qquad x_k = h_k - \sum_{j=k+1}^{n} r_{k,j} \, x_j$
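
A sketch tying the last two slides together: the OLS forward-selection loop, the stopping criterion, and the back-substitution that recovers the original weights. The function name ols_select, the tolerance delta, and the treatment of the bias column as just another candidate column are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def ols_select(U, t, delta=0.01):
    """Orthogonal Least Squares forward selection (sketch).

    U     : (Q, n) matrix whose columns are the candidate regressors
            (potential centers plus a constant column for the bias)
    t     : (Q,) target vector
    delta : stop once 1 - sum(o_j) < delta
    Returns the indices of the selected columns and their weights x.
    """
    tt = t @ t
    n = U.shape[1]
    selected, M, h, o = [], [], [], []
    R_cols = {}                       # R_cols[k] holds r_{j,k} for j < k

    for _ in range(n):
        best = None
        for i in range(n):
            if i in selected:
                continue
            # Orthogonalize candidate column i against the selected m_j
            m = U[:, i].astype(float)
            r = []
            for m_j in M:
                r_ji = (m_j @ U[:, i]) / (m_j @ m_j)
                r.append(r_ji)
                m = m - r_ji * m_j
            mm = m @ m
            if mm < 1e-12:            # numerically dependent column, skip
                continue
            h_i = (m @ t) / mm
            o_i = h_i**2 * mm / tt    # normalized error reduction o_k^(i)
            if best is None or o_i > best[0]:
                best = (o_i, i, m, h_i, r)
        if best is None:
            break
        o_k, i_k, m_k, h_k, r_k = best
        selected.append(i_k)
        M.append(m_k)
        h.append(h_k)
        o.append(o_k)
        R_cols[len(selected) - 1] = r_k
        if 1.0 - sum(o) < delta:      # stopping criterion
            break

    # Back-substitution: x_n = h_n, x_k = h_k - sum_{j>k} r_{k,j} x_j
    m_sel = len(selected)
    x = np.zeros(m_sel)
    for k in range(m_sel - 1, -1, -1):
        x[k] = h[k] - sum(R_cols[j][k] * x[j] for j in range(k + 1, m_sel))
    return selected, x
```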

17 Competitive Learning for First Layer

Cluster the input space using a competitive layer (or feature map).
Use the cluster centers as the basis function centers.
The bias for each basis function can then be computed from the variation of the inputs in the corresponding cluster:

$dist_i = \left( \dfrac{1}{n_c^i} \sum_{j=1}^{n_c^i} \| \mathbf{p}_j^i - {}_i\mathbf{w}^1 \|^2 \right)^{1/2}$

$b_i^1 = \dfrac{1}{\sqrt{2} \, dist_i}$

where $n_c^i$ is the number of inputs assigned to cluster i and $\mathbf{p}_j^i$ is the j-th input in that cluster.
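
A small sketch of this bias computation, assuming cluster assignments (labels) are already available from a competitive layer or a similar clustering step; the function name and arguments are illustrative:

```python
import numpy as np

def cluster_biases(P, centers, labels):
    """Compute one RBF bias per cluster from the RMS distance of the
    cluster's inputs to its center.

    P       : (Q, R)  input vectors, one per row
    centers : (S1, R) cluster centers (the first-layer weight rows)
    labels  : (Q,)    index of the cluster each input was assigned to
    """
    b1 = np.zeros(len(centers))
    for i, c in enumerate(centers):
        members = P[labels == i]
        # assumes every cluster has at least one member and nonzero spread
        dist_i = np.sqrt(np.mean(np.sum((members - c) ** 2, axis=1)))
        b1[i] = 1.0 / (np.sqrt(2.0) * dist_i)
    return b1
```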

17 Backpropagation

The net input of a first-layer (radial basis) neuron is

$n_i^1 = \| \mathbf{p} - {}_i\mathbf{w}^1 \| \, b_i^1 = b_i^1 \sqrt{ \sum_{j=1}^{R} (p_j - w_{i,j}^1)^2 }$

so the derivatives needed for gradient-based training are

$\dfrac{\partial n_i^1}{\partial w_{i,j}^1} = b_i^1 \, \dfrac{2 (p_j - w_{i,j}^1)(-1)}{2 \sqrt{ \sum_{j=1}^{R} (p_j - w_{i,j}^1)^2 }} = \dfrac{b_i^1 (w_{i,j}^1 - p_j)}{\| \mathbf{p} - {}_i\mathbf{w}^1 \|}$

$\dfrac{\partial n_i^1}{\partial b_i^1} = \| \mathbf{p} - {}_i\mathbf{w}^1 \|$

The corresponding gradient elements, with $s_i^1$ the backpropagated sensitivity of first-layer neuron i, are

$\dfrac{\partial \hat{F}}{\partial w_{i,j}^1} = s_i^1 \, \dfrac{\partial n_i^1}{\partial w_{i,j}^1} = s_i^1 \, \dfrac{b_i^1 (w_{i,j}^1 - p_j)}{\| \mathbf{p} - {}_i\mathbf{w}^1 \|}$

$\dfrac{\partial \hat{F}}{\partial b_i^1} = s_i^1 \, \dfrac{\partial n_i^1}{\partial b_i^1} = s_i^1 \, \| \mathbf{p} - {}_i\mathbf{w}^1 \|$
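
A minimal sketch of these first-layer gradient elements for a single input, assuming the sensitivities s1 have already been backpropagated from the second layer in the usual way (names are illustrative):

```python
import numpy as np

def first_layer_gradients(p, W1, b1, s1):
    """Gradients of the (approximate) performance index with respect to the
    first-layer weights and biases of a radial basis network.

    p  : (R,)    input vector
    W1 : (S1, R) first-layer weights (rows are centers)
    b1 : (S1,)   first-layer biases
    s1 : (S1,)   backpropagated sensitivities for the first layer
    """
    # ||p - _iw1|| for each neuron i; assumes p does not coincide with a center
    dist = np.linalg.norm(W1 - p, axis=1)
    # dF/dw1_{i,j} = s1_i * b1_i * (w1_{i,j} - p_j) / ||p - _iw1||
    grad_W1 = (s1 * b1 / dist)[:, None] * (W1 - p)
    # dF/db1_i = s1_i * ||p - _iw1||
    grad_b1 = s1 * dist
    return grad_W1, grad_b1
```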

