
17 Radial Basis Networks

17 Radial Basis Network
[Figure: two-layer network architecture. Inputs p (R x 1) feed a Radial Basis Layer with weights W1 (S1 x R), a distance block ||dist||, an element-wise product .* with the bias b1 (S1 x 1), and output a1 (S1 x 1); this feeds a Linear Layer with weights W2 (S2 x S1), bias b2 (S2 x 1), and output a2 (S2 x 1).]

$a^1_i = \mathrm{radbas}(\| {}_i\mathbf{w}^1 - \mathbf{p} \| \, b^1_i) \qquad\qquad \mathbf{a}^2 = \mathbf{W}^2 \mathbf{a}^1 + \mathbf{b}^2$

$n^1_i = \| \mathbf{p} - {}_i\mathbf{w}^1 \| \, b^1_i \qquad\qquad a = f(n) = e^{-n^2}$
The first-layer weight vectors ${}_i\mathbf{w}^1$ are called the centers of the basis functions.
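
To make the layer computations concrete, here is a minimal NumPy sketch of this forward pass; the helper names radbas and rbf_forward are illustrative, not from the text.

```python
import numpy as np

def radbas(n):
    """Gaussian transfer function a = exp(-n^2)."""
    return np.exp(-n**2)

def rbf_forward(p, W1, b1, W2, b2):
    """Forward pass of the two-layer radial basis network.

    p  : (R,)      input vector
    W1 : (S1, R)   rows are the centers _iw1
    b1 : (S1,)     first-layer biases
    W2 : (S2, S1)  second-layer weights
    b2 : (S2,)     second-layer biases
    """
    # n1_i = ||p - _iw1|| * b1_i : distance to each center, scaled by the bias
    n1 = np.linalg.norm(W1 - p, axis=1) * b1
    a1 = radbas(n1)          # radial basis layer output
    a2 = W2 @ a1 + b2        # linear second layer
    return a1, a2
```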
17 Gaussian Transfer Function (Local)

$a = f(n) = e^{-n^2}$

[Plot: the Gaussian transfer function a versus n for -3 <= n <= 3; a peaks at 1 for n = 0 and decays toward 0 away from the center, which makes the response local.]
17 Example Network Function

$w^1_{1,1} = -1, \; w^1_{2,1} = 1, \; b^1_1 = 2, \; b^1_2 = 2$
$w^2_{1,1} = 1, \; w^2_{1,2} = 1, \; b^2 = 0$

[Plot: the network response a2 versus the input p for -2 <= p <= 2 with these nominal parameter values; the response has one bump centered at each of the two first-layer centers.]
17 Parameter Variations

[Figure: four panels showing the network response over -2 <= p <= 2 as one parameter at a time is varied from its nominal value: $b^1_2$ varied from 0.5 to 8, $w^2_{1,1}$ varied from 0 to 2, $b^2$ varied from -1 to 1, and $w^1_{2,1}$ varied from -1 to 1.]
17 Pattern Recognition Problem
[Figure: the four input vectors in the p1-p2 plane; p2 and p4 lie in the upper half-plane, p1 and p3 in the lower half-plane.]

Category 1: $\left\{ \mathbf{p}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \; \mathbf{p}_3 = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}$
Category 2: $\left\{ \mathbf{p}_1 = \begin{bmatrix} -1 \\ -1 \end{bmatrix}, \; \mathbf{p}_4 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}$
17 Radial Basis Solution
Choose the centers at p2 and p3, and choose the biases to be 1:

$\mathbf{W}^1 = \begin{bmatrix} \mathbf{p}_2^T \\ \mathbf{p}_3^T \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \mathbf{b}^1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$

This will cause the following reduction in the basis functions at the point where they meet (the midpoint between the two centers, a distance of $\sqrt{2}$ from each):

$a = e^{-n^2} = e^{-(1 \cdot \sqrt{2})^2} = e^{-2} = 0.1353$

[Figure: plot of the basis function response over the input plane.]

Choose the second-layer bias to produce negative outputs unless the input is near p2 or p3, and choose the second-layer weights so that the output moves above 0 near p2 and p3:

$\mathbf{W}^2 = \begin{bmatrix} 2 & 2 \end{bmatrix}, \qquad b^2 = -1$
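
As a quick check, the following self-contained sketch evaluates this hand-designed network at the four patterns; the output is positive only for p2 and p3 (variable names are illustrative).

```python
import numpy as np

# Hand-designed RBF solution to the pattern recognition problem
W1 = np.array([[-1.0,  1.0],   # center at p2
               [ 1.0, -1.0]])  # center at p3
b1 = np.array([1.0, 1.0])
W2 = np.array([[2.0, 2.0]])
b2 = np.array([-1.0])

patterns = {"p1": [-1, -1], "p2": [-1, 1], "p3": [1, -1], "p4": [1, 1]}

for name, p in patterns.items():
    p = np.asarray(p, dtype=float)
    a1 = np.exp(-(np.linalg.norm(W1 - p, axis=1) * b1) ** 2)
    a2 = W2 @ a1 + b2
    # a2 > 0 for p2 and p3 (Category 1), a2 < 0 for p1 and p4 (Category 2)
    print(name, float(a2[0]))
```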

17 Final Decision Regions

[Figure: contour plot of the final decision regions in the input plane over -3 <= p1, p2 <= 3; the region where a2 > 0 consists of two pockets surrounding p2 and p3.]
17 Global Versus Local

Multilayer networks create a distributed (global) representation: all of the sigmoid or linear transfer functions overlap in their activity.
Radial basis networks create local representations: each basis function is active only over a small region of the input space.
The global approach generally requires fewer neurons; the local approach is susceptible to the curse of dimensionality.
The local approach leads to faster training and is well suited to adaptive methods.
17 Radial Basis Training

Radial basis network training generally consists of two stages.
During the first stage, the weights and biases in the first layer are set. This can involve unsupervised training (e.g., clustering) or even random selection of the weights.
The weights and biases in the second layer are found during the second stage. This usually involves linear least squares, or LMS for adaptive training.
Backpropagation (gradient-based) algorithms can also be used to train all of the weights in a radial basis network.
17 Assume Fixed First Layer
We begin with the case where the first-layer weights (centers) are fixed. Assume they are set on a grid, or randomly selected. For randomly selected centers, the bias can be set to

$b^1_i = \dfrac{\sqrt{S^1}}{d_{max}}$

where $d_{max}$ is the maximum distance between the centers.

The training data are given by

$\{\mathbf{p}_1, t_1\}, \{\mathbf{p}_2, t_2\}, \ldots, \{\mathbf{p}_Q, t_Q\}$

With the first-layer weights and biases fixed, the first-layer outputs can be computed:

$n^1_{i,q} = \| \mathbf{p}_q - {}_i\mathbf{w}^1 \| \, b^1_i, \qquad \mathbf{a}^1_q = \mathrm{radbas}(\mathbf{n}^1_q)$

This provides a training set for the second layer:

$\{\mathbf{a}^1_1, t_1\}, \{\mathbf{a}^1_2, t_2\}, \ldots, \{\mathbf{a}^1_Q, t_Q\}$
17 Linear Least Squares (2nd Layer)
$\mathbf{a}^2 = \mathbf{W}^2 \mathbf{a}^1 + \mathbf{b}^2, \qquad F(\mathbf{x}) = \sum_{q=1}^{Q} (t_q - a^2_q)^2$

$\mathbf{x} = \begin{bmatrix} {}_1\mathbf{w}^2 \\ b^2 \end{bmatrix}, \qquad \mathbf{z}_q = \begin{bmatrix} \mathbf{a}^1_q \\ 1 \end{bmatrix}$

$a^2_q = ({}_1\mathbf{w}^2)^T \mathbf{a}^1_q + b^2 = \mathbf{x}^T \mathbf{z}_q$

$F(\mathbf{x}) = \sum_{q=1}^{Q} (t_q - a^2_q)^2 = \sum_{q=1}^{Q} (t_q - \mathbf{x}^T \mathbf{z}_q)^2$
17 Matrix Form
$\mathbf{t} = \begin{bmatrix} t_1 \\ t_2 \\ \vdots \\ t_Q \end{bmatrix}, \qquad \mathbf{U} = \begin{bmatrix} \mathbf{u}^T_1 \\ \mathbf{u}^T_2 \\ \vdots \\ \mathbf{u}^T_Q \end{bmatrix} = \begin{bmatrix} \mathbf{z}^T_1 \\ \mathbf{z}^T_2 \\ \vdots \\ \mathbf{z}^T_Q \end{bmatrix}, \qquad \mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_Q \end{bmatrix}$

$F(\mathbf{x}) = (\mathbf{t} - \mathbf{U}\mathbf{x})^T (\mathbf{t} - \mathbf{U}\mathbf{x}), \qquad \mathbf{e} = \mathbf{t} - \mathbf{U}\mathbf{x}$

Adding a regularization term gives

$F(\mathbf{x}) = (\mathbf{t} - \mathbf{U}\mathbf{x})^T (\mathbf{t} - \mathbf{U}\mathbf{x}) + \rho \sum_{i=1}^{n} x_i^2 = (\mathbf{t} - \mathbf{U}\mathbf{x})^T (\mathbf{t} - \mathbf{U}\mathbf{x}) + \rho \, \mathbf{x}^T \mathbf{x}$

$= \mathbf{t}^T \mathbf{t} - 2 \mathbf{t}^T \mathbf{U} \mathbf{x} + \mathbf{x}^T \mathbf{U}^T \mathbf{U} \mathbf{x} + \rho \, \mathbf{x}^T \mathbf{x}$

$= \mathbf{t}^T \mathbf{t} - 2 \mathbf{t}^T \mathbf{U} \mathbf{x} + \mathbf{x}^T [\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}] \mathbf{x}$
17 Linear Least Squares Solution


$F(\mathbf{x}) = \mathbf{t}^T \mathbf{t} - 2 \mathbf{t}^T \mathbf{U} \mathbf{x} + \mathbf{x}^T [\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}] \mathbf{x}$

$F(\mathbf{x}) = c + \mathbf{d}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x}$   (quadratic function, with $c = \mathbf{t}^T\mathbf{t}$, $\mathbf{d} = -2\mathbf{U}^T\mathbf{t}$, $\mathbf{A} = 2[\mathbf{U}^T\mathbf{U} + \rho\mathbf{I}]$)

$\nabla F(\mathbf{x}) = \nabla \left( c + \mathbf{d}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x} \right) = \mathbf{d} + \mathbf{A}\mathbf{x} = -2 \mathbf{U}^T \mathbf{t} + 2 [\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}] \mathbf{x} = \mathbf{0}$

$[\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}] \mathbf{x}^* = \mathbf{U}^T \mathbf{t}$
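
A minimal sketch of this regularized linear least squares solve for the second layer (the names ls_second_layer and rho are illustrative):

```python
import numpy as np

def ls_second_layer(A1, t, rho=0.0):
    """Solve [U^T U + rho*I] x = U^T t for the second-layer weights and bias.

    A1  : (S1, Q) matrix whose columns are the first-layer outputs a1_q
    t   : (Q,)    target values
    rho : regularization parameter
    Returns (W2, b2) with W2 of shape (1, S1).
    """
    Q = A1.shape[1]
    U = np.hstack([A1.T, np.ones((Q, 1))])   # rows are z_q^T = [a1_q^T  1]
    x = np.linalg.solve(U.T @ U + rho * np.eye(U.shape[1]), U.T @ t)
    W2, b2 = x[:-1].reshape(1, -1), x[-1]
    return W2, b2
```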

17 Example (1)

$g(p) = 1 + \sin\left( \dfrac{\pi}{4} p \right) \quad \text{for } -2 \le p \le 2$

$p = \{-2, \; -1.2, \; -0.4, \; 0.4, \; 1.2, \; 2\}$
$t = \{0, \; 0.19, \; 0.69, \; 1.3, \; 1.8, \; 2\}$

$\mathbf{W}^1 = \begin{bmatrix} -2 \\ 0 \\ 2 \end{bmatrix}, \qquad \mathbf{b}^1 = \begin{bmatrix} 0.5 \\ 0.5 \\ 0.5 \end{bmatrix}$
17 Example (2)
$n^1_{i,q} = \| \mathbf{p}_q - {}_i\mathbf{w}^1 \| \, b^1_i, \qquad \mathbf{a}^1_q = \mathrm{radbas}(\mathbf{n}^1_q)$

$\mathbf{a}^1 = \left\{ \begin{bmatrix} 1 \\ 0.368 \\ 0.018 \end{bmatrix}, \begin{bmatrix} 0.852 \\ 0.698 \\ 0.077 \end{bmatrix}, \begin{bmatrix} 0.527 \\ 0.961 \\ 0.237 \end{bmatrix}, \begin{bmatrix} 0.237 \\ 0.961 \\ 0.527 \end{bmatrix}, \begin{bmatrix} 0.077 \\ 0.698 \\ 0.852 \end{bmatrix}, \begin{bmatrix} 0.018 \\ 0.368 \\ 1 \end{bmatrix} \right\}$

$\mathbf{U}^T = \begin{bmatrix} 1 & 0.852 & 0.527 & 0.237 & 0.077 & 0.018 \\ 0.368 & 0.698 & 0.961 & 0.961 & 0.698 & 0.368 \\ 0.018 & 0.077 & 0.237 & 0.527 & 0.852 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}$

$\mathbf{t}^T = \begin{bmatrix} 0 & 0.19 & 0.69 & 1.3 & 1.8 & 2 \end{bmatrix}$
17 Example (3)

$\mathbf{x}^* = [\mathbf{U}^T \mathbf{U} + \rho \mathbf{I}]^{-1} \mathbf{U}^T \mathbf{t}$

With $\rho = 0$:

$\mathbf{x}^* = \begin{bmatrix} 2.07 & 1.76 & 0.42 & 2.71 \\ 1.76 & 3.09 & 1.76 & 4.05 \\ 0.42 & 1.76 & 2.07 & 2.71 \\ 2.71 & 4.05 & 2.71 & 6 \end{bmatrix}^{-1} \begin{bmatrix} 1.01 \\ 4.05 \\ 4.41 \\ 6 \end{bmatrix} = \begin{bmatrix} -1.03 \\ 0 \\ 1.03 \\ 1 \end{bmatrix}$

$\mathbf{W}^2 = \begin{bmatrix} -1.03 & 0 & 1.03 \end{bmatrix}, \qquad b^2 = 1$
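
The example can be reproduced numerically; the sketch below (illustrative names) rebuilds U from the training data and solves the normal equations, recovering approximately W2 = [-1.03 0 1.03] and b2 = 1.

```python
import numpy as np

# Training data for g(p) = 1 + sin(pi*p/4) on [-2, 2]
p = np.array([-2.0, -1.2, -0.4, 0.4, 1.2, 2.0])
t = 1.0 + np.sin(np.pi * p / 4.0)

# Fixed first layer: centers at -2, 0, 2 and biases of 0.5
centers = np.array([-2.0, 0.0, 2.0])
b1 = 0.5

# First-layer outputs for every training input (S1 x Q matrix)
N1 = np.abs(p[None, :] - centers[:, None]) * b1
A1 = np.exp(-N1**2)

# Regression matrix U: one row [a1_q^T  1] per training pair
U = np.hstack([A1.T, np.ones((p.size, 1))])

# Linear least squares for the second layer (rho = 0 here)
rho = 0.0
x_star = np.linalg.solve(U.T @ U + rho * np.eye(U.shape[1]), U.T @ t)
W2, b2 = x_star[:-1], x_star[-1]
print(W2, b2)   # approximately [-1.03  0.  1.03] and 1.0
```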

17 Example (4)
[Figure: top plot shows the network response a2 versus p together with the training points; bottom plot shows the three basis-function outputs a1 versus p.]
17 Bias Too Large

[Figure: network response a2 (top) and basis-function outputs a1 (bottom) versus p when the first-layer biases are too large, b1 = 8; each basis function is so narrow that the response is poor between the training points.]
17 Subset Selection

Given a set of potential first-layer weights (centers), which combination should we use?
An exhaustive search over all subsets is too expensive.
Forward selection begins with an empty set and adds centers one at a time.
Backward elimination begins with all of the potential centers and then removes them one at a time.
There are also methods that combine forward and backward steps.
We will concentrate on one forward selection method, called Orthogonal Least Squares (OLS).
17 Forward Selection
$\mathbf{t} = \mathbf{U}\mathbf{x} + \mathbf{e}$

$\mathbf{U} = \begin{bmatrix} \mathbf{u}^T_1 \\ \mathbf{u}^T_2 \\ \vdots \\ \mathbf{u}^T_Q \end{bmatrix} = \begin{bmatrix} \mathbf{z}^T_1 \\ \mathbf{z}^T_2 \\ \vdots \\ \mathbf{z}^T_Q \end{bmatrix} = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_n \end{bmatrix}, \qquad n = S^1 + 1$

There will be one row of U for each input/target pair.
If we consider all input vectors as potential centers, there will be one first-layer neuron for each input vector: n = Q + 1.
In this case, the columns of U represent the potential centers.
We will start with zero centers selected, and at each step we will add the center (or column of U) which produces the largest reduction in squared error.
17 Orthogonalize the Columns
$\mathbf{U} = \mathbf{M}\mathbf{R}$

$\mathbf{R} = \begin{bmatrix} 1 & r_{1,2} & r_{1,3} & \cdots & r_{1,n} \\ 0 & 1 & r_{2,3} & \cdots & r_{2,n} \\ \vdots & \vdots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & r_{n-1,n} \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}$

$\mathbf{M}^T\mathbf{M} = \mathbf{V} = \begin{bmatrix} v_{1,1} & 0 & \cdots & 0 \\ 0 & v_{2,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & v_{n,n} \end{bmatrix} = \begin{bmatrix} \mathbf{m}^T_1\mathbf{m}_1 & 0 & \cdots & 0 \\ 0 & \mathbf{m}^T_2\mathbf{m}_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \mathbf{m}^T_n\mathbf{m}_n \end{bmatrix}$
17 Orthogonalized Least Squares
$\mathbf{t} = \mathbf{M}\mathbf{R}\mathbf{x} + \mathbf{e} = \mathbf{M}\mathbf{h} + \mathbf{e}, \qquad \mathbf{h} = \mathbf{R}\mathbf{x}$

$\mathbf{h}^* = [\mathbf{M}^T\mathbf{M}]^{-1}\mathbf{M}^T\mathbf{t} = \mathbf{V}^{-1}\mathbf{M}^T\mathbf{t}$

$h^*_i = \dfrac{\mathbf{m}^T_i \mathbf{t}}{v_{i,i}} = \dfrac{\mathbf{m}^T_i \mathbf{t}}{\mathbf{m}^T_i \mathbf{m}_i}$
17 Gram-Schmidt Orthogonalization

$\mathbf{m}_1 = \mathbf{u}_1$

$\mathbf{m}_k = \mathbf{u}_k - \sum_{i=1}^{k-1} r_{i,k} \, \mathbf{m}_i$

$r_{i,k} = \dfrac{\mathbf{m}^T_i \mathbf{u}_k}{\mathbf{m}^T_i \mathbf{m}_i}, \qquad i = 1, \ldots, k-1$
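
A compact sketch of this Gram-Schmidt factorization of the columns of U (illustrative implementation, classical rather than modified Gram-Schmidt):

```python
import numpy as np

def gram_schmidt(U):
    """Classical Gram-Schmidt: factor U = M R with orthogonal columns in M
    and unit-upper-triangular R."""
    Q_, n = U.shape
    M = np.zeros((Q_, n))
    R = np.eye(n)
    for k in range(n):
        m_k = U[:, k].astype(float)
        for i in range(k):
            # r_{i,k} = m_i^T u_k / (m_i^T m_i)
            R[i, k] = (M[:, i] @ U[:, k]) / (M[:, i] @ M[:, i])
            m_k -= R[i, k] * M[:, i]
        M[:, k] = m_k
    return M, R
```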

17 Incremental Error
The total squared value of the targets is:

$\mathbf{t}^T\mathbf{t} = [\mathbf{M}\mathbf{h} + \mathbf{e}]^T[\mathbf{M}\mathbf{h} + \mathbf{e}] = \mathbf{h}^T\mathbf{M}^T\mathbf{M}\mathbf{h} + \mathbf{e}^T\mathbf{M}\mathbf{h} + \mathbf{h}^T\mathbf{M}^T\mathbf{e} + \mathbf{e}^T\mathbf{e}$

$\mathbf{e}^T\mathbf{M}\mathbf{h} = [\mathbf{t} - \mathbf{M}\mathbf{h}]^T\mathbf{M}\mathbf{h} = \mathbf{t}^T\mathbf{M}\mathbf{h} - \mathbf{h}^T\mathbf{M}^T\mathbf{M}\mathbf{h}$

If $\mathbf{h}^* = \mathbf{V}^{-1}\mathbf{M}^T\mathbf{t}$, then $\mathbf{e}^T\mathbf{M}\mathbf{h}^* = \mathbf{t}^T\mathbf{M}\mathbf{h}^* - \mathbf{t}^T\mathbf{M}\mathbf{V}^{-1}\mathbf{M}^T\mathbf{M}\mathbf{h}^* = 0$

$\mathbf{t}^T\mathbf{t} = \mathbf{h}^T\mathbf{M}^T\mathbf{M}\mathbf{h} + \mathbf{e}^T\mathbf{e} = \sum_{i=1}^{n} h_i^2 \, \mathbf{m}^T_i\mathbf{m}_i + \mathbf{e}^T\mathbf{e}$

Therefore, basis function i contributes

$h_i^2 \, \mathbf{m}^T_i\mathbf{m}_i$

to the squared value, and its normalized error contribution is

$o_i = \dfrac{h_i^2 \, \mathbf{m}^T_i\mathbf{m}_i}{\mathbf{t}^T\mathbf{t}}$
17 OLS Algorithm
First step (k = 1). For each candidate $i = 1, \ldots, Q$:

$\mathbf{m}_1^{(i)} = \mathbf{u}_i, \qquad h_1^{(i)} = \dfrac{(\mathbf{m}_1^{(i)})^T \mathbf{t}}{(\mathbf{m}_1^{(i)})^T \mathbf{m}_1^{(i)}}, \qquad o_1^{(i)} = \dfrac{(h_1^{(i)})^2 (\mathbf{m}_1^{(i)})^T \mathbf{m}_1^{(i)}}{\mathbf{t}^T \mathbf{t}}$

Select the candidate with the largest error reduction:

$o_1 = o_1^{(i_1)} = \max_i \{ o_1^{(i)} \}, \qquad \mathbf{m}_1 = \mathbf{m}_1^{(i_1)} = \mathbf{u}_{i_1}$

Step k. For each remaining candidate $i = 1, \ldots, Q$, with $i \ne i_1, i \ne i_2, \ldots, i \ne i_{k-1}$:

$r_{j,k}^{(i)} = \dfrac{\mathbf{m}_j^T \mathbf{u}_i}{\mathbf{m}_j^T \mathbf{m}_j}, \quad j = 1, \ldots, k-1, \qquad \mathbf{m}_k^{(i)} = \mathbf{u}_i - \sum_{j=1}^{k-1} r_{j,k}^{(i)} \mathbf{m}_j$

$h_k^{(i)} = \dfrac{(\mathbf{m}_k^{(i)})^T \mathbf{t}}{(\mathbf{m}_k^{(i)})^T \mathbf{m}_k^{(i)}}, \qquad o_k^{(i)} = \dfrac{(h_k^{(i)})^2 (\mathbf{m}_k^{(i)})^T \mathbf{m}_k^{(i)}}{\mathbf{t}^T \mathbf{t}}$

$o_k = o_k^{(i_k)} = \max_i \{ o_k^{(i)} \}, \qquad r_{j,k} = r_{j,k}^{(i_k)}, \; j = 1, \ldots, k-1, \qquad \mathbf{m}_k = \mathbf{m}_k^{(i_k)}$
17 Stopping Criteria

The selection stops when the remaining (unexplained) fraction of the squared target value falls below a small tolerance $\delta$:

$1 - \sum_{j=1}^{k} o_j < \delta$

To convert back to the original weights, use back-substitution:

$x_n = h_n, \qquad x_k = h_k - \sum_{j=k+1}^{n} r_{k,j} \, x_j$
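
A sketch tying the last two slides together: the OLS forward-selection loop, the stopping criterion, and the back-substitution that recovers the original weights. The function name ols_select, the tolerance delta, and the treatment of the bias column as just another candidate column are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

def ols_select(U, t, delta=0.01):
    """Orthogonal Least Squares forward selection (sketch).

    U     : (Q, n) matrix whose columns are the candidate regressors
            (potential centers plus a constant column for the bias)
    t     : (Q,) target vector
    delta : stop once 1 - sum(o_j) < delta
    Returns the indices of the selected columns and their weights x.
    """
    tt = t @ t
    n = U.shape[1]
    selected, M, h, o = [], [], [], []
    R_cols = {}                       # R_cols[k] holds r_{j,k} for j < k

    for _ in range(n):
        best = None
        for i in range(n):
            if i in selected:
                continue
            # Orthogonalize candidate column i against the selected m_j
            m = U[:, i].astype(float)
            r = []
            for m_j in M:
                r_ji = (m_j @ U[:, i]) / (m_j @ m_j)
                r.append(r_ji)
                m = m - r_ji * m_j
            mm = m @ m
            if mm < 1e-12:            # numerically dependent column, skip
                continue
            h_i = (m @ t) / mm
            o_i = h_i**2 * mm / tt    # normalized error reduction o_k^(i)
            if best is None or o_i > best[0]:
                best = (o_i, i, m, h_i, r)
        if best is None:
            break
        o_k, i_k, m_k, h_k, r_k = best
        selected.append(i_k)
        M.append(m_k)
        h.append(h_k)
        o.append(o_k)
        R_cols[len(selected) - 1] = r_k
        if 1.0 - sum(o) < delta:      # stopping criterion
            break

    # Back-substitution: x_n = h_n, x_k = h_k - sum_{j>k} r_{k,j} x_j
    m_sel = len(selected)
    x = np.zeros(m_sel)
    for k in range(m_sel - 1, -1, -1):
        x[k] = h[k] - sum(R_cols[j][k] * x[j] for j in range(k + 1, m_sel))
    return selected, x
```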

17 Competitive Learning for First Layer

Cluster the input space using a competitive layer (or feature map).
Use the cluster centers as the basis function centers.
The bias for each basis function can then be computed from the variation of the inputs in the corresponding cluster:

$dist_i = \left( \dfrac{1}{n_c^i} \sum_{j=1}^{n_c^i} \| \mathbf{p}_j^i - {}_i\mathbf{w}^1 \|^2 \right)^{1/2}$

$b_i^1 = \dfrac{1}{\sqrt{2} \, dist_i}$

where $n_c^i$ is the number of inputs assigned to cluster i and $\mathbf{p}_j^i$ is the j-th input in that cluster.
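
A small sketch of this bias computation, assuming cluster assignments (labels) are already available from a competitive layer or a similar clustering step; the function name and arguments are illustrative:

```python
import numpy as np

def cluster_biases(P, centers, labels):
    """Compute one RBF bias per cluster from the RMS distance of the
    cluster's inputs to its center.

    P       : (Q, R)  input vectors, one per row
    centers : (S1, R) cluster centers (the first-layer weight rows)
    labels  : (Q,)    index of the cluster each input was assigned to
    """
    b1 = np.zeros(len(centers))
    for i, c in enumerate(centers):
        members = P[labels == i]
        # assumes every cluster has at least one member and nonzero spread
        dist_i = np.sqrt(np.mean(np.sum((members - c) ** 2, axis=1)))
        b1[i] = 1.0 / (np.sqrt(2.0) * dist_i)
    return b1
```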

17 Backpropagation

The net input of a first-layer (radial basis) neuron is

$n_i^1 = \| \mathbf{p} - {}_i\mathbf{w}^1 \| \, b_i^1 = b_i^1 \sqrt{ \sum_{j=1}^{R} (p_j - w_{i,j}^1)^2 }$

so the derivatives needed for gradient-based training are

$\dfrac{\partial n_i^1}{\partial w_{i,j}^1} = b_i^1 \, \dfrac{2 (p_j - w_{i,j}^1)(-1)}{2 \sqrt{ \sum_{j=1}^{R} (p_j - w_{i,j}^1)^2 }} = \dfrac{b_i^1 (w_{i,j}^1 - p_j)}{\| \mathbf{p} - {}_i\mathbf{w}^1 \|}$

$\dfrac{\partial n_i^1}{\partial b_i^1} = \| \mathbf{p} - {}_i\mathbf{w}^1 \|$

The corresponding gradient elements, with $s_i^1$ the backpropagated sensitivity of first-layer neuron i, are

$\dfrac{\partial \hat{F}}{\partial w_{i,j}^1} = s_i^1 \, \dfrac{\partial n_i^1}{\partial w_{i,j}^1} = s_i^1 \, \dfrac{b_i^1 (w_{i,j}^1 - p_j)}{\| \mathbf{p} - {}_i\mathbf{w}^1 \|}$

$\dfrac{\partial \hat{F}}{\partial b_i^1} = s_i^1 \, \dfrac{\partial n_i^1}{\partial b_i^1} = s_i^1 \, \| \mathbf{p} - {}_i\mathbf{w}^1 \|$
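
A minimal sketch of these first-layer gradient elements for a single input, assuming the sensitivities s1 have already been backpropagated from the second layer in the usual way (names are illustrative):

```python
import numpy as np

def first_layer_gradients(p, W1, b1, s1):
    """Gradients of the (approximate) performance index with respect to the
    first-layer weights and biases of a radial basis network.

    p  : (R,)    input vector
    W1 : (S1, R) first-layer weights (rows are centers)
    b1 : (S1,)   first-layer biases
    s1 : (S1,)   backpropagated sensitivities for the first layer
    """
    # ||p - _iw1|| for each neuron i; assumes p does not coincide with a center
    dist = np.linalg.norm(W1 - p, axis=1)
    # dF/dw1_{i,j} = s1_i * b1_i * (w1_{i,j} - p_j) / ||p - _iw1||
    grad_W1 = (s1 * b1 / dist)[:, None] * (W1 - p)
    # dF/db1_i = s1_i * ||p - _iw1||
    grad_b1 = s1 * dist
    return grad_W1, grad_b1
```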

