
n = -5:0.1:5;
a = hardlims(n);
plot(n,a)

n = -5:0.1:5;
a = hardlim(n);
plot(n,a)

(Plots: hardlims(n) steps from -1 to +1 at n = 0; hardlim(n) steps from 0 to 1 at n = 0.)

• The hard-limiting threshold function
  – Corresponds to the biological paradigm
  – The neuron either fires or does not

f(net) = \operatorname{sgn}(net) = \begin{cases} +1, & net \ge 0 \\ -1, & net < 0 \end{cases} \quad \text{(bipolar binary)}

f(net) = \operatorname{sgn}(net) = \begin{cases} 1, & net \ge 0 \\ 0, & net < 0 \end{cases} \quad \text{(unipolar binary)}
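As a minimal sketch (assuming the net >= 0 convention above), the two binary activations can be written as anonymous functions and checked against the hardlims/hardlim calls used earlier:

% Sketch: binary activations written directly from the definitions above.
bipolar_binary  = @(net) 2*(net >= 0) - 1;   % +1 for net >= 0, -1 otherwise
unipolar_binary = @(net) double(net >= 0);   % 1 for net >= 0, 0 otherwise

net = -5:0.1:5;
isequal(bipolar_binary(net),  hardlims(net))   % expected: 1 (true)
isequal(unipolar_binary(net), hardlim(net))    % expected: 1 (true)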
Activation functions of a neuron
Step function:     Y^{step} = \begin{cases} 1, & \text{if } X \ge 0 \\ 0, & \text{if } X < 0 \end{cases}

Sign function:     Y^{sign} = \begin{cases} +1, & \text{if } X \ge 0 \\ -1, & \text{if } X < 0 \end{cases}

Sigmoid function:  Y^{sigmoid} = \dfrac{1}{1 + e^{-X}}

Linear function:   Y^{linear} = X
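A minimal MATLAB sketch of the four functions above, plotted on a common axis (the names Ystep, Ysign, Ysigmoid, Ylinear are illustrative):

% Sketch: the four activation functions of the table above.
X = -5:0.1:5;
Ystep    = double(X >= 0);        % step: 1 for X >= 0, 0 otherwise
Ysign    = 2*(X >= 0) - 1;        % sign: +1 for X >= 0, -1 otherwise
Ysigmoid = 1 ./ (1 + exp(-X));    % sigmoid
Ylinear  = X;                     % linear
plot(X, Ystep, X, Ysign, X, Ysigmoid, X, Ylinear)
legend('step', 'sign', 'sigmoid', 'linear')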

f(net) = \frac{2}{1 + e^{-net}} - 1 \quad \text{(bipolar continuous)}

f(net) = \operatorname{sgn}(net) = \begin{cases} +1, & net \ge 0 \\ -1, & net < 0 \end{cases} \quad \text{(bipolar binary)}

f(net) = \frac{1}{1 + e^{-net}} \quad \text{(unipolar continuous)}

f(net) = \operatorname{sgn}(net) = \begin{cases} 1, & net \ge 0 \\ 0, & net < 0 \end{cases} \quad \text{(unipolar binary)}
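A minimal sketch relating the continuous activations above to MATLAB's built-ins: the unipolar continuous function is exactly logsig, and the bipolar continuous function equals tansig evaluated at net/2 (the names bipolar_cont and unipolar_cont are illustrative):

% Sketch: continuous activations as anonymous functions.
bipolar_cont  = @(net) 2 ./ (1 + exp(-net)) - 1;
unipolar_cont = @(net) 1 ./ (1 + exp(-net));

net = -5:0.1:5;
max(abs(unipolar_cont(net) - logsig(net)))      % expected: 0 (or ~eps)
max(abs(bipolar_cont(net)  - tansig(net/2)))    % expected: 0 (or ~eps)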
a = logsig(n) = 1 / (1 + exp(-n))

n = -5:0.1:5;
a = logsig(n);
plot(n,a)

(Plot: logsig(n) rises smoothly from 0 to 1, crossing 0.5 at n = 0.)

a = tansig(n) = 2/(1+exp(-2*n))-1

n = -5:0.1:5;
a = tansig(n);
plot(n,a)

(Plot: tansig(n) rises smoothly from -1 to +1, crossing 0 at n = 0.)
ERROR SURFACE
X = [2 3; 12 7; -3 5];
y = [5 19 2];
w1 = 0:0.1:2;
w2 = 0:0.1:2;
err = zeros(length(w1), length(w2));   % one error value per (w1, w2) pair
for p1 = 1:length(w1)
    for p2 = 1:length(w2)
        for n = 1:3
            % compute network output for example n
            ynet = w1(p1)*X(n,1) + w2(p2)*X(n,2);
            % update total (sum-of-squares) error
            err(p1,p2) = err(p1,p2) + (y(n) - ynet)^2;
        end
    end
end
% plot error function (transpose so w1 runs along the x-axis and w2 along the y-axis)
surf(w1, w2, err');
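As a quick follow-up sketch (reusing w1, w2, and err from the loop above), the grid point with the smallest error can be read off directly; it falls at w1 = 1, w2 = 1, where the network output matches y exactly:

% Sketch: locate the grid point with the smallest total error.
[minErr, idx] = min(err(:));
[p1, p2] = ind2sub(size(err), idx);
fprintf('minimum error %.4f at w1 = %.1f, w2 = %.1f\n', minErr, w1(p1), w2(p2))
% prints: minimum error 0.0000 at w1 = 1.0, w2 = 1.0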
Learning by Error Minimization
We would like to minimize the squared error (which is a function
of the weights) for each training pair/pattern:

• Squaring makes the error positive and penalizes large errors

• The factor of 1/2 just makes some of the maths easier

The total error is the sum of the errors across all patterns.
We need to change the weights in order to minimize this error:
– Use the principle of gradient descent: calculate the derivative
(gradient) of the error with respect to the weights, and then
change the weights by a small increment in the direction
opposite to the gradient (see the sketch below).
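A minimal gradient-descent sketch on the two-weight problem from the error-surface code above (eta = 0.001 and 200 steps are illustrative choices): for E = 1/2*sum((y - X*w).^2), the gradient with respect to w is -X'*(y - X*w), and each step moves the weights a small amount against that gradient.

% Sketch: batch gradient descent on the two-weight error surface.
X = [2 3; 12 7; -3 5];
y = [5 19 2]';
w = [0; 0];                    % start from zero weights
eta = 0.001;                   % small learning rate
for step = 1:200
    e = y - X*w;               % per-pattern errors
    grad = -X' * e;            % gradient of E = 1/2*sum(e.^2) w.r.t. w
    w = w - eta * grad;        % small step against the gradient
end
disp(w')                       % approaches [1 1], the minimum of the error surface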
The Gradient Descent Optimization

The weights are moved in the direction of steepest descent, i.e. opposite to the gradient of the error. For a single training pair the error and its gradient are

E = \frac{1}{2}(d_i - o_i)^2 = \frac{1}{2}\left(d_i - f(\mathbf{w}_i^{t}\mathbf{x})\right)^2

\nabla E = -(d_i - o_i)\, f'(\mathbf{w}_i^{t}\mathbf{x})\, \mathbf{x}

The components of the gradient vector are

\frac{\partial E}{\partial w_{ij}} = -(d_i - o_i)\, f'(\mathbf{w}_i^{t}\mathbf{x})\, x_j \quad \text{for } j = 1, 2, \ldots, n

so the weight update is

\Delta \mathbf{w}_i = -\eta \nabla E = \eta\,(d_i - o_i)\, f'(net_i)\, \mathbf{x}

\Delta \mathbf{w} = \eta\, r\, \mathbf{x}, \qquad r = [d_i - f(\mathbf{w}_i^{t}\mathbf{x})]\, f'(\mathbf{w}_i^{t}\mathbf{x})

\mathbf{w}^{2} = \mathbf{w}^{1} + \eta\,[d_i - o_i]\, f'(net_i)\, \mathbf{x} \qquad (d \text{ is the desired/target output, also written } t)
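As a compact sketch, the update above can be written as a single MATLAB anonymous function (delta_update, eta, f, and fprime are illustrative names for the update, the learning rate, the activation, and its derivative):

% Sketch: generic per-pattern update  w <- w + eta*(d - o)*f'(net)*x.
delta_update = @(w, x, d, eta, f, fprime) w + eta*(d - f(w'*x)) * fprime(w'*x) * x;

With f and fprime set to the bipolar continuous pair used in Example 1 below, one call per pattern performs the corresponding training step.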


Example 1 (same as the binary case)

x_1 = [1 -2 0 -1]^t, \quad x_2 = [0 1.5 -0.5 -1]^t, \quad x_3 = [-1 1 0.5 -1]^t

d_1 = -1, \quad d_2 = -1, \quad d_3 = 1, \qquad w^1 = [1 -1 0 0.5]^t

Bipolar continuous activation and its derivative:

f(net) = \frac{2}{1 + e^{-net}} - 1, \qquad f'(net) = \frac{2e^{-net}}{[1 + e^{-net}]^{2}} = \frac{1}{2}(1 - o^{2})

First step (learning rate \eta = 0.1):

net^1 = w^{1t} x_1 = [1 -1 0 0.5][1 -2 0 -1]^t = 2.5

o^1 = f(net^1) = 0.848, \qquad f'(net^1) = 0.140

w^2 = w^1 + \eta\,[d_1 - o^1]\, f'(net^1)\, x_1 = [0.974 -0.948 0 0.526]^t

net^2 = w^{2t} x_2 = -1.948

Complete this problem for one epoch.
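A sketch that reproduces the numbers in the first step above (eta = 0.1 is the learning rate consistent with the w^2 shown); repeating the last four lines with x_2, d_2 and then x_3, d_3 completes the requested epoch:

% Sketch: first training step of Example 1 with the bipolar continuous activation.
f      = @(net) 2 / (1 + exp(-net)) - 1;   % bipolar continuous activation
fprime = @(o)   0.5 * (1 - o^2);           % its derivative, written via the output o

x1 = [1 -2 0 -1]';   d1 = -1;
w1 = [1 -1 0 0.5]';
eta = 0.1;

net1 = w1' * x1                                  % 2.5
o1   = f(net1)                                   % 0.848
w2   = w1 + eta * (d1 - o1) * fprime(o1) * x1    % [0.974 -0.948 0 0.526]'
net2 = w2' * [0 1.5 -0.5 -1]'                    % -1.948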
