Widrow-Hoff Learning


(LMS Algorithm)

In this chapter we apply the principles of performance learning to a single-layer linear neural network.

Widrow-Hoff learning is an approximate steepest descent algorithm, in which the performance index is mean square error.

Bernard Widrow began working in neural networks in the late 1950s, at about the same time that Frank Rosenblatt developed the perceptron learning rule.

In 1960 Widrow and Hoff introduced the ADALINE (ADAptive LInear NEuron) network. Its learning rule is called the LMS (Least Mean Square) algorithm.

ADALINE is similar to the perceptron, except that its transfer function is linear instead of hard limiting.


IUT-Ahmadzadeh

1430/10/28

Widrow, B., and Hoff, M. E., 1960, Adaptive switching circuits, in 1960 IRE WESCON Convention Record, Part 4, New York: IRE, pp. 96–104.

Widrow, B., and Lehr, M. A., 1990, 30 years of adaptive neural networks: Perceptron, madaline, and backpropagation, Proc. IEEE, 78:1415–1441.

Widrow, B., and Stearns, S. D., 1985, Adaptive Signal Processing, Englewood Cliffs, NJ: Prentice-Hall.

Like the perceptron, the ADALINE can only solve linearly separable problems.

The LMS algorithm minimizes mean square error, and therefore tries to move the decision boundaries as far from the training patterns as possible.

The LMS algorithm has found many more practical uses than the perceptron learning rule (for example, most long distance phone lines use ADALINE networks for echo cancellation).


ADALINE Network

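$$\mathbf{a} = \mathrm{purelin}(\mathbf{W}\mathbf{p} + \mathbf{b}) = \mathbf{W}\mathbf{p} + \mathbf{b}$$

The ith row of W, written as a column vector, is

$${}_i\mathbf{w} = \begin{bmatrix} w_{i,1} & w_{i,2} & \cdots & w_{i,R} \end{bmatrix}^T$$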

Two-Input ADALINE

$$a = {}_1\mathbf{w}^T \mathbf{p} + b = w_{1,1} p_1 + w_{1,2} p_2 + b$$

The decision boundary is determined by the input vectors for which the net input n is zero.


The LMS algorithm is an example of supervised training.

Training set:

$$\{\mathbf{p}_1, t_1\}, \{\mathbf{p}_2, t_2\}, \dots, \{\mathbf{p}_Q, t_Q\}$$

Input: $\mathbf{p}_q$. Target: $t_q$.

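Notation:

$$\mathbf{x} = \begin{bmatrix} {}_1\mathbf{w} \\ b \end{bmatrix}, \qquad \mathbf{z} = \begin{bmatrix} \mathbf{p} \\ 1 \end{bmatrix}$$

$$a = {}_1\mathbf{w}^T \mathbf{p} + b \quad\Rightarrow\quad a = \mathbf{x}^T \mathbf{z}$$

Mean square error:

$$F(\mathbf{x}) = E[e^2] = E[(t - a)^2] = E[(t - \mathbf{x}^T \mathbf{z})^2]$$

The expectation is taken over all sets of input/target pairs.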

Error Analysis

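$$F(\mathbf{x}) = E[e^2] = E[(t - a)^2] = E[(t - \mathbf{x}^T \mathbf{z})^2]$$

$$F(\mathbf{x}) = E[t^2 - 2 t \mathbf{x}^T \mathbf{z} + \mathbf{x}^T \mathbf{z} \mathbf{z}^T \mathbf{x}]$$

$$F(\mathbf{x}) = E[t^2] - 2 \mathbf{x}^T E[t \mathbf{z}] + \mathbf{x}^T E[\mathbf{z} \mathbf{z}^T] \mathbf{x}$$

This can be written in the following convenient form:

$$F(\mathbf{x}) = c - 2 \mathbf{x}^T \mathbf{h} + \mathbf{x}^T \mathbf{R} \mathbf{x}$$

where

$$c = E[t^2], \qquad \mathbf{h} = E[t \mathbf{z}], \qquad \mathbf{R} = E[\mathbf{z} \mathbf{z}^T]$$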
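The quadratic form of the mean square error can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up set of equally likely (z, t) pairs; the function names and the data are illustrative and do not come from the slides. It evaluates F(x) once as the average squared error and once as c - 2 x^T h + x^T R x.

```python
def mse_direct(samples, x):
    """Average of (t - x^T z)^2 over equally likely (z, t) pairs."""
    total = 0.0
    for z, t in samples:
        a = sum(xi * zi for xi, zi in zip(x, z))
        total += (t - a) ** 2
    return total / len(samples)


def mse_quadratic(samples, x):
    """Same quantity computed as c - 2 x^T h + x^T R x."""
    n = len(x)
    q = len(samples)
    c = sum(t * t for _, t in samples) / q                        # c = E[t^2]
    h = [sum(t * z[i] for z, t in samples) / q for i in range(n)]  # h = E[t z]
    R = [[sum(z[i] * z[j] for z, _ in samples) / q for j in range(n)]
         for i in range(n)]                                        # R = E[z z^T]
    xh = sum(x[i] * h[i] for i in range(n))
    xRx = sum(x[i] * R[i][j] * x[j] for i in range(n) for j in range(n))
    return c - 2.0 * xh + xRx


# Hypothetical data: z = [p1, p2, 1], i.e. the input augmented with 1 for the bias.
samples = [([1.0, 2.0, 1.0], 1.0),
           ([-1.0, 0.5, 1.0], -1.0),
           ([0.0, -1.0, 1.0], 0.5)]
x = [0.3, -0.2, 0.1]  # hypothetical [w_{1,1}, w_{1,2}, b]
print(mse_direct(samples, x), mse_quadratic(samples, x))
```

The two printed values should agree up to floating-point rounding, since the second function is only an algebraic rearrangement of the first.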


The vector h gives the cross-correlation between the input vector and its associated target.

R is the input correlation matrix. The diagonal elements of this matrix are equal to the mean square values of the elements of the input vectors.

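The mean square error for the ADALINE network is a quadratic function:

$$F(\mathbf{x}) = c + \mathbf{d}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x}$$

where

$$\mathbf{d} = -2\mathbf{h}, \qquad \mathbf{A} = 2\mathbf{R}$$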

Stationary Point

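Hessian matrix:

$$\mathbf{A} = 2\mathbf{R}$$

The character of the stationary point depends on the Hessian, which is twice the correlation matrix. It can be shown that all correlation matrices are either positive definite or positive semidefinite. If there are any zero eigenvalues, the performance index will either have a weak minimum or no stationary point (depending on d = -2h); otherwise there will be a unique global minimum x* (see Ch. 8).

The gradient of the performance index is

$$\nabla F(\mathbf{x}) = \nabla\left(c + \mathbf{d}^T \mathbf{x} + \frac{1}{2} \mathbf{x}^T \mathbf{A} \mathbf{x}\right) = \mathbf{d} + \mathbf{A}\mathbf{x} = -2\mathbf{h} + 2\mathbf{R}\mathbf{x}$$

Stationary point:

$$-2\mathbf{h} + 2\mathbf{R}\mathbf{x} = \mathbf{0}$$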


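If the correlation matrix is positive definite:

$$\mathbf{x}^* = \mathbf{R}^{-1} \mathbf{h}$$

If we could calculate the statistical quantities h and R, we could find the minimum point directly from the above equation. But it is generally not desirable or convenient to calculate h and R, so we use an approximate steepest descent algorithm based on an estimated gradient.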

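Approximate mean square error (one sample):

$$\hat{F}(\mathbf{x}) = (t(k) - a(k))^2 = e^2(k)$$

The expectation of the squared error has been replaced by the squared error at iteration k.

Approximate (stochastic) gradient:

$$\hat{\nabla} F(\mathbf{x}) = \nabla e^2(k)$$

$$[\nabla e^2(k)]_j = \frac{\partial e^2(k)}{\partial w_{1,j}} = 2 e(k) \frac{\partial e(k)}{\partial w_{1,j}}, \qquad j = 1, 2, \dots, R$$

$$[\nabla e^2(k)]_{R+1} = \frac{\partial e^2(k)}{\partial b} = 2 e(k) \frac{\partial e(k)}{\partial b}$$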


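$$\frac{\partial e(k)}{\partial w_{1,j}} = \frac{\partial [t(k) - a(k)]}{\partial w_{1,j}} = \frac{\partial}{\partial w_{1,j}} \left[ t(k) - \left( {}_1\mathbf{w}^T \mathbf{p}(k) + b \right) \right]$$

$$\frac{\partial e(k)}{\partial w_{1,j}} = \frac{\partial}{\partial w_{1,j}} \left[ t(k) - \left( \sum_{i=1}^{R} w_{1,i}\, p_i(k) + b \right) \right]$$

where $p_i(k)$ is the ith element of the input vector at the kth iteration.

$$\frac{\partial e(k)}{\partial w_{1,j}} = -p_j(k), \qquad \frac{\partial e(k)}{\partial b} = -1$$

$$\hat{\nabla} F(\mathbf{x}) = \nabla e^2(k) = -2 e(k)\, \mathbf{z}(k)$$

This simple result follows from approximating the mean square error by the single squared error at iteration k, as in:

$$\hat{F}(\mathbf{x}) = (t(k) - a(k))^2 = e^2(k)$$

This approximation to $\nabla F(\mathbf{x})$ can now be used in the steepest descent algorithm.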

LMS Algorithm

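Steepest descent with a constant learning rate α:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha \left. \nabla F(\mathbf{x}) \right|_{\mathbf{x} = \mathbf{x}_k}$$

If we substitute $\hat{\nabla} F(\mathbf{x})$ for $\nabla F(\mathbf{x})$:

$$\mathbf{x}_{k+1} = \mathbf{x}_k + 2\alpha\, e(k)\, \mathbf{z}(k)$$

or

$${}_1\mathbf{w}(k+1) = {}_1\mathbf{w}(k) + 2\alpha\, e(k)\, \mathbf{p}(k)$$

$$b(k+1) = b(k) + 2\alpha\, e(k)$$

These last two equations make up the LMS algorithm, also called the delta rule or the Widrow-Hoff learning algorithm.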
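The weight and bias updates of the LMS algorithm translate almost line by line into code. The following is a minimal sketch for a single ADALINE neuron; the function name `lms_step` and the sample numbers are illustrative assumptions, not part of the slides.

```python
def lms_step(w, b, p, t, alpha):
    """One LMS update for a single linear neuron.

    w: weight list, b: bias, p: input list, t: target, alpha: learning rate.
    Returns (new_w, new_b, e), where e is the error before the update.
    """
    a = sum(wi * pi for wi, pi in zip(w, p)) + b                 # a = w^T p + b
    e = t - a                                                    # e(k) = t(k) - a(k)
    new_w = [wi + 2.0 * alpha * e * pi for wi, pi in zip(w, p)]  # w + 2*alpha*e*p
    new_b = b + 2.0 * alpha * e                                  # b + 2*alpha*e
    return new_w, new_b, e


# Hypothetical single step starting from zero weights.
w, b, e = lms_step([0.0, 0.0], 0.0, [1.0, -1.0], 1.0, alpha=0.1)
print(w, b, e)
```

Only the current input, target and output are needed for each step, which is why the rule is practical for on-line use.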


Multiple-Neuron Case

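$${}_i\mathbf{w}(k+1) = {}_i\mathbf{w}(k) + 2\alpha\, e_i(k)\, \mathbf{p}(k)$$

$$b_i(k+1) = b_i(k) + 2\alpha\, e_i(k)$$

Matrix form:

$$\mathbf{W}(k+1) = \mathbf{W}(k) + 2\alpha\, \mathbf{e}(k)\, \mathbf{p}^T(k)$$

$$\mathbf{b}(k+1) = \mathbf{b}(k) + 2\alpha\, \mathbf{e}(k)$$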
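For a layer of several neurons, the matrix form amounts to adding the outer product of the error vector and the input to W. A minimal sketch using nested lists follows; the name `lms_step_matrix` and the numbers are assumptions chosen for illustration.

```python
def lms_step_matrix(W, b, p, t, alpha):
    """One LMS update for a layer of linear neurons.

    W: list of weight rows (one row per neuron), b: bias list,
    p: input list, t: target list (one target per neuron).
    Returns (new_W, new_b, e).
    """
    a = [sum(wij * pj for wij, pj in zip(row, p)) + bi
         for row, bi in zip(W, b)]                     # a = W p + b
    e = [ti - ai for ti, ai in zip(t, a)]              # e = t - a
    # W + 2*alpha * e p^T: row i is shifted by 2*alpha*e_i times p
    new_W = [[wij + 2.0 * alpha * ei * pj for wij, pj in zip(row, p)]
             for row, ei in zip(W, e)]
    new_b = [bi + 2.0 * alpha * ei for bi, ei in zip(b, e)]
    return new_W, new_b, e


# Hypothetical two-neuron, two-input layer starting from zeros.
W_new, b_new, e = lms_step_matrix([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0],
                                  [1.0, 2.0], [1.0, -1.0], alpha=0.25)
print(W_new, b_new, e)
```

Each row of W is updated independently with its own error, so the layer behaves like several single-neuron ADALINEs that share the same input.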


Analysis of Convergence

Note that $\mathbf{x}_k$ is a function only of z(k-1), z(k-2), ..., z(0). If we assume that successive input vectors are statistically independent, then $\mathbf{x}_k$ is independent of z(k).

We will show that, for stationary input processes meeting this condition, the expected value of the weight vector converges to

$$\mathbf{x}^* = \mathbf{R}^{-1} \mathbf{h}$$

This is the minimum mean square error solution, as we saw before.

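$$\mathbf{x}_{k+1} = \mathbf{x}_k + 2\alpha\, e(k)\, \mathbf{z}(k)$$

Taking the expectation of both sides:

$$E[\mathbf{x}_{k+1}] = E[\mathbf{x}_k] + 2\alpha\, E[e(k)\, \mathbf{z}(k)]$$

Substitute $t(k) - \mathbf{x}_k^T \mathbf{z}(k)$ for the error:

$$E[\mathbf{x}_{k+1}] = E[\mathbf{x}_k] + 2\alpha \left\{ E[t(k)\, \mathbf{z}(k)] - E[(\mathbf{x}_k^T \mathbf{z}(k))\, \mathbf{z}(k)] \right\}$$

Since $\mathbf{x}_k^T \mathbf{z}(k) = \mathbf{z}^T(k)\, \mathbf{x}_k$:

$$E[\mathbf{x}_{k+1}] = E[\mathbf{x}_k] + 2\alpha \left\{ E[t(k)\, \mathbf{z}(k)] - E[\mathbf{z}(k)\, \mathbf{z}^T(k)\, \mathbf{x}_k] \right\}$$

Since $\mathbf{x}_k$ is independent of z(k):

$$E[\mathbf{x}_{k+1}] = E[\mathbf{x}_k] + 2\alpha \left\{ \mathbf{h} - \mathbf{R}\, E[\mathbf{x}_k] \right\}$$

$$E[\mathbf{x}_{k+1}] = [\mathbf{I} - 2\alpha\mathbf{R}]\, E[\mathbf{x}_k] + 2\alpha\mathbf{h}$$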

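For stability, the eigenvalues of the matrix $[\mathbf{I} - 2\alpha\mathbf{R}]$ must fall inside the unit circle:

$$\mathrm{eig}([\mathbf{I} - 2\alpha\mathbf{R}]) = 1 - 2\alpha\lambda_i, \qquad |1 - 2\alpha\lambda_i| < 1$$

(where $\lambda_i$ is an eigenvalue of R)

Since $\lambda_i > 0$, we always have $1 - 2\alpha\lambda_i < 1$. The condition for stability is therefore $1 - 2\alpha\lambda_i > -1$, that is

$$\alpha < \frac{1}{\lambda_i} \ \text{for all } i, \qquad 0 < \alpha < \frac{1}{\lambda_{max}}$$

This condition is equivalent to the one for steepest descent (SD). In SD we use the eigenvalues of the Hessian matrix A; here we use those of the input correlation matrix R (recall that A = 2R).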


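$$E[\mathbf{x}_{k+1}] = [\mathbf{I} - 2\alpha\mathbf{R}]\, E[\mathbf{x}_k] + 2\alpha\mathbf{h}$$

If the system is stable, then a steady state condition will be reached:

$$E[\mathbf{x}_{ss}] = [\mathbf{I} - 2\alpha\mathbf{R}]\, E[\mathbf{x}_{ss}] + 2\alpha\mathbf{h}$$

The solution to this equation is

$$E[\mathbf{x}_{ss}] = \mathbf{R}^{-1}\mathbf{h} = \mathbf{x}^*$$

Thus the LMS solution, obtained by applying one input at a time, is the same as the minimum mean square error solution $\mathbf{x}^* = \mathbf{R}^{-1}\mathbf{h}$.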
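The recursion for the expected weight vector can be iterated directly to see both the stability bound and the steady state. The sketch below assumes a made-up 2x2 correlation matrix with eigenvalues 1 and 3 and a made-up h, so the bound is α < 1/3 and the minimum is R^{-1}h = [2/3, -1/3]; none of these numbers come from the slides.

```python
def mean_weight_recursion(R, h, alpha, steps):
    """Iterate E[x_{k+1}] = (I - 2*alpha*R) E[x_k] + 2*alpha*h from E[x_0] = 0."""
    n = len(h)
    x = [0.0] * n
    for _ in range(steps):
        Rx = [sum(R[i][j] * x[j] for j in range(n)) for i in range(n)]
        # x + 2*alpha*(h - R x) is the same as (I - 2*alpha*R) x + 2*alpha*h
        x = [x[i] + 2.0 * alpha * (h[i] - Rx[i]) for i in range(n)]
    return x


R = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical correlation matrix, eigenvalues 1 and 3
h = [1.0, 0.0]                # hypothetical cross-correlation vector

stable = mean_weight_recursion(R, h, alpha=0.2, steps=200)    # 0.2 < 1/3
unstable = mean_weight_recursion(R, h, alpha=0.4, steps=200)  # 0.4 > 1/3
print(stable)
print(unstable)
```

With α = 0.2 the iterate settles near [2/3, -1/3]; with α = 0.4 the factor 1 - 2αλ for λ = 3 is -1.4, so the iterate grows without bound.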


Example

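Banana: $\mathbf{p}_1 = \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix}, \quad t_1 = -1$

Apple: $\mathbf{p}_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, \quad t_2 = 1$

If the two patterns are equally likely, the input correlation matrix is:

$$\mathbf{R} = E[\mathbf{p}\mathbf{p}^T] = \frac{1}{2}\mathbf{p}_1\mathbf{p}_1^T + \frac{1}{2}\mathbf{p}_2\mathbf{p}_2^T$$

$$\mathbf{R} = \frac{1}{2}\begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix}\begin{bmatrix} -1 & 1 & -1 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}\begin{bmatrix} 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix}$$

$$\lambda_1 = 1.0, \qquad \lambda_2 = 0.0, \qquad \lambda_3 = 2.0$$

$$\alpha < \frac{1}{\lambda_{max}} = \frac{1}{2.0} = 0.5$$

The iterations below use α = 0.2. (In practice it is often not practical to compute R and its eigenvalues, so α is chosen by trial and error.)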


Iteration One

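Present the banana pattern:

$$a(0) = \mathbf{W}(0)\mathbf{p}(0) = \mathbf{W}(0)\mathbf{p}_1 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} = 0$$

W(0) is selected arbitrarily.

$$e(0) = t(0) - a(0) = t_1 - a(0) = -1 - 0 = -1$$

$$\mathbf{W}(1) = \mathbf{W}(0) + 2\alpha\, e(0)\, \mathbf{p}^T(0)$$

$$\mathbf{W}(1) = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} + 2(0.2)(-1)\begin{bmatrix} -1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 0.4 & -0.4 & 0.4 \end{bmatrix}$$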

Iteration Two

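Present the apple pattern:

$$a(1) = \mathbf{W}(1)\mathbf{p}(1) = \mathbf{W}(1)\mathbf{p}_2 = \begin{bmatrix} 0.4 & -0.4 & 0.4 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = -0.4$$

$$e(1) = t(1) - a(1) = t_2 - a(1) = 1 - (-0.4) = 1.4$$

$$\mathbf{W}(2) = \begin{bmatrix} 0.4 & -0.4 & 0.4 \end{bmatrix} + 2(0.2)(1.4)\begin{bmatrix} 1 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 0.96 & 0.16 & -0.16 \end{bmatrix}$$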


Iteration Three

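Present the banana pattern again:

$$a(2) = \mathbf{W}(2)\mathbf{p}(2) = \mathbf{W}(2)\mathbf{p}_1 = \begin{bmatrix} 0.96 & 0.16 & -0.16 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix} = -0.64$$

$$e(2) = t(2) - a(2) = t_1 - a(2) = -1 - (-0.64) = -0.36$$

$$\mathbf{W}(3) = \mathbf{W}(2) + 2\alpha\, e(2)\, \mathbf{p}^T(2) = \begin{bmatrix} 1.1040 & 0.0160 & -0.0160 \end{bmatrix}$$

If we continue this procedure, the algorithm converges to

$$\mathbf{W}(\infty) = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}$$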
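The three hand iterations can be replayed in a few lines of code. This is a minimal sketch, assuming the banana and apple patterns p1 = [-1, 1, -1], t1 = -1 and p2 = [1, 1, -1], t2 = 1, no bias, W(0) = 0 and α = 0.2; the function name `run_lms` is illustrative.

```python
def run_lms(patterns, alpha, n_iters):
    """Present the patterns in alternation and record (error, weights) per step."""
    W = [0.0] * len(patterns[0][0])          # W(0) = 0, no bias in this example
    history = []
    for k in range(n_iters):
        p, t = patterns[k % len(patterns)]
        a = sum(w * x for w, x in zip(W, p))                  # a(k) = W(k) p(k)
        e = t - a                                             # e(k) = t(k) - a(k)
        W = [w + 2.0 * alpha * e * x for w, x in zip(W, p)]   # W(k+1)
        history.append((e, W))
    return history


patterns = [([-1.0, 1.0, -1.0], -1.0),   # banana
            ([1.0, 1.0, -1.0], 1.0)]     # apple

history = run_lms(patterns, alpha=0.2, n_iters=3)
for k, (e, W) in enumerate(history):
    print(k, round(e, 4), [round(w, 4) for w in W])

W_final = run_lms(patterns, alpha=0.2, n_iters=200)[-1][1]
print([round(w, 4) for w in W_final])
```

The first three errors should match the hand calculation (-1, 1.4, -0.36), and after many presentations the weights should approach [1, 0, 0].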


Learning process:

Computationally, the learning process goes through all the training examples (an epoch) a number of times, until a stopping criterion is reached.

The convergence process can be monitored with a plot of the mean-squared error function F(W(k)).


Stopping criteria:

The mean-squared error is sufficiently small: F(W(k)) < ε

The rate of change of the mean-squared error is sufficiently small.
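An epoch loop with both kinds of stopping test might look like the sketch below. The thresholds `eps` and `delta`, the exact form of the rate-of-change test (absolute difference between consecutive epoch errors) and the function name are assumptions made for illustration; the slides do not specify them. The training set is the banana/apple pair with targets -1 and 1.

```python
def train_epochs(patterns, alpha, eps, delta, max_epochs):
    """Run LMS epoch by epoch until the MSE or its change is small enough."""
    W = [0.0] * len(patterns[0][0])
    mse_history = []
    for epoch in range(max_epochs):
        for p, t in patterns:                       # one pass = one epoch
            e = t - sum(w * x for w, x in zip(W, p))
            W = [w + 2.0 * alpha * e * x for w, x in zip(W, p)]
        # mean-squared error over the training set after this epoch
        mse = sum((t - sum(w * x for w, x in zip(W, p))) ** 2
                  for p, t in patterns) / len(patterns)
        mse_history.append(mse)
        if mse < eps:                               # criterion 1: small error
            break
        if epoch > 0 and abs(mse_history[-2] - mse) < delta:
            break                                   # criterion 2: small change (assumed form)
    return W, mse_history


patterns = [([-1.0, 1.0, -1.0], -1.0), ([1.0, 1.0, -1.0], 1.0)]
W_trained, mse_history = train_epochs(patterns, alpha=0.2, eps=1e-6,
                                      delta=1e-12, max_epochs=100)
print(len(mse_history), mse_history[-1], W_trained)
```

The list `mse_history` holds the values one would plot to monitor convergence.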


Adaptive Filtering

ADALINE is one of the most widely used NNs in practical

applications. One of the major application areas has been

Adaptive Filtering.

[Figures: tapped delay line; adaptive filter ADALINE.]


$$a(k) = \mathrm{purelin}(\mathbf{W}\mathbf{p} + b) = \sum_{i=1}^{R} w_{1,i}\, y(k - i + 1) + b$$

In digital signal processing language, we recognize this network as a finite impulse response (FIR) filter.
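The tapped delay line followed by a linear neuron can be written as a short FIR filter routine. This sketch assumes zero initial conditions, meaning y(k) is taken as 0 before the first sample; the function name and the test signal are illustrative.

```python
def adaline_fir(y, w, b=0.0):
    """a(k) = sum_{i=1..R} w[i-1] * y(k-i+1) + b, with y(k) = 0 for k < 0."""
    R = len(w)
    out = []
    for k in range(len(y)):
        acc = b
        for i in range(1, R + 1):
            idx = k - i + 1          # tap i sees the input delayed by i-1 steps
            if idx >= 0:
                acc += w[i - 1] * y[idx]
        out.append(acc)
    return out


# Hypothetical two-tap averaging filter applied to a ramp.
print(adaline_fir([1.0, 2.0, 3.0, 4.0], [0.5, 0.5]))
```

With weights [0.5, 0.5] and no bias the output is the running average of the current and previous samples, here [0.5, 1.5, 2.5, 3.5].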


A two-input filter can attenuate and phase-shift the noise in the desired way.

Correlation Matrix

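To analyze this system we need to find the input correlation matrix R and the input/target cross-correlation vector h:

$$\mathbf{R} = E[\mathbf{z}\mathbf{z}^T], \qquad \mathbf{h} = E[t\,\mathbf{z}]$$

$$\mathbf{z}(k) = \begin{bmatrix} v(k) \\ v(k-1) \end{bmatrix}, \qquad t(k) = s(k) + m(k)$$

$$\mathbf{R} = \begin{bmatrix} E[v^2(k)] & E[v(k)\,v(k-1)] \\ E[v(k-1)\,v(k)] & E[v^2(k-1)] \end{bmatrix}$$

$$\mathbf{h} = \begin{bmatrix} E[(s(k) + m(k))\,v(k)] \\ E[(s(k) + m(k))\,v(k-1)] \end{bmatrix}$$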


We must define the noise signal v, the EEG signal s, and the filtered noise m, to be able to obtain specific values.

We assume that the EEG signal is a white (uncorrelated from one time step to the next) random signal uniformly distributed between the values -0.2 and +0.2, and that the noise source (a 60 Hz sine wave sampled at 180 Hz) is given by

$$v(k) = 1.2 \sin\left(\frac{2\pi k}{3}\right)$$

The filtered noise that contaminates the EEG is the noise source attenuated by a factor of 1.0 and shifted in phase by $-3\pi/4$:

$$m(k) = 1.2 \sin\left(\frac{2\pi k}{3} - \frac{3\pi}{4}\right)$$

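$$E[v^2(k)] = (1.2)^2\, \frac{1}{3} \sum_{k=1}^{3} \sin^2\left(\frac{2\pi k}{3}\right) = (1.2)^2 (0.5) = 0.72$$

$$E[v^2(k-1)] = E[v^2(k)] = 0.72$$

$$E[v(k)\,v(k-1)] = \frac{1}{3} \sum_{k=1}^{3} \left(1.2 \sin\frac{2\pi k}{3}\right)\left(1.2 \sin\frac{2\pi (k-1)}{3}\right) = (1.2)^2 (0.5) \cos\left(\frac{2\pi}{3}\right) = -0.36$$

$$\mathbf{R} = \begin{bmatrix} 0.72 & -0.36 \\ -0.36 & 0.72 \end{bmatrix}$$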


Stationary Point

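$$E[(s(k) + m(k))\,v(k)] = E[s(k)\,v(k)] + E[m(k)\,v(k)]$$

The first term is zero because s(k) and v(k) are independent and zero mean.

$$E[m(k)\,v(k)] = \frac{1}{3} \sum_{k=1}^{3} \left(1.2 \sin\left(\frac{2\pi k}{3} - \frac{3\pi}{4}\right)\right)\left(1.2 \sin\frac{2\pi k}{3}\right) = -0.51$$

$$E[(s(k) + m(k))\,v(k-1)] = E[s(k)\,v(k-1)] + E[m(k)\,v(k-1)]$$

Again the first term is zero.

$$E[m(k)\,v(k-1)] = \frac{1}{3} \sum_{k=1}^{3} \left(1.2 \sin\left(\frac{2\pi k}{3} - \frac{3\pi}{4}\right)\right)\left(1.2 \sin\frac{2\pi (k-1)}{3}\right) = 0.70$$

$$\mathbf{h} = \begin{bmatrix} E[(s(k) + m(k))\,v(k)] \\ E[(s(k) + m(k))\,v(k-1)] \end{bmatrix} = \begin{bmatrix} -0.51 \\ 0.70 \end{bmatrix}$$

$$\mathbf{x}^* = \mathbf{R}^{-1}\mathbf{h} = \begin{bmatrix} 0.72 & -0.36 \\ -0.36 & 0.72 \end{bmatrix}^{-1} \begin{bmatrix} -0.51 \\ 0.70 \end{bmatrix} = \begin{bmatrix} -0.30 \\ 0.82 \end{bmatrix}$$

Now, what kind of error will we have at the minimum solution?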
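The averages that make up R and h in this example can be checked numerically. The sketch below assumes v(k) = 1.2 sin(2πk/3) and m(k) = 1.2 sin(2πk/3 - 3π/4), averages over one period (k = 1, 2, 3), drops the terms involving s(k) because they are zero, and inverts the 2x2 matrix R by hand. The helper names are illustrative.

```python
import math


def v(k):
    """Noise source: 60 Hz sine sampled at 180 Hz (period of three samples)."""
    return 1.2 * math.sin(2.0 * math.pi * k / 3.0)


def m(k):
    """Filtered noise: same amplitude, phase shifted by -3*pi/4."""
    return 1.2 * math.sin(2.0 * math.pi * k / 3.0 - 3.0 * math.pi / 4.0)


def avg(f):
    """Average of f(k) over one period, k = 1, 2, 3."""
    return sum(f(k) for k in (1, 2, 3)) / 3.0


r11 = avg(lambda k: v(k) * v(k))          # E[v^2(k)]
r12 = avg(lambda k: v(k) * v(k - 1))      # E[v(k) v(k-1)]
r22 = avg(lambda k: v(k - 1) * v(k - 1))  # E[v^2(k-1)]
h1 = avg(lambda k: m(k) * v(k))           # E[m(k) v(k)]; the s(k) term is zero
h2 = avg(lambda k: m(k) * v(k - 1))       # E[m(k) v(k-1)]

# x* = R^{-1} h using the closed-form inverse of a 2x2 matrix
det = r11 * r22 - r12 * r12
x1 = (r22 * h1 - r12 * h2) / det
x2 = (-r12 * h1 + r11 * h2) / det
print(round(r11, 2), round(r12, 2), round(h1, 2), round(h2, 2))
print(round(x1, 2), round(x2, 2))
```

To two decimal places the results should agree with the values quoted in the example.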


Performance Index

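$$F(\mathbf{x}) = c - 2\mathbf{x}^T\mathbf{h} + \mathbf{x}^T\mathbf{R}\mathbf{x}$$

$$c = E[t^2(k)] = E[(s(k) + m(k))^2]$$

$$c = E[s^2(k)] + 2E[s(k)\,m(k)] + E[m^2(k)]$$

The middle term is zero because s(k) and m(k) are independent and zero mean (m(k) is derived from v(k), which is independent of s(k)).

$$E[s^2(k)] = \frac{1}{0.4} \int_{-0.2}^{0.2} s^2\, ds = \frac{1}{3(0.4)}\, s^3 \Big|_{-0.2}^{0.2} = 0.0133$$

$$E[m^2(k)] = \frac{1}{3} \sum_{k=1}^{3} \left\{ 1.2 \sin\left(\frac{2\pi k}{3} - \frac{3\pi}{4}\right) \right\}^2 = 0.72$$

$$c = 0.0133 + 0.72 = 0.7333$$

Substituting x*, h and R into the performance index gives the minimum mean square error:

$$F(\mathbf{x}^*) = 0.7333 - 2(0.72) + 0.72 = 0.0133$$

The minimum mean square error is the same as the mean square value of the EEG signal. This is what we expected, since the error of this adaptive noise canceller is in fact the reconstructed EEG signal.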


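[Figure: LMS trajectory on the contour plot of the performance index; axes w1,1 and w1,2.]

The LMS trajectory looks like a noisy version of steepest descent.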

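Note that the contours in this figure reflect the fact that the eigenvalues and the eigenvectors of the Hessian matrix A = 2R are

$$\lambda_1 = 2.16, \quad \mathbf{z}_1 = \begin{bmatrix} -0.7071 \\ 0.7071 \end{bmatrix}, \qquad \lambda_2 = 0.72, \quad \mathbf{z}_2 = \begin{bmatrix} -0.7071 \\ -0.7071 \end{bmatrix}$$

If the learning rate is decreased, the trajectory is smoother, but the learning proceeds more slowly.

Note that $\alpha_{max}$ is 2/2.16 = 0.926 for stability.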


The trajectory is noisy because the LMS algorithm is approximate steepest descent; it uses an estimate of the gradient, not the true gradient. (Demonstration: nnd10eeg)
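A small simulation makes the noisy trajectory and the steady-state error concrete. The sketch below assumes the signals of the EEG example (s uniform on [-0.2, 0.2], v(k) = 1.2 sin(2πk/3), m(k) = 1.2 sin(2πk/3 - 3π/4)), a two-weight filter without bias, and arbitrarily chosen values α = 0.02, 5000 steps and random seed 0. It is not a reproduction of the nnd10eeg demonstration.

```python
import math
import random


def simulate_noise_canceller(alpha=0.02, steps=5000, seed=0):
    """LMS adaptive noise canceller with inputs z(k) = [v(k), v(k-1)]."""
    rng = random.Random(seed)
    w = [0.0, 0.0]                       # [w_{1,1}, w_{1,2}], no bias
    v_prev = 1.2 * math.sin(0.0)         # v(0)
    errors = []
    for k in range(1, steps + 1):
        v_now = 1.2 * math.sin(2.0 * math.pi * k / 3.0)
        s = rng.uniform(-0.2, 0.2)       # EEG signal (white, uniform)
        m = 1.2 * math.sin(2.0 * math.pi * k / 3.0 - 3.0 * math.pi / 4.0)
        t = s + m                        # contaminated signal is the target
        a = w[0] * v_now + w[1] * v_prev # filter output estimates m(k)
        e = t - a                        # error is the restored EEG estimate
        w = [w[0] + 2.0 * alpha * e * v_now,
             w[1] + 2.0 * alpha * e * v_prev]
        errors.append(e)
        v_prev = v_now
    return w, errors


w, errors = simulate_noise_canceller()
tail = errors[-1000:]
mse_tail = sum(e * e for e in tail) / len(tail)
print([round(x, 2) for x in w], round(mse_tail, 4))
```

The final weights should hover around the minimum point of the example, and the mean squared error over the last samples should be close to the mean square value of the EEG signal, with some excess caused by the fluctuating weights.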

Echo Cancellation


HW

Ch 4: E 2, 4, 6, 7

Ch 5: 5, 7, 9

Ch 6: 4, 5, 8, 10

Ch 7: 1, 5, 6, 7

Ch 8: 2, 4, 5

Ch 9: 2, 5, 6

Ch 10: 3, 6, 7


