We have shown that the adaptation equation (SD p, R) can be written in an equivalent form as (see also the figure with the implementation of the SD algorithm)
w(n + 1) = w(n) + μ E[e(n)u(n)]     (Equation SD u, e)
In order to simplify the algorithm, instead of the true gradient of the criterion
∇_w J(n) = −2 E[u(n)e(n)]
the LMS algorithm will use an immediately available approximation
∇̂_w J(n) = −2 u(n)e(n)
Using the noisy gradient, the adaptation will be carried out by the equation
w(n + 1) = w(n) − (μ/2) ∇̂_w J(n) = w(n) + μ u(n)e(n)
In order to gain new information about the gradient estimate at each time instant, the procedure will go through the whole data set {(d(1), u(1)), (d(2), u(2)), . . .}, many times if needed.
LMS algorithm
Given
the (correlated) input signal samples {u(1), u(2), u(3), . . .}, generated randomly;
the desired signal samples {d(1), d(2), d(3), . . .}, correlated with {u(1), u(2), u(3), . . .}
1 Initialize the algorithm with an arbitrary parameter vector w(0), for example w(0) = 0.
2 Iterate for n = 0, 1, 2, 3, . . . , nmax
2.0 Read/generate a new data pair (u(n), d(n))
2.1 (Filter output) y(n) = w(n)^T u(n) = Σ_{i=0}^{M−1} w_i(n) u(n − i)
2.2 (Output error) e(n) = d(n) − y(n)
2.3 (Parameter adaptation) w(n + 1) = w(n) + μ u(n)e(n)
or, componentwise, w_i(n + 1) = w_i(n) + μ e(n) u(n − i) for i = 0, 1, . . . , M − 1
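As an illustration, the iteration in step 2 can be written as the following MATLAB sketch (the data vectors u and d are assumed to be available as columns of equal length; the filter length M and the step size mu below are only placeholder values):

% LMS iteration sketch (illustrative; M, mu and the data u, d are assumed given)
M=11;                          % filter length (placeholder)
mu=0.05;                       % step size (placeholder)
w=zeros(M,1);                  % step 1: w(0) = 0
for n=M:length(u)
    un=u(n:-1:n-M+1);          % input vector [u(n); u(n-1); ...; u(n-M+1)]
    y=w'*un;                   % step 2.1: filter output y(n)
    e=d(n)-y;                  % step 2.2: output error e(n)
    w=w+mu*un*e;               % step 2.3: parameter adaptation
end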
The SD iterations are deterministic: starting from a given w(0), all the iterates w(n) are perfectly determined.
LMS iterations are not deterministic: the values w(n) depend on the realization of the data d(1), . . . , d(n)
and u(1), . . . , u(n). Thus, w(n) is now a random variable.
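A small MATLAB experiment makes this visible; the model below (a length-2 optimal filter w_o, white Gaussian input and noise, 50 realizations, assumed step size) is only a toy setup, not the equalizer example used later in this lecture:

% w(n) depends on the data realization, but its ensemble mean approaches wo (sketch)
M=2; wo=[1; -0.5]; mu=0.02; nmax=2000; Nreal=50;   % assumed toy values
Wfinal=zeros(M,Nreal);
for r=1:Nreal
    u=randn(nmax,1);                          % white Gaussian input
    d=filter(wo,1,u)+0.1*randn(nmax,1);       % d(n) = wo'*[u(n); u(n-1)] + noise
    w=zeros(M,1);
    for n=M:nmax
        un=u(n:-1:n-M+1);
        e=d(n)-w'*un;
        w=w+mu*un*e;
    end
    Wfinal(:,r)=w;             % the final w differs from realization to realization
end
disp(mean(Wfinal,2))           % the ensemble average is close to wo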
Convergence in the mean: E[w(n)] → w_o as n → ∞
Convergence of the criterion J(w(n)) (convergence in the mean square of the error): J(w(n)) → J_∞ as n → ∞
The analysis of these convergence properties relies on the independence assumptions:
1. The input vectors u(1), u(2), . . . , u(n) are statistically independent vectors (a very strong requirement: even white noise sequences don't obey this property, since consecutive vectors u(n) and u(n − 1) share M − 1 samples);
Taking expectations in the LMS update equation, using the statistical independence of u(n) and w(n) (which implies the statistical independence of u(n) and ε(n)) and the principle of orthogonality, which states that E[u(n)e_o(n)] = 0, the last equation becomes
E[ε(n + 1)] = (I − μR) E[ε(n)]
This recursion has the same form as the one used in the analysis of SD algorithm stability; identifying c(n) with E[ε(n)], we have the following result: E[w(n)] → w_o as n → ∞, provided that 0 < μ < 2/λ_max.
The convergence property explains the behavior of the first-order characterization (the mean) of ε(n) = w(n) − w_o.
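For completeness, a brief sketch (in LaTeX notation) of how this recursion is obtained; it assumes the data model d(n) = w_o^T u(n) + e_o(n) of Assumption 2 below and the independence assumption above:

\varepsilon(n+1) = w(n+1) - w_o = \varepsilon(n) + \mu\, u(n) e(n), \qquad e(n) = e_o(n) - u^T(n)\,\varepsilon(n)
\Rightarrow\ \varepsilon(n+1) = \bigl(I - \mu\, u(n) u^T(n)\bigr)\,\varepsilon(n) + \mu\, u(n) e_o(n)
\Rightarrow\ E[\varepsilon(n+1)] = (I - \mu R)\, E[\varepsilon(n)],
\text{using } E[u(n)u^T(n)] = R,\ E[u(n)e_o(n)] = 0 \text{ and the independence of } u(n) \text{ and } \varepsilon(n).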
We denote ν(n) = Q^H ε(n) and Ñ(n) = Q^H N(n) the rotated vectors (remember, Q is the matrix formed by the eigenvectors of the matrix R, and Λ is the diagonal matrix of its eigenvalues) and we thus obtain
ν(n + 1) = (I − μΛ) ν(n) + μ Ñ(n)
or, written componentwise,
ν_j(n + 1) = (1 − μλ_j) ν_j(n) + μ Ñ_j(n)
Taking the squared modulus and then the expectation of both members:
E[|ν_j(n + 1)|²] = (1 − μλ_j)² E[|ν_j(n)|²] + 2μ(1 − μλ_j) E[Ñ_j(n)ν_j(n)] + μ² E[|Ñ_j(n)|²]
Making the assumption E[Ñ_j(n)ν_j(n)] = 0 and denoting
γ_j(n) = E[|ν_j(n)|²]
we obtain the recursion showing how γ_j(n) propagates through time:
γ_j(n + 1) = (1 − μλ_j)² γ_j(n) + μ² E[|Ñ_j(n)|²]
More information can be obtained if we assume that the algorithm is in the steady state, so that the true gradient ∇J(n) is close to 0. Then
e(n)u(n) = N(n)
i.e. the adaptation vector used in LMS is only noise. Then
E[N(n)N(n)^T] = E[e²(n) u(n)u(n)^T] ≈ E[e²(n)] E[u(n)u(n)^T] = J_o R = J_o Q Λ Q^H
and therefore
E[Ñ(n)Ñ(n)^H] = J_o Λ
or componentwise
E[|Ñ_j(n)|²] = J_o λ_j
and finally
γ_j(n + 1) = (1 − μλ_j)² γ_j(n) + μ² J_o λ_j
Using the assumption |1 − μλ_j| < 1 (which is also required for convergence in the mean), at the limit
lim_{n→∞} γ_j(n) = μ² J_o λ_j / (1 − (1 − μλ_j)²) = μ J_o / (2 − μλ_j)
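This limit can be checked by iterating the scalar recursion above; the values of μ, λ_j and J_o below are placeholders:

% iterate gamma_j(n+1) = (1 - mu*lambda)^2*gamma_j(n) + mu^2*Jo*lambda
% and compare with the closed-form limit mu*Jo/(2 - mu*lambda)
mu=0.05; lambda=1.2; Jo=0.01;           % placeholder values
gam=1;                                  % arbitrary initial value gamma_j(0)
for n=1:2000
    gam=(1-mu*lambda)^2*gam + mu^2*Jo*lambda;
end
[gam, mu*Jo/(2-mu*lambda)]              % both values are approximately 2.6e-4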
This relation gives an estimate of the variance of the elements of the vector Q^H(w(n) − w_o). Since this variance converges to a nonzero value, it follows that the parameter vector w(n) continues to fluctuate around the optimal vector w_o. In Lecture 2 we obtained the canonical form of the quadratic form which expresses the mean square error:
J(n) = J_o + Σ_{i=1}^{M} λ_i |ν_i(n)|²
and defining the criterion J_∞ as the value of the criterion J(w(n)) = J(n) when n → ∞ (replacing |ν_i(n)|² by its steady-state mean γ_i(∞)), we obtain
J_∞ = J_o + Σ_{i=1}^{M} λ_i · μ J_o / (2 − μλ_i)
For μλ_i ≪ 2,
J_∞ ≈ J_o + J_o Σ_{i=1}^{M} μλ_i / 2 = J_o (1 + μ Σ_{i=1}^{M} λ_i / 2) = J_o (1 + μ tr(R)/2)     (3)
J_∞ ≈ J_o (1 + μ M r(0)/2) = J_o (1 + (μM/2) · Power of the input)     (4)
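As a quick numerical illustration of (4), with assumed values J_o = 0.01, M = 11, r(0) = 1 and μ = 0.01 (these numbers are placeholders, not the lecture example):

% steady-state MSE from (4) for assumed values
Jo=0.01; M=11; r0=1; mu=0.01;
Jinf = Jo*(1 + mu*M*r0/2)      % = 0.01055, about 5.5 percent above Jo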
The steady-state mean square error J_∞ is close to the optimal mean square error J_o if μ is small enough.
In [Haykin 1991] there is a more detailed analysis, involving the transient analysis of J(n). It showed that convergence in the mean square sense can be obtained if
Σ_{i=1}^{M} μλ_i / (2(1 − μλ_i)) < 1
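The condition is easy to check numerically once the eigenvalues of R are available; a minimal sketch with assumed values:

% check of the mean-square convergence condition, assumed eigenvalues and step size
lambda = [0.2 0.5 1.0 2.1];                  % assumed eigenvalues of R
mu = 0.1;
s = sum(mu*lambda ./ (2*(1-mu*lambda)))      % about 0.23 here, i.e. < 1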
Assumption 1 The step-size parameter μ is small, so the LMS filter acts as a low-pass filter with a low cutoff frequency. This allows the exact weight-error recursion
ε(n + 1) = (I − μ u(n)u(n)^T) ε(n) + μ u(n)e_o(n)
to be approximated by
ε(n + 1) ≈ (I − μR) ε(n) + μ u(n)e_o(n)
Assumption 2 The physical mechanism generating the desired response d(n) has the same form as the adaptive filter: d(n) = w_o^T u(n) + e_o(n), where e_o(n) is a white noise, statistically independent of u(n).
Assumption 3 The input vector u(n) and the desired response d(n) are jointly Gaussian.
Learning curves
The statistical performance of adaptive filters is studied using learning curves, averaged over many realizations, i.e. ensemble-averaged.
The mean-square error (MSE) learning curve: take an ensemble average of the squared estimation error e²(n).
The mean-square deviation (MSD) learning curve: take an ensemble average of the squared weight-error deviation ||ε(n)||².
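A possible MATLAB sketch of how such ensemble-averaged curves are computed (it reuses the same assumed toy model as the earlier sketch: a length-2 optimal filter w_o, white Gaussian input, assumed step size; the MSD curve requires knowing w_o):

% Ensemble-averaged MSE and MSD learning curves (sketch, assumed toy model)
Nreal=200; nmax=500; M=2; mu=0.05; wo=[1; -0.5];
MSE=zeros(nmax,1); MSD=zeros(nmax,1);
for r=1:Nreal
    u=randn(nmax,1);
    d=filter(wo,1,u)+0.1*randn(nmax,1);
    w=zeros(M,1);
    for n=M:nmax
        un=u(n:-1:n-M+1);
        e=d(n)-w'*un;                   % estimation error with current w(n)
        MSE(n)=MSE(n)+e^2;              % accumulate e^2(n)
        MSD(n)=MSD(n)+norm(w-wo)^2;     % accumulate ||w(n)-wo||^2 = ||eps(n)||^2
        w=w+mu*un*e;                    % LMS update
    end
end
MSE=MSE/Nreal; MSD=MSD/Nreal;           % ensemble averages
semilogy(1:nmax,MSE,1:nmax,MSD)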
Condition on the step size:
0 < μ < 2/λ_max     (12)
The excess mean-square error converges to
J_ex(∞) = (μ J_min / 2) Σ_{k=1}^{M} λ_k     (13)
The misadjustment is
M = J_ex(∞) / J_min = (μ/2) Σ_{k=1}^{M} λ_k = (μ/2) tr(R) = (μ/2) M r(0)     (14)
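Relations (12), (13) and (14) can be evaluated directly from the eigenvalues of R; a small sketch with an assumed 3 x 3 correlation matrix and assumed J_min and μ:

% step-size bound, excess MSE and misadjustment for an assumed R
R = toeplitz([1 0.5 0.2]);              % assumed correlation matrix
Jmin = 0.01; mu = 0.05;                 % assumed values
lambda = eig(R);
mu_max = 2/max(lambda)                  % upper bound on mu, relation (12)
Jex = mu*Jmin/2*sum(lambda)             % excess MSE, relation (13)
Misadj = Jex/Jmin                       % misadjustment, relation (14) = mu/2*trace(R)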
The channel noise v(n) is white Gaussian with variance σ_v² = 0.001.
Selecting the filter structure
The filter has M = 11 delay units (taps).
The weights (parameters) of the filter are symmetric with respect to the middle tap (n = 5).
The channel input is delayed 7 units to provide the desired response to the equalizer.
Correlation matrix of the equalizer input
Since u(n) = Σ_{k=1}^{3} h(k)a(n − k) + v(n) is an MA process, the correlation function will be
r(0) = h(1)² + h(2)² + h(3)² + σ_v²
r(1) = h(1)h(2) + h(2)h(3)
r(2) = h(1)h(3)
r(3) = r(4) = . . . = 0
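From these values one can form the M x M correlation matrix R of the equalizer input and examine its eigenvalue spread; a sketch using the parameters of the experiment (W = 2.9, σ_v² = 0.001, M = 11):

% correlation matrix of the equalizer input and its eigenvalue spread
W=2.9; sv2=0.001; M=11;
h=0.5*(1+cos(2*pi/W*(-1:1)));           % the three nonzero channel taps h(1), h(2), h(3)
r0=sum(h.^2)+sv2;
r1=h(1)*h(2)+h(2)*h(3);
r2=h(1)*h(3);
R=toeplitz([r0 r1 r2 zeros(1,M-3)]);    % M x M Toeplitz correlation matrix
lambda=eig(R);
chi=max(lambda)/min(lambda)             % eigenvalue spread of R
mu_max=2/max(lambda)                    % largest step size allowed by (12)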
[Figure: Sample signals in adaptive equalizer experiment]
[Figure: learning curve (log scale) vs. time step n, n = 0 . . . 500]
[Figure: learning curve (log scale) vs. time step n, n = 0 . . . 3000]
% Adaptive equalization
% CHANNEL MODEL
W=2.9;
h=[ 0, 0.5*(1+cos(2*pi/W*(-1:1))) ];          % channel impulse response
a=sign(randn(500,1));                          % transmitted +/-1 symbols (assumed; not shown in the original listing)
ah=conv(h,a);                                  % signal distorted by the channel
v=sqrt(0.001)*randn(500,1);                    % Gaussian noise with variance 0.001
u=ah(1:500)+v;                                 % received signal = equalizer input
subplot(221), stem(impz(h,1,10)), title('Channel impulse response h(n)')
subplot(222), stem(ah(1:59)), title('Convolved (distorted) signal h*a')
subplot(223), stem(a(1:59)), title('Original signal to be transmitted')
subplot(224), stem(u(1:59)), title('Received signal (noise + distortion)')
% ... (LMS adaptation loop over the N realizations not reproduced here;
%      inside it the squared error is accumulated:)
average_J(i)=average_J(i)+J(i);
end
end
average_J=average_J/N;                         % ensemble-averaged learning curve
semilogy(average_J), title('Learning curve Ee^2(n) for LMS algorithm'),
xlabel('time step n')
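The LMS adaptation loop itself is elided in the listing above. A minimal sketch of what the missing part could look like, assuming a step size mu = 0.05, N = 100 realizations and the decision delay of 7 samples mentioned earlier (all variable names and the exact parameter values are assumptions, not taken from the original code):

% sketch of the elided LMS loop (assumed variable names and step size)
N = 100; nmax = 500; M = 11; mu = 0.05; delay = 7;
average_J = zeros(nmax,1);
for k = 1:N
    a  = sign(randn(nmax,1));                % new realization of the transmitted symbols
    v  = sqrt(0.001)*randn(nmax,1);
    ah = conv(h,a); u = ah(1:nmax)+v;        % received signal (h as defined above)
    w  = zeros(M,1); J = zeros(nmax,1);
    for i = M:nmax
        ui = u(i:-1:i-M+1);                  % equalizer input vector
        e  = a(i-delay) - w'*ui;             % error w.r.t. the delayed transmitted symbol
        w  = w + mu*ui*e;
        J(i) = e^2;
        average_J(i) = average_J(i) + J(i);
    end
end
average_J = average_J/N;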
Summary
The condition on the step size for convergence of the LMS algorithm is
0 < μ < 2/λ_max
For LMS filters with filter length M moderate to large, the convergence condition on the step size is
0 < μ < 2 / (M S_max)
where S_max is the maximum value of the power spectral density of the tap inputs.
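For the equalizer example above, S_max can be estimated from the correlation values r(0), r(1), r(2); a minimal sketch (the frequency grid is an implementation choice):

% estimate Smax from the PSD of the equalizer input and evaluate the bound 2/(M*Smax)
W=2.9; sv2=0.001; M=11;
h=0.5*(1+cos(2*pi/W*(-1:1)));
r0=sum(h.^2)+sv2; r1=h(1)*h(2)+h(2)*h(3); r2=h(1)*h(3);
omega=linspace(0,pi,1000);
S=r0+2*r1*cos(omega)+2*r2*cos(2*omega); % S(w) = sum_k r(k) exp(-j*w*k)
Smax=max(S);
mu_bound=2/(M*Smax)                     % step-size bound from the summary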