Adaptive Signal Processing is concerned with the design, analysis, and
implementation of systems whose structure changes in response to the
incoming data.
Application areas are similar to those of optimal signal processing, but now the environment is changing, the signals are nonstationary, and/or the parameters to be estimated are time-varying. Examples include signal detection in a time-varying wireless channel, linear predictive coding of a speech signal, and position tracking of a moving source.
Adaptive Filter Development
Adaptive Filter Definition
An adaptive filter is a time-variant filter whose coefficients are adjusted in a way that optimizes a cost function or satisfies some predetermined optimization criterion.
Characteristics of adaptive filters:
They can automatically adapt (self-optimize) in the face of changing environments and changing system requirements
They can be trained to perform specific filtering and decision-making tasks according to some updating equations (training rules)
Why adaptive?
An adaptive filter can automatically operate in
changing environments (e.g. signal detection in a wireless channel)
nonstationary signal/noise conditions (e.g. LPC of a speech signal)
time-varying parameter estimation (e.g. position tracking of a moving source)
The block diagram of a typical adaptive filter is shown below:
[Figure: adaptive filter {h(k)} with input x(k) and output y(k); the output is subtracted from the desired response d(k) to form the error e(k), which drives the adaptive algorithm that adjusts the filter coefficients]
x(k) : input signal, y(k) : filtered output, d(k) : desired response, e(k) : error signal
h(k) : impulse response of the adaptive filter
The cost function may be $E\{e^2(k)\}$ or $\sum_{k=0}^{N-1} e^2(k)$.
Basic Classes of Adaptive Filtering Applications
1. Prediction : e.g. linear prediction of speech signals (see Example 4.2)
2. Identification : adaptive control, layered earth modeling, vibration studies of mechanical systems
3. Inverse Filtering : adaptive equalization for communication channels, deconvolution
4. Interference Canceling : adaptive noise canceling, echo cancellation
Design Considerations
1. Cost Function
the choice of cost function depends on the approach used and the application of interest
some commonly used cost functions are
exponentially weighted least squares criterion : minimizes $\sum_{k=0}^{N-1} \lambda^{N-1-k} e^2(k)$,
where $N$ is the total number of samples and $\lambda$ denotes the exponential weighting factor, whose value is positive and close to 1.
2. Algorithm
depends on the cost function used
Factors to be considered in choosing an algorithm include:
rate of convergence
misadjustment (this is a performance measure for algorithms that use the minimum MSE criterion)
tracking capability : This refers to the ability of the algorithm to track
statistical variations in a nonstationary environment.
computational requirement : number of operations, memory size,
investment required to program the algorithm on a computer.
robustness : This refers to the ability of the algorithm to operate satisfactorily with ill-conditioned data, e.g. a very noisy environment, changes in signal and/or noise models
3. Structure
structure and algorithm are interrelated; the choice of structure is based on quantization errors, ease of implementation, computational complexity, etc.
four commonly used structures are the direct form, cascade form, parallel form, and lattice structure. Advantages of lattice structures include a simple test for filter stability, modular structure, and low sensitivity to quantization effects.
e.g., a second-order transfer function can be realized in direct, cascade, or parallel form:
$$H(z) = \frac{B_2 z^2}{z^2 + A_1 z + A_0} = \frac{C_1 z}{z - p_1}\cdot\frac{C_2 z}{z - p_2} = \frac{D_1 z + E_1}{z - p_1} + \frac{D_2 z + E_2}{z - p_2}$$
Commonly Used Methods for Minimizing MSE
For simplicity, it is assumed that the adaptive filter is of causal FIR type
and is implemented in direct form. Therefore, its system block diagram is
[Figure: direct-form FIR adaptive filter — x(n) feeds a tapped delay line of $z^{-1}$ elements, the weighted taps are summed to form y(n), and the error e(n) = d(n) - y(n) is fed back to adjust the weights]
The error signal at time n is given by
$$e(n) = d(n) - y(n) \qquad (4.2)$$
where
$$y(n) = \sum_{i=0}^{L-1} w_i(n)\, x(n-i) = W(n)^T X(n),$$
$$W(n) = [w_0(n)\;\; w_1(n)\;\; \cdots\;\; w_{L-2}(n)\;\; w_{L-1}(n)]^T$$
$$X(n) = [x(n)\;\; x(n-1)\;\; \cdots\;\; x(n-L+2)\;\; x(n-L+1)]^T$$
Recall that minimizing $E\{e^2(n)\}$ gives the Wiener solution in optimal filtering; it is therefore desired that
$$\lim_{n\to\infty} W(n) = W_{MMSE} = (R_{xx})^{-1} R_{dx} \qquad (4.3)$$
Two common gradient searching approaches for obtaining the Wiener
filter are
1. Newton Method
$$\Delta W(n) = -\mu\, R_{xx}^{-1}\, \frac{\partial E\{e^2(n)\}}{\partial W(n)} \qquad (4.5)$$
where $\mu$ is called the step size. It is a positive number that controls the convergence rate and stability of the algorithm. The adaptive algorithm becomes
$$\begin{aligned}
W(n+1) &= W(n) - \mu R_{xx}^{-1} \frac{\partial E\{e^2(n)\}}{\partial W(n)}\\
&= W(n) - \mu R_{xx}^{-1}\cdot 2\left(R_{xx} W(n) - R_{dx}\right) \qquad (4.6)\\
&= (1 - 2\mu)\, W(n) + 2\mu\, R_{xx}^{-1} R_{dx}\\
&= (1 - 2\mu)\, W(n) + 2\mu\, W_{MMSE}
\end{aligned}$$
Solving the equation, we have
$$W(n) = W_{MMSE} + (1 - 2\mu)^n\left(W(0) - W_{MMSE}\right)$$
With $\mu = 0.5$, the weights jump from any initial $W(0)$ to the optimum setting $W_{MMSE}$ in a single step.
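As a quick check of (4.6), the following minimal MATLAB sketch (using an assumed toy correlation matrix and cross-correlation vector for illustration) shows the Newton recursion reaching $W_{MMSE}$ in one iteration when mu = 0.5:

% Newton-method weight update (4.6): W(n+1) = W(n) - mu*inv(Rxx)*gradient
% Rxx and Rdx below are assumed toy values, not taken from the notes.
Rxx = [2 0.5; 0.5 1];        % assumed input autocorrelation matrix
Rdx = [1; 0.3];              % assumed cross-correlation vector
Wmmse = Rxx \ Rdx;           % Wiener solution (4.3)
mu = 0.5;                    % step size
W = [0; 0];                  % arbitrary initial weights
for n = 1:3
    grad = 2*(Rxx*W - Rdx);  % gradient of E{e^2(n)} w.r.t. W
    W = W - mu*(Rxx\grad);   % Newton update (4.6)
    disp(norm(W - Wmmse));   % distance to Wiener solution (zero after one step)
end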
An example of the Newton method with $\mu = 0.5$ and two weights is illustrated below.
[Figure: weight trajectory of the Newton method on a two-weight MSE error surface]
2. Steepest Descent Method
$$\Delta W(n) = -\mu\, \frac{\partial E\{e^2(n)\}}{\partial W(n)} \qquad (4.11)$$
Thus
$$\begin{aligned}
W(n+1) &= W(n) - \mu \frac{\partial E\{e^2(n)\}}{\partial W(n)}\\
&= W(n) - 2\mu\left(R_{xx} W(n) - R_{dx}\right) \qquad (4.12)\\
&= (I - 2\mu R_{xx})\, W(n) + 2\mu\, R_{xx} W_{MMSE}\\
&= (I - 2\mu R_{xx})\left(W(n) - W_{MMSE}\right) + W_{MMSE}
\end{aligned}$$
Using the fact that $R_{xx}$ is symmetric and real, it can be shown that
$$R_{xx} = Q\, \Lambda\, Q^{-1} = Q\, \Lambda\, Q^T \qquad (4.15)$$
$$\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \lambda_L \end{bmatrix} \qquad (4.16)$$
It can be proved that the eigenvalues of $R_{xx}$ are all real and greater than or equal to zero. Using these results and letting
$$V(n) = W(n) - W_{MMSE}, \qquad V(n) = Q\, U(n) \qquad (4.17)$$
we have
$$\begin{aligned}
Q\, U(n+1) &= (I - 2\mu R_{xx})\, Q\, U(n)\\
U(n+1) &= Q^{-1}(I - 2\mu R_{xx})\, Q\, U(n) \qquad (4.18)\\
&= \left(Q^{-1} Q - 2\mu\, Q^{-1} R_{xx} Q\right) U(n)\\
&= (I - 2\mu \Lambda)\, U(n)
\end{aligned}$$
The solution is
$$U(n) = (I - 2\mu \Lambda)^n\, U(0) \qquad (4.19)$$
where $U(0)$ is the initial value of $U(n)$. Thus the steepest descent algorithm is stable and convergent if
$$\lim_{n\to\infty} (I - 2\mu \Lambda)^n = 0$$
or
$$\lim_{n\to\infty} \begin{bmatrix} (1 - 2\mu\lambda_1)^n & 0 & \cdots & 0\\ 0 & (1 - 2\mu\lambda_2)^n & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & (1 - 2\mu\lambda_L)^n \end{bmatrix} = 0 \qquad (4.20)$$
which implies
$$|1 - 2\mu\lambda_{\max}| < 1 \;\Rightarrow\; 0 < \mu < \frac{1}{\lambda_{\max}} \qquad (4.21)$$
where $\lambda_{\max}$ is the largest eigenvalue of $R_{xx}$.
If this condition is satisfied, it follows that
$$\lim_{n\to\infty} U(n) = 0 \;\Rightarrow\; \lim_{n\to\infty} Q^{-1} V(n) = \lim_{n\to\infty} Q^{-1}\left(W(n) - W_{MMSE}\right) = 0 \qquad (4.22)$$
$$\Rightarrow\; \lim_{n\to\infty} W(n) = W_{MMSE}$$
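A minimal MATLAB sketch of the steepest descent recursion (4.12), reusing the same assumed toy $R_{xx}$ and $R_{dx}$ as in the Newton example above, illustrates the geometric convergence governed by (4.19)-(4.21):

% Steepest descent on the MSE surface: W(n+1) = W(n) - 2*mu*(Rxx*W(n) - Rdx)
% Rxx and Rdx are assumed toy values for illustration only.
Rxx = [2 0.5; 0.5 1];
Rdx = [1; 0.3];
Wmmse = Rxx \ Rdx;              % Wiener solution
mu = 0.3;                       % must satisfy 0 < mu < 1/max(eig(Rxx)), see (4.21)
W = [0; 0];
for n = 1:100
    W = W - 2*mu*(Rxx*W - Rdx); % steepest descent update (4.12)
end
disp(norm(W - Wmmse));          % small residual after 100 iterations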
An illustration of the steepest descent method with two weights and $\mu = 0.3$ is given below.
[Figure: weight trajectory of the steepest descent method on a two-weight MSE error surface]
Remarks:
Widrow's Least Mean Square (LMS) Algorithm
A. Optimization Criterion
The instantaneous squared error $e^2(n)$ is minimized as an estimate of the mean square error $E\{e^2(n)\}$.
B. Adaptation Procedure
The steepest descent recursion is applied with the instantaneous gradient $\partial e^2(n)/\partial W(n)$ replacing the true gradient $\partial E\{e^2(n)\}/\partial W(n)$.
The LMS algorithm is therefore:
$$\begin{aligned}
W(n+1) &= W(n) - \mu\, \frac{\partial e^2(n)}{\partial W(n)}\\
&= W(n) - \mu\, \frac{\partial e^2(n)}{\partial e(n)}\, \frac{\partial e(n)}{\partial W(n)}\\
&= W(n) - 2\mu\, e(n)\, \frac{\partial}{\partial W(n)}\left[d(n) - W^T(n) X(n)\right], \qquad \frac{\partial (A^T B)}{\partial A} = B\\
&= W(n) + 2\mu\, e(n)\, X(n)
\end{aligned}$$
or, in scalar form,
$$w_i(n+1) = w_i(n) + 2\mu\, e(n)\, x(n-i), \qquad i = 0, 1, \ldots, L-1$$
C. Advantages
D. Performance Surface
E. Performance Analysis
Two important performance measures in LMS algorithms are the rate of convergence and the misadjustment (which relates to the steady-state filter weight variance).
1. Convergence Analysis
For ease of analysis, it is assumed that W (n) is independent of X (n) .
Taking expectation on both sides of the LMS algorithm, we have
$$E\{W(n+1)\} = E\{W(n)\} + 2\mu\left(R_{dx} - R_{xx} E\{W(n)\}\right) = (I - 2\mu R_{xx})\, E\{W(n)\} + 2\mu\, R_{xx} W_{MMSE}$$
Following the previous derivation, $W(n)$ will converge to the Wiener filter weights in the mean sense if
$$\lim_{n\to\infty} \begin{bmatrix} (1 - 2\mu\lambda_1)^n & 0 & \cdots & 0\\ 0 & (1 - 2\mu\lambda_2)^n & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & (1 - 2\mu\lambda_L)^n \end{bmatrix} = 0$$
$$\Rightarrow\; |1 - 2\mu\lambda_i| < 1,\;\; i = 1, 2, \ldots, L \;\Rightarrow\; 0 < \mu < \frac{1}{\lambda_{\max}} \qquad (4.26)$$
Define the geometric ratio of the $p$th term as
$$r_p = 1 - 2\mu\lambda_p, \qquad p = 1, 2, \ldots, L \qquad (4.27)$$
It is observed that each term on the main diagonal forms a geometric series $\{1,\, r_p,\, r_p^2,\, \ldots,\, r_p^{n-1},\, r_p^{n},\, r_p^{n+1},\, \ldots\}$.
An exponential function can be fitted to approximate each geometric series:
$$r_p \approx \exp\left(-\frac{1}{\tau_p}\right), \qquad r_p^n \approx \exp\left(-\frac{n}{\tau_p}\right) \qquad (4.28)$$
where $\tau_p$ is called the $p$th time constant.
For slow adaptation, i.e., $2\mu\lambda_p \ll 1$, $\tau_p$ is approximated as
$$\frac{1}{\tau_p} = -\ln(1 - 2\mu\lambda_p) = 2\mu\lambda_p + \frac{(2\mu\lambda_p)^2}{2} + \frac{(2\mu\lambda_p)^3}{3} + \cdots \approx 2\mu\lambda_p$$
$$\Rightarrow\; \tau_p \approx \frac{1}{2\mu\lambda_p} \qquad (4.29)$$
Notice that the smaller the time constant, the faster the convergence rate. Moreover, the overall convergence is limited by the slowest mode of convergence, which in turn stems from the smallest eigenvalue of $R_{xx}$, $\lambda_{\min}$.
That is,
$$\tau_{\max} \approx \frac{1}{2\mu\lambda_{\min}} \qquad (4.30)$$
The convergence rate therefore depends on
the step size $\mu$ : the larger the $\mu$, the faster the convergence rate
the eigenvalue spread of $R_{xx}$, $\chi(R_{xx})$ : the smaller $\chi(R_{xx})$, the faster the convergence rate. $\chi(R_{xx})$ is defined as
$$\chi(R_{xx}) = \frac{\lambda_{\max}}{\lambda_{\min}} \qquad (4.31)$$
Example 4.1
An illustration of the eigenvalue spread for the LMS algorithm is shown as follows.
[Figure: system identification setup — the input x(n) drives both an unknown system $\sum_{i=0}^{1} h_i z^{-i}$, whose output is corrupted by additive noise q(n) to give d(n), and a two-weight adaptive filter $\sum_{i=0}^{1} w_i z^{-i}$ with output y(n); the error is e(n) = d(n) - y(n)]
% file name is es.m
clear all
N = 1000;            % number of samples is 1000
np = 0.01;           % noise power is 0.01
sp = 1;              % signal power is 1, which implies SNR = 20 dB
h = [1 2];           % unknown impulse response
x = sqrt(sp).*randn(1,N);
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);
w0(1) = 0;           % initialization (restored here, assuming the same setup as es1.m)
w1(1) = 0;
mu = 0.005;
y(1) = w0(1)*x(1);
e(1) = d(1) - y(1);
w0(2) = w0(1) + 2*mu*e(1)*x(1);
w1(2) = w1(1);
for n = 2:N          % the LMS algorithm
    y(n) = w0(n)*x(n) + w1(n)*x(n-1);
    e(n) = d(n) - y(n);
    w0(n+1) = w0(n) + 2*mu*e(n)*x(n);
    w1(n+1) = w1(n) + 2*mu*e(n)*x(n-1);
end
n = 1:N+1;
subplot(2,1,1)
plot(n,w0)           % plot filter weight estimate versus time
axis([1 1000 0 1.2])
subplot(2,1,2)
plot(n,w1)
axis([1 1000 0 2.2])
figure(2)
subplot(1,1,1)
n = 1:N;
semilogy(n,e.*e);    % plot squared error versus time
[Figures: filter weight estimates w0(n), w1(n) and squared error versus time for es.m]
Note that both filter weights converge at similar speeds because the eigenvalues of $R_{xx}$ are identical:
Recall
$$R_{xx} = \begin{bmatrix} R_{xx}(0) & R_{xx}(1)\\ R_{xx}(1) & R_{xx}(0) \end{bmatrix}$$
$$R_{xx}(0) = E\{x(n)\,x(n)\} = 1, \qquad R_{xx}(1) = E\{x(n)\,x(n-1)\} = 0$$
As a result,
$$R_{xx} = \begin{bmatrix} R_{xx}(0) & R_{xx}(1)\\ R_{xx}(1) & R_{xx}(0) \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \;\Rightarrow\; \chi(R_{xx}) = 1$$
% file name is es1.m
clear all
N = 1000;
np = 0.01;
sp = 1;
h = [1 2];
u = sqrt(sp/2).*randn(1,N+1);
x = u(1:N) + u(2:N+1);   % x(n) is now an MA process with power 1
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);
w0(1) = 0;
w1(1) = 0;
mu = 0.005;
y(1) = w0(1)*x(1);
e(1) = d(1) - y(1);
w0(2) = w0(1) + 2*mu*e(1)*x(1);
w1(2) = w1(1);
for n = 2:N
    y(n) = w0(n)*x(n) + w1(n)*x(n-1);
    e(n) = d(n) - y(n);
    w0(n+1) = w0(n) + 2*mu*e(n)*x(n);
    w1(n+1) = w1(n) + 2*mu*e(n)*x(n-1);
end
n = 1:N+1;
subplot(2,1,1)
plot(n,w0)
axis([1 1000 0 1.2])
subplot(2,1,2)
plot(n,w1)
axis([1 1000 0 2.2])
figure(2)
subplot(1,1,1)
n = 1:N;
semilogy(n,e.*e);
[Figures: filter weight estimates and squared error versus time for es1.m]
Note that the convergence speed of $w_0(n)$ is slower than that of $w_1(n)$.
Investigating $R_{xx}$:
$$R_{xx}(0) = E\{x(n)\,x(n)\} = E\{(u(n) + u(n-1))(u(n) + u(n-1))\} = E\{u^2(n) + u^2(n-1)\} = 0.5 + 0.5 = 1$$
$$R_{xx}(1) = E\{x(n)\,x(n-1)\} = E\{(u(n) + u(n-1))(u(n-1) + u(n-2))\} = E\{u^2(n-1)\} = 0.5$$
As a result,
$$R_{xx} = \begin{bmatrix} R_{xx}(0) & R_{xx}(1)\\ R_{xx}(1) & R_{xx}(0) \end{bmatrix} = \begin{bmatrix} 1 & 0.5\\ 0.5 & 1 \end{bmatrix}$$
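For reference (a worked example added here, using $\mu = 0.005$ as in es1.m), the eigenvalues of this $R_{xx}$, the eigenvalue spread (4.31), and the LMS time constants (4.29) are
$$\lambda_{1,2} = 1 \pm 0.5 = \{1.5,\ 0.5\}, \qquad \chi(R_{xx}) = \frac{1.5}{0.5} = 3$$
$$\tau_1 \approx \frac{1}{2(0.005)(1.5)} \approx 67 \text{ iterations}, \qquad \tau_2 \approx \frac{1}{2(0.005)(0.5)} = 200 \text{ iterations}$$
so the slow mode needs roughly 200 iterations per time constant, consistent with the slower weight convergence observed above.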
% file name is es2.m
clear all
N = 1000;
np = 0.01;
sp = 1;
h = [1 2];
u = sqrt(sp/5).*randn(1,N+4);
x = u(1:N) + u(2:N+1) + u(3:N+2) + u(4:N+3) + u(5:N+4);   % x(n) is now a moving average of 5 white noise samples, with power 1
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);
w0(1) = 0;
w1(1) = 0;
mu = 0.005;
y(1) = w0(1)*x(1);
e(1) = d(1) - y(1);
w0(2) = w0(1) + 2*mu*e(1)*x(1);
w1(2) = w1(1);
for n = 2:N
    y(n) = w0(n)*x(n) + w1(n)*x(n-1);
    e(n) = d(n) - y(n);
    w0(n+1) = w0(n) + 2*mu*e(n)*x(n);
    w1(n+1) = w1(n) + 2*mu*e(n)*x(n-1);
end
n = 1:N+1;
subplot(2,1,1)
plot(n,w0)
axis([1 1000 0 1.5])
subplot(2,1,2)
plot(n,w1)
axis([1 1000 0 2.5])
figure(2)
subplot(1,1,1)
n = 1:N;
semilogy(n,e.*e);
[Figures: filter weight estimates and squared error versus time for es2.m]
We see that the convergence speeds of both weights are very slow, although that of $w_1(n)$ is faster.
Investigating $R_{xx}$: now $x(n) = \sum_{i=0}^{4} u(n-i)$ with $E\{u^2(n)\} = 0.2$, so
$$R_{xx}(0) = 5 \times 0.2 = 1, \qquad R_{xx}(1) = 4 \times 0.2 = 0.8$$
$$R_{xx} = \begin{bmatrix} 1 & 0.8\\ 0.8 & 1 \end{bmatrix} \;\Rightarrow\; \lambda_{1,2} = \{1.8,\ 0.2\}, \quad \chi(R_{xx}) = 9$$
The large eigenvalue spread explains the slow convergence.
2. Misadjustment
Upon convergence, if $\lim_{n\to\infty} W(n) = W_{MMSE}$, then the minimum MSE will be equal to
$$\varepsilon_{\min} = E\{d^2(n)\} - R_{dx}^T W_{MMSE} \qquad (4.32)$$
However, this will not occur in practice due to random noise in the weight vector $W(n)$. Notice that we have $\lim_{n\to\infty} E\{W(n)\} = W_{MMSE}$ but not $\lim_{n\to\infty} W(n) = W_{MMSE}$. The MSE of the LMS algorithm is computed as
$$\begin{aligned}
E\{e^2(n)\} &= E\left\{\left(d(n) - W^T(n) X(n)\right)^2\right\}\\
&= \varepsilon_{\min} + E\left\{(W(n) - W_{MMSE})^T X(n) X(n)^T (W(n) - W_{MMSE})\right\} \qquad (4.33)\\
&= \varepsilon_{\min} + E\left\{(W(n) - W_{MMSE})^T R_{xx} (W(n) - W_{MMSE})\right\}
\end{aligned}$$
The second term on the right-hand side as $n \to \infty$ is known as the excess MSE and it is given by
$$\begin{aligned}
\text{excess MSE} &= \lim_{n\to\infty} E\left\{(W(n) - W_{MMSE})^T R_{xx} (W(n) - W_{MMSE})\right\}\\
&= \lim_{n\to\infty} E\left\{V(n)^T R_{xx} V(n)\right\}\\
&= \lim_{n\to\infty} E\left\{U(n)^T \Lambda\, U(n)\right\} \qquad (4.34)\\
&= \mu\, \varepsilon_{\min} \sum_{i=0}^{L-1} \lambda_i = \mu\, \varepsilon_{\min}\, \mathrm{tr}[R_{xx}]
\end{aligned}$$
As a result, the misadjustment $M$ is given by
$$M = \frac{\text{excess MSE}}{\varepsilon_{\min}} = \mu\, \mathrm{tr}[R_{xx}]$$
2. The bound for $\mu$ is
$$0 < \mu < \frac{1}{\lambda_{\max}} \qquad (4.37)$$
In practice, the signal power of x(n) can generally be estimated more easily than the eigenvalues of $R_{xx}$. We also note that
$$\lambda_{\max} \le \sum_{i=1}^{L} \lambda_i = \mathrm{tr}[R_{xx}] = L\, E\{x^2(n)\} \qquad (4.38)$$
so a more conservative but simpler bound is $0 < \mu < \dfrac{1}{L\, E\{x^2(n)\}}$.
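As a quick numerical illustration (an added worked example using the es1.m settings L = 2, $E\{x^2(n)\} = 1$, $\mu = 0.005$ as assumed values), the misadjustment and the conservative step-size bound become
$$M = \mu\, \mathrm{tr}[R_{xx}] = \mu L\, E\{x^2(n)\} = 0.005 \times 2 \times 1 = 0.01 = 1\%, \qquad 0 < \mu < \frac{1}{L\, E\{x^2(n)\}} = 0.5$$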
LMS Variants
1. Normalized LMS (NLMS) algorithm
the product vector $e(n) X(n)$ is scaled with respect to the squared Euclidean norm of the tap-input vector $X(n)$:
$$W(n+1) = W(n) + \frac{2\mu}{c + X(n)^T X(n)}\, e(n)\, X(n) \qquad (4.40)$$
where $c$ is a small positive constant to avoid division by zero.
It can also be considered as an LMS algorithm with a time-varying step size:
$$\mu(n) = \frac{\mu}{c + X(n)^T X(n)} \qquad (4.41)$$
Substituting $c = 0$, it can be shown that the NLMS algorithm converges if $0 < \mu < 0.5$, so selection of the step size in the NLMS is much easier than in the LMS algorithm.
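A minimal MATLAB sketch of the NLMS update (4.40), reusing the two-weight system identification setup of es.m (the step size and regularization constant below are assumed values for illustration):

% NLMS applied to the 2-weight system identification example (cf. es.m)
N = 1000; np = 0.01;
h = [1 2];                         % unknown impulse response
x = randn(1,N);
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);
w = [0; 0];                        % adaptive weights [w0; w1]
mu = 0.2; c = 1e-6;                % assumed NLMS step size and regularization constant
e = zeros(1,N);
for n = 2:N
    X = [x(n); x(n-1)];            % tap-input vector
    y = w.'*X;
    e(n) = d(n) - y;
    w = w + (2*mu/(c + X.'*X))*e(n)*X;   % NLMS update (4.40)
end
disp(w.')                          % should be close to h = [1 2]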
2. Sign algorithms
3. Leaky LMS algorithm
the LMS update is modified by the presence of a constant leakage factor (a commonly used form is sketched below)
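Commonly used forms of these two variants (stated here as assumptions following the $2\mu$ convention used earlier, with $\gamma$ denoting the leakage factor; the exact forms intended in these notes may differ) are
$$W(n+1) = W(n) + 2\mu\, \mathrm{sgn}(e(n))\, X(n) \qquad \text{(sign-error LMS)}$$
$$W(n+1) = (1 - 2\mu\gamma)\, W(n) + 2\mu\, e(n)\, X(n) \qquad \text{(leaky LMS)}$$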
Application Examples
Example 4.2
1. Linear Prediction
Suppose a signal x(n) is a second-order autoregressive (AR) process that satisfies the following difference equation:
$$x(n) = 1.558\, x(n-1) - 0.81\, x(n-2) + v(n)$$
where v(n) is a white noise process such that
$$R_{vv}(m) = E\{v(n)\, v(n+m)\} = \begin{cases} \sigma_v^2, & m = 0\\ 0, & \text{otherwise} \end{cases}$$
The adaptive filter is a two-tap linear predictor:
$$\hat{x}(n) = \sum_{i=1}^{2} w_i(n)\, x(n-i) = w_1(n)\, x(n-1) + w_2(n)\, x(n-2)$$
Upon convergence, we desire
$$E\{w_1(n)\} \to 1.558 \quad \text{and} \quad E\{w_2(n)\} \to -0.81$$
[Figure: two-tap LMS linear predictor — x(n) passes through two $z^{-1}$ elements, the delayed samples are weighted by $w_1(n)$ and $w_2(n)$ and summed to form the prediction, which is subtracted from d(n) = x(n) to give e(n)]
The error function or prediction error e(n) is given by
$$e(n) = d(n) - \sum_{i=1}^{2} w_i(n)\, x(n-i) = x(n) - w_1(n)\, x(n-1) - w_2(n)\, x(n-2)$$
Thus the LMS algorithm for this problem is
$$w_1(n+1) = w_1(n) - \frac{\mu}{2}\frac{\partial e^2(n)}{\partial w_1(n)} = w_1(n) - \frac{\mu}{2}\frac{\partial e^2(n)}{\partial e(n)}\frac{\partial e(n)}{\partial w_1(n)} = w_1(n) + \mu\, e(n)\, x(n-1)$$
and
$$w_2(n+1) = w_2(n) - \frac{\mu}{2}\frac{\partial e^2(n)}{\partial w_2(n)} = w_2(n) + \mu\, e(n)\, x(n-2)$$
The computational requirement for each sampling interval is
multiplications : 5
additions/subtractions : 4
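A minimal MATLAB sketch of this two-tap LMS predictor (the AR process is generated with an assumed unit-variance driving noise; the step size is one of the values investigated below):

% LMS linear prediction of the AR(2) process x(n) = 1.558*x(n-1) - 0.81*x(n-2) + v(n)
N = 2000;
v = randn(1,N);                        % assumed unit-variance white driving noise
x = zeros(1,N);
for n = 3:N
    x(n) = 1.558*x(n-1) - 0.81*x(n-2) + v(n);
end
w1 = zeros(1,N+1); w2 = zeros(1,N+1);  % predictor weights
mu = 0.004;
for n = 3:N
    e = x(n) - w1(n)*x(n-1) - w2(n)*x(n-2);   % prediction error
    w1(n+1) = w1(n) + mu*e*x(n-1);            % LMS updates derived above
    w2(n+1) = w2(n) + mu*e*x(n-2);
end
disp([w1(N+1) w2(N+1)])                % should approach [1.558 -0.81]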
Two values of $\mu$, 0.02 and 0.004, are investigated:
[Figure: convergence characteristics for the LMS predictor with $\mu$ = 0.02]
[Figure: convergence characteristics for the LMS predictor with $\mu$ = 0.004]
Observations:
Example 4.3
2. System Identification
Given the input signal x(n) and output signal d(n), we can estimate the impulse response of the system or plant using the LMS algorithm.
Suppose the transfer function of the plant is $\sum_{i=0}^{2} h_i z^{-i}$, a causal FIR unknown system; then d(n) can be represented as
$$d(n) = \sum_{i=0}^{2} h_i\, x(n-i)$$
Assuming that the order of the transfer function is unknown, we use a 2-coefficient LMS filter to model the system function as follows:
[Figure: system identification — x(n) drives both the plant $\sum_{i=0}^{2} h_i z^{-i}$, giving d(n), and the adaptive filter $\sum_{i=0}^{1} w_i z^{-i}$, giving y(n); the error is e(n) = d(n) - y(n)]
Thus the LMS algorithm for this problem is
$$e(n) = d(n) - w_0(n)\, x(n) - w_1(n)\, x(n-1)$$
$$w_0(n+1) = w_0(n) + 2\mu\, e(n)\, x(n), \qquad w_1(n+1) = w_1(n) + 2\mu\, e(n)\, x(n-1)$$
The learning behaviours of the filter weights $w_0(n)$ and $w_1(n)$ can be obtained by taking expectation on the LMS algorithm. To simplify the analysis, we assume that x(n) is a stationary white noise process such that
$$R_{xx}(m) = E\{x(n)\, x(n+m)\} = \begin{cases} \sigma_x^2, & m = 0\\ 0, & \text{otherwise} \end{cases}$$
Assume that the filter weights are independent of x(n). Applying expectation to the first updating rule gives
$$E\{w_0(n+1)\} = E\{w_0(n)\}(1 - 2\mu\sigma_x^2) + 2\mu h_0 \sigma_x^2$$
$$E\{w_0(n)\} = E\{w_0(n-1)\}(1 - 2\mu\sigma_x^2) + 2\mu h_0 \sigma_x^2$$
$$E\{w_0(n-1)\} = E\{w_0(n-2)\}(1 - 2\mu\sigma_x^2) + 2\mu h_0 \sigma_x^2$$
$$\vdots$$
$$E\{w_0(1)\} = E\{w_0(0)\}(1 - 2\mu\sigma_x^2) + 2\mu h_0 \sigma_x^2$$
Multiplying the second equation by $(1 - 2\mu\sigma_x^2)$ on both sides, the third equation by $(1 - 2\mu\sigma_x^2)^2$, etc., and summing all the resultant equations, we have
$$\begin{aligned}
E\{w_0(n+1)\} &= E\{w_0(0)\}(1 - 2\mu\sigma_x^2)^{n+1} + 2\mu h_0 \sigma_x^2\left(1 + (1 - 2\mu\sigma_x^2) + \cdots + (1 - 2\mu\sigma_x^2)^{n}\right)\\
&= E\{w_0(0)\}(1 - 2\mu\sigma_x^2)^{n+1} + 2\mu h_0 \sigma_x^2\, \frac{1 - (1 - 2\mu\sigma_x^2)^{n+1}}{1 - (1 - 2\mu\sigma_x^2)}\\
&= E\{w_0(0)\}(1 - 2\mu\sigma_x^2)^{n+1} + h_0\left(1 - (1 - 2\mu\sigma_x^2)^{n+1}\right)\\
&= \left(E\{w_0(0)\} - h_0\right)(1 - 2\mu\sigma_x^2)^{n+1} + h_0
\end{aligned}$$
Hence
$$\lim_{n\to\infty} E\{w_0(n)\} = h_0$$
provided that
$$|1 - 2\mu\sigma_x^2| < 1 \;\Rightarrow\; -1 < 1 - 2\mu\sigma_x^2 < 1 \;\Rightarrow\; 0 < \mu < \frac{1}{\sigma_x^2}$$
Similarly, we can show that
$$E\{w_1(n+1)\} = \left(E\{w_1(0)\} - h_1\right)(1 - 2\mu\sigma_x^2)^{n+1} + h_1 \;\Rightarrow\; \lim_{n\to\infty} E\{w_1(n)\} = h_1$$
It is worth noting that the choice of the initial filter weights $E\{w_0(0)\}$ and $E\{w_1(0)\}$ does not affect the convergence of the LMS algorithm because the performance surface is unimodal.
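The closed-form learning curve above can be checked against simulation; the following MATLAB sketch (an illustrative check with assumed values for h, mu, the input power, and the number of runs) averages w0(n) over independent runs and compares it with the theoretical curve:

% Monte Carlo check of E{w0(n)} = (E{w0(0)} - h0)*(1 - 2*mu*sx2)^n + h0
% h, mu, sx2, and the number of runs are assumed values for illustration.
N = 500; runs = 200;
h = [1 2 0.5];                 % assumed plant coefficients (h0, h1, h2)
mu = 0.01; sx2 = 1;
w0avg = zeros(1,N+1);
for r = 1:runs
    x = sqrt(sx2)*randn(1,N);
    d = conv(x,h); d = d(1:N);
    w0 = zeros(1,N+1); w1 = zeros(1,N+1);
    for n = 2:N
        e = d(n) - w0(n)*x(n) - w1(n)*x(n-1);
        w0(n+1) = w0(n) + 2*mu*e*x(n);
        w1(n+1) = w1(n) + 2*mu*e*x(n-1);
    end
    w0avg = w0avg + w0/runs;   % ensemble average of w0(n)
end
theory = (0 - h(1))*(1 - 2*mu*sx2).^(0:N) + h(1);   % theoretical E{w0(n)}
plot(0:N, w0avg, 0:N, theory); legend('simulated','theory');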
Discussion:
The LMS filter consists of two weights but the actual transfer function comprises three coefficients, so the plant cannot be exactly modeled in this case. This is referred to as undermodeling. If we use a 3-weight LMS filter with transfer function $\sum_{i=0}^{2} w_i z^{-i}$, then the plant can be modeled exactly. If we use more than 3 coefficients in the LMS filter, we can still estimate the transfer function accurately; however, the misadjustment increases with the filter length used.
Notice that we could also use the Wiener filter to find the impulse response of the plant if the signal statistics $R_{xx}(0)$, $R_{xx}(1)$, $R_{dx}(0)$ and $R_{dx}(1)$ were available. However, we do not have $R_{dx}(0)$ and $R_{dx}(1)$, although $R_{xx}(0) = \sigma_x^2$ and $R_{xx}(1) = 0$ are known. Therefore, the LMS adaptive filter can be considered as an adaptive realization of the Wiener filter, and it is used when the signal statistics are not (completely) known.
Example 4.4
3. Interference Cancellation
Given a received signal r(k) which consists of a source signal s(k) and a sinusoidal interference with known frequency, the task is to extract s(k) from r(k). Notice that the amplitude and phase of the sinusoid are unknown. A well-known application is the removal of 50/60 Hz power line interference in the recording of the electrocardiogram (ECG).
[Figure: interference canceller — the primary input is r(k) = s(k) + A cos(ω₀k + φ); the reference input sin(ω₀k) is weighted by b₀ and its 90°-phase-shifted version cos(ω₀k) is weighted by b₁; the weighted sum is subtracted from r(k) to give e(k)]
The interference cancellation system consists of a 90° phase-shifter and a two-weight adaptive filter. By properly adjusting the weights, the reference waveform can be changed in magnitude and phase in any way required to model the interfering sinusoid. The error (filtered output) is
$$e(k) = r(k) - b_0(k)\sin(\omega_0 k) - b_1(k)\cos(\omega_0 k)$$
and the LMS updates are
$$b_0(k+1) = b_0(k) - \frac{\mu}{2}\frac{\partial e^2(k)}{\partial b_0(k)} = b_0(k) - \frac{\mu}{2}\frac{\partial e^2(k)}{\partial e(k)}\frac{\partial e(k)}{\partial b_0(k)} = b_0(k) + \mu\, e(k)\sin(\omega_0 k)$$
and
$$b_1(k+1) = b_1(k) - \frac{\mu}{2}\frac{\partial e^2(k)}{\partial b_1(k)} = b_1(k) + \mu\, e(k)\cos(\omega_0 k)$$
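A minimal MATLAB sketch of this canceller (the source signal, sampling rate, interference amplitude and phase, and step size are assumed values for illustration):

% Adaptive cancellation of a sinusoid with known frequency but unknown amplitude/phase
N = 4000; fs = 500;                  % assumed sampling rate (Hz)
w0 = 2*pi*50/fs;                     % 50 Hz interference in normalized frequency
A = 2; phi = 0.7;                    % assumed (unknown to the algorithm) amplitude and phase
s = 0.5*randn(1,N);                  % assumed source signal
r = s + A*cos(w0*(1:N) + phi);       % received signal
b0 = 0; b1 = 0; mu = 0.02;
e = zeros(1,N);
for k = 1:N
    y = b0*sin(w0*k) + b1*cos(w0*k); % model of the interfering sinusoid
    e(k) = r(k) - y;                 % filtered output
    b0 = b0 + mu*e(k)*sin(w0*k);     % LMS updates derived above
    b1 = b1 + mu*e(k)*cos(w0*k);
end
disp([b0 b1])                        % should approach [-A*sin(phi)  A*cos(phi)]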
Taking the expected value of $b_0(k)$, we have
$$\begin{aligned}
E\{b_0(k+1)\} &= E\{b_0(k)\} + \mu\, E\{e(k)\sin(\omega_0 k)\}\\
&= E\{b_0(k)\} + \mu\, E\{[s(k) + A\cos(\omega_0 k + \phi) - b_0(k)\sin(\omega_0 k) - b_1(k)\cos(\omega_0 k)]\sin(\omega_0 k)\}\\
&= E\{b_0(k)\} + \mu\, E\{[A\cos(\omega_0 k)\cos(\phi) - A\sin(\omega_0 k)\sin(\phi) - b_0(k)\sin(\omega_0 k) - b_1(k)\cos(\omega_0 k)]\sin(\omega_0 k)\}\\
&= E\{b_0(k)\} + \frac{\mu}{2} E\{(A\cos(\phi) - b_1(k))\sin(2\omega_0 k)\} - \mu\, E\{(A\sin(\phi) + b_0(k))\sin^2(\omega_0 k)\}\\
&= E\{b_0(k)\} - \mu\, E\left\{(A\sin(\phi) + b_0(k))\, \frac{1 - \cos(2\omega_0 k)}{2}\right\}\\
&= E\{b_0(k)\} - \frac{\mu}{2} A\sin(\phi) - \frac{\mu}{2} E\{b_0(k)\}\\
&= \left(1 - \frac{\mu}{2}\right) E\{b_0(k)\} - \frac{\mu}{2} A\sin(\phi)
\end{aligned}$$
Following the derivation in Example 4.3, provided that $0 < \mu < 4$, the learning curves of $E\{b_0(k)\}$ and $E\{b_1(k)\}$ can be obtained as
$$E\{b_0(k)\} = -A\sin(\phi) + \left(E\{b_0(0)\} + A\sin(\phi)\right)\left(1 - \frac{\mu}{2}\right)^k$$
$$E\{b_1(k)\} = A\cos(\phi) + \left(E\{b_1(0)\} - A\cos(\phi)\right)\left(1 - \frac{\mu}{2}\right)^k$$
When $k \to \infty$, we have
$$E\{b_0(k)\} \to -A\sin(\phi), \qquad E\{b_1(k)\} \to A\cos(\phi)$$
The filtered output is then approximated as
$$e(k) \approx r(k) + A\sin(\phi)\sin(\omega_0 k) - A\cos(\phi)\cos(\omega_0 k) = s(k)$$
which means that s(k) can be recovered accurately upon convergence.
Suppose $E\{b_0(0)\} = E\{b_1(0)\} = 0$ and $\mu = 0.02$, and we want to find the number of iterations required for $E\{b_1(k)\}$ to reach 90% of its steady-state value. Let the required number of iterations be $k_0$; it can be calculated from
$$E\{b_1(k_0)\} = 0.9\, A\cos(\phi) = A\cos(\phi) - A\cos(\phi)\left(1 - \frac{\mu}{2}\right)^{k_0}$$
$$\Rightarrow\; \left(1 - \frac{0.02}{2}\right)^{k_0} = 0.1 \;\Rightarrow\; k_0 = \frac{\log(0.1)}{\log(0.99)} = 229.1$$
Hence about 230 iterations are required.
If we use a Wiener filter with filter weights $b_0$ and $b_1$, the mean square error function can be computed as
$$E\{e^2(k)\} = \frac{1}{2}\left(b_0^2 + b_1^2\right) + A\sin(\phi)\, b_0 - A\cos(\phi)\, b_1 + E\{s^2(k)\} + \frac{A^2}{2}$$
$$\frac{\partial E\{e^2(k)\}}{\partial b_0} = b_0 + A\sin(\phi) = 0 \;\Rightarrow\; \tilde{b}_0 = -A\sin(\phi)$$
and
$$\frac{\partial E\{e^2(k)\}}{\partial b_1} = b_1 - A\cos(\phi) = 0 \;\Rightarrow\; \tilde{b}_1 = A\cos(\phi)$$
Example 4.5
4. Time Delay Estimation
The problem is to estimate the delay D between two noisy observations, r₁(k) and r₂(k), of the same source signal s(k).
Taking the Fourier transform of $s_D(k) = s(k - D)$ yields
$$S_D(\omega) = e^{-j\omega D} S(\omega)$$
The impulse response of the ideal delay-by-D system is therefore
$$h(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{-j\omega D}\, e^{j\omega n}\, d\omega = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j\omega(n - D)}\, d\omega = \mathrm{sinc}(n - D)$$
where
$$\mathrm{sinc}(v) = \frac{\sin(\pi v)}{\pi v}$$
As a result, $s(k - D)$ can be represented as
$$s(k - D) = s(k) * h(k) = \sum_{i=-\infty}^{\infty} s(k - i)\, \mathrm{sinc}(i - D) \approx \sum_{i=-P}^{P} s(k - i)\, \mathrm{sinc}(i - D)$$
for sufficiently large P.
This means that we can use a noncausal FIR filter to model the time delay, and it has the form
$$W(z) = \sum_{i=-P}^{P} w_i z^{-i}$$
It can be shown that $w_i \approx \mathrm{sinc}(i - D)$ for $i = -P, -P+1, \ldots, P$ using the minimum mean square error approach. The time delay can then be estimated from $\{w_i\}$ using the following interpolation:
$$\hat{D} = \arg\max_{t} \sum_{i=-P}^{P} w_i\, \mathrm{sinc}(i - t)$$
[Figure: adaptive time delay estimation — r₁(k) is filtered by $W(z) = \sum_{i=-P}^{P} w_i z^{-i}$ and the output is subtracted from r₂(k) to give e(k)]
$$e(k) = r_2(k) - \sum_{i=-P}^{P} r_1(k - i)\, w_i(k)$$
The LMS algorithm for the time delay estimation problem is thus
$$w_j(k+1) = w_j(k) - \mu\frac{\partial e^2(k)}{\partial w_j(k)} = w_j(k) - \mu\frac{\partial e^2(k)}{\partial e(k)}\frac{\partial e(k)}{\partial w_j(k)} = w_j(k) + 2\mu\, e(k)\, r_1(k - j), \qquad j = -P, -P+1, \ldots, P$$
$$\hat{D}(k) = \arg\max_{t} \sum_{i=-P}^{P} w_i(k)\, \mathrm{sinc}(i - t)$$
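A minimal MATLAB sketch of LMS time delay estimation (the true delay, filter order P, noise level, and step size are assumed values; the delay is chosen as an integer so the final interpolation simply peaks at that tap):

% LMS time delay estimation between r1(k) = s(k) + noise and r2(k) = s(k-D) + noise
% D, P, the noise level, and mu are assumed values for illustration.
sincf = @(v) (v==0) + (v~=0).*sin(pi*v)./(pi*v + (v==0));  % sinc with sinc(0) = 1
N = 5000; P = 4; D = 2; mu = 0.01;
s = randn(1,N+P);                      % assumed source signal
r1 = s(P+1:N+P) + 0.1*randn(1,N);      % reference channel
r2 = [zeros(1,D) s(P+1:N+P-D)] + 0.1*randn(1,N);   % delayed channel
w = zeros(1,2*P+1);                    % weights w_{-P}, ..., w_{P}
for k = P+1:N-P
    X = r1(k+P:-1:k-P);                % [r1(k+P) ... r1(k-P)], matching w_{-P}..w_{P}
    e = r2(k) - w*X.';
    w = w + 2*mu*e*X;                  % LMS update derived above
end
t = -P:0.01:P;                         % interpolation grid for the delay estimate
c = w*sincf((-P:P).' - t);             % sum_i w_i*sinc(i - t) for each candidate t
[~, idx] = max(c); Dhat = t(idx);
disp(Dhat)                             % should be close to D = 2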
Exponentially Weighted Recursive Least Squares (RLS)
A. Optimization Criterion
To minimize the weighted sum of squared errors
$$J(n) = \sum_{l=0}^{n} \lambda^{n-l} e^2(l)$$
for each time n, where $\lambda$ is a weighting factor such that $0 < \lambda \le 1$.
When $\lambda = 1$, the optimization criterion is identical to that of least squares filtering; this value of $\lambda$ should not be used in a changing environment because all squared errors (current and past values) have the same weighting factor of 1.
To smooth out the effect of the old samples, $\lambda$ should be chosen less than 1 for operation in nonstationary conditions.
B. Derivation
Assume an FIR filter for simplicity. Following the derivation of the least squares filter, we differentiate J(n) with respect to the filter weight vector at time n, i.e., W(n), and then set the L resultant equations to zero.
By so doing, we have
$$R(n)\, W(n) = G(n) \qquad (4.47)$$
where
$$R(n) = \sum_{l=0}^{n} \lambda^{n-l} X(l) X(l)^T, \qquad G(n) = \sum_{l=0}^{n} \lambda^{n-l} d(l) X(l)$$
$$R(n) = \lambda \sum_{l=0}^{n-1} \lambda^{n-1-l} X(l) X(l)^T + X(n) X(n)^T = \lambda R(n-1) + X(n) X(n)^T \qquad (4.48)$$
$$G(n) = \lambda \sum_{l=0}^{n-1} \lambda^{n-1-l} d(l) X(l) + d(n) X(n) = \lambda G(n-1) + d(n) X(n) \qquad (4.49)$$
Using the well-known matrix inversion lemma:
If
$$A = B + C\, C^T \qquad (4.50)$$
then
$$A^{-1} = B^{-1} - B^{-1} C\left(1 + C^T B^{-1} C\right)^{-1} C^T B^{-1} \qquad (4.51)$$
Applying it to (4.48) with $A = R(n)$, $B = \lambda R(n-1)$ and $C = X(n)$ gives
$$R(n)^{-1} = \frac{1}{\lambda}\left[R(n-1)^{-1} - \frac{R(n-1)^{-1} X(n) X(n)^T R(n-1)^{-1}}{\lambda + X(n)^T R(n-1)^{-1} X(n)}\right] \qquad (4.52)$$
The filter weight W(n) is then calculated as $W(n) = R(n)^{-1} G(n)$; substituting (4.49) and (4.52) yields a recursive update that avoids explicit matrix inversion.
As a result, the exponentially weighted recursive least squares (RLS) algorithm is summarized as follows:
1. Initialize W(0) and $R(0)^{-1}$.
2. For $n = 1, 2, \ldots$, compute
$$\alpha(n) = \frac{1}{\lambda + X(n)^T R(n-1)^{-1} X(n)} \qquad (4.54)$$
$$W(n) = W(n-1) + \alpha(n)\, R(n-1)^{-1} X(n)\left[d(n) - X(n)^T W(n-1)\right] \qquad (4.55)$$
$$R(n)^{-1} = \frac{1}{\lambda}\left[R(n-1)^{-1} - \alpha(n)\, R(n-1)^{-1} X(n) X(n)^T R(n-1)^{-1}\right] \qquad (4.56)$$
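A minimal MATLAB sketch of this exponentially weighted RLS recursion, applied to the same two-weight system identification setup as es.m (the forgetting factor, the initialization of $R(0)^{-1}$, and the plant are assumed values for illustration):

% Exponentially weighted RLS for a 2-tap system identification problem
N = 500; np = 0.01;
h = [1 2];                             % assumed unknown impulse response
x = randn(1,N);
d = conv(x,h); d = d(1:N) + sqrt(np)*randn(1,N);
lambda = 0.99;                         % assumed forgetting factor
W = [0; 0];                            % W(0)
Rinv = 100*eye(2);                     % assumed initialization of R(0)^{-1}
for n = 2:N
    X = [x(n); x(n-1)];
    alpha = 1/(lambda + X.'*Rinv*X);             % (4.54)
    W = W + alpha*Rinv*X*(d(n) - X.'*W);         % (4.55)
    Rinv = (Rinv - alpha*(Rinv*X)*(X.'*Rinv))/lambda;   % (4.56)
end
disp(W.')                              % should be close to h = [1 2]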
Remarks:
1. When $\lambda = 1$, the algorithm reduces to the standard RLS algorithm that minimizes $\sum_{l=0}^{n} e^2(l)$.
2. For nonstationary data, $0.95 < \lambda < 0.9995$ has been suggested.
3. Simple choices of W(0) and $R(0)^{-1}$ are $\mathbf{0}$ and $\delta^{-2} I$, respectively, where $\delta^2$ is a small positive constant.
1. Computational Complexity
RLS is more computationally expensive than the LMS. Assuming there are L filter taps, the LMS requires $(4L + 1)$ additions and $(4L + 3)$ multiplications per update, while the exponentially weighted RLS needs a total of $(3L^2 + L - 1)$ additions/subtractions and $(4L^2 + 4L)$ multiplications/divisions.
2. Rate of Convergence
RLS provides a faster convergence speed than the LMS because
RLS is an approximation of the Newton method while LMS is an approximation of the steepest descent method.
the premultiplication by $R(n)^{-1}$ in the RLS algorithm makes the resultant eigenvalue spread become unity.
Improvement of LMS algorithm with the use of Orthogonal Transform
A. Motivation
When the input signal is white, the eigenvalue spread attains its minimum value of 1. In this case, the LMS algorithm provides the optimum rate of convergence.
However, many practical signals are nonwhite. How can we improve the rate of convergence while still using the LMS algorithm?
B. Idea
Transform the input x(n) to another signal v(n) so that the modified eigenvalue spread is 1. Two steps are involved:
1. Transform X(n) to V(n) using an $N \times N$ orthogonal transform T so that
$$R_{vv} = T R_{xx} T^T = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0\\ 0 & \sigma_2^2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \sigma_N^2 \end{bmatrix}$$
where
$$V(n) = T X(n), \qquad R_{xx} = E\{X(n) X(n)^T\}, \qquad R_{vv} = E\{V(n) V(n)^T\}$$
2. Modify the eigenvalues of $R_{vv}$ so that the resultant matrix has identical eigenvalues (power normalization):
$$R_{vv} \;\xrightarrow{\text{power normalization}}\; R'_{vv} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0\\ 0 & \sigma^2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & \sigma^2 \end{bmatrix}$$
[Figure: block diagram of the transform domain adaptive filter — X(n) is transformed by T to V(n), each component is power-normalized, and the weighted sum forms y(n), which is compared with d(n) to give e(n)]
C. Algorithm
The modified LMS algorithm is given by
$$W(n+1) = W(n) + 2\mu\, e(n)\, \Sigma^{-2}\, V(n)$$
where
$$\Sigma^{-2} = \begin{bmatrix} 1/\sigma_1^2 & 0 & \cdots & 0\\ 0 & 1/\sigma_2^2 & & \vdots\\ \vdots & & \ddots & 0\\ 0 & \cdots & 0 & 1/\sigma_N^2 \end{bmatrix}$$
$$e(n) = d(n) - y(n), \qquad y(n) = W(n)^T V(n) = W(n)^T T X(n)$$
Writing it in scalar form, we have
$$w_i(n+1) = w_i(n) + \frac{2\mu\, e(n)\, v_i(n)}{\sigma_i^2}, \qquad i = 1, 2, \ldots, N$$
Since $\sigma_i^2$ is the power of $v_i(n)$, it is not known a priori and should be estimated. A common estimation procedure for $E\{v_i^2(n)\}$ is
$$\hat{\sigma}_i^2(n) = \beta\, \hat{\sigma}_i^2(n-1) + |v_i(n)|^2$$
where $0 < \beta < 1$.
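A minimal MATLAB sketch of this transform domain LMS filter for N = 2, using the DCT as the orthogonal transform (the input model, mu, and beta are assumed values for illustration):

% Transform-domain LMS with power normalization (N = 2, DCT transform)
% Correlated input (MA process as in es1.m); mu and beta are assumed values.
N = 2000; h = [1 2];
u = sqrt(0.5)*randn(1,N+1);
x = u(1:N) + u(2:N+1);                 % correlated input, eigenvalue spread > 1
d = conv(x,h); d = d(1:N) + 0.1*randn(1,N);
T = [1 1; 1 -1]/sqrt(2);               % 2-point DCT (orthonormal)
W = [0; 0];
sig2 = ones(2,1);                      % running power estimates of v_i(n)
mu = 0.05; beta = 0.95;
for n = 2:N
    V = T*[x(n); x(n-1)];              % transformed tap-input vector
    y = W.'*V;
    e = d(n) - y;
    sig2 = beta*sig2 + V.^2;           % power estimation (see recursion above)
    W = W + 2*mu*e*(V./sig2);          % power-normalized LMS update
end
disp((T.'*W).')                        % weights mapped back to the time domain, close to h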
Using a 2-coefficient adaptive filter as an example:
[Figure: error surface of the two-coefficient adaptive filter]
[Figure: error surface with discrete cosine transform (DCT)]
[Figure: error surface with transform and power normalization]
Remarks:
1. The lengths of the principal axes of the hyperellipses are proportional to the eigenvalues of $R$.
2. Without power normalization, no convergence rate improvement can be achieved by using a transform.
3. The best choice for T would be the Karhunen-Loeve (KL) transform, which is signal dependent. This transform can make $R_{vv}$ a diagonal matrix, but the signal statistics are required for its computation.
4. Considerations in choosing a transform:
does a fast algorithm exist?
complex or real transform?
are the elements of the transform all powers of 2?
5. Examples of orthogonal transforms are the discrete sine transform (DST), discrete Fourier transform (DFT), discrete cosine transform (DCT), Walsh-Hadamard transform (WHT), discrete Hartley transform (DHT) and power-of-2 (PO2) transform.
Improvement of LMS algorithm using Newton's method
For example, the required autocorrelation entries can be estimated recursively from the data:
$$\hat{R}_{xx}(l, n) = \hat{R}_{xx}(l, n-1) + x(n + l)\, x(n), \qquad l = 0, 1, \ldots, L-1$$
Possible Research Directions for Adaptive Signal Processing
1. Adaptive modeling of nonlinear systems
For example, a second-order Volterra system is a simple nonlinear system. The output y(n) is related to the input x(n) by
$$y(n) = \sum_{j=0}^{L-1} w^{(1)}(j)\, x(n-j) + \sum_{j_1=0}^{L-1} \sum_{j_2=0}^{L-1} w^{(2)}(j_1, j_2)\, x(n-j_1)\, x(n-j_2)$$
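A short MATLAB sketch of computing the output of such a second-order Volterra filter (the memory length and kernel values are assumed for illustration):

% Output of a second-order Volterra filter with memory length L
L = 3;
w1 = [0.5; -0.2; 0.1];                 % assumed linear kernel w^(1)(j)
w2 = 0.05*ones(L,L);                   % assumed quadratic kernel w^(2)(j1,j2)
x = randn(1,100);
y = zeros(1,100);
for n = L:100
    X = x(n:-1:n-L+1).';               % [x(n); x(n-1); ...; x(n-L+1)]
    y(n) = w1.'*X + X.'*w2*X;          % linear term + quadratic term
end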
2. Adaptive algorithms with other optimization criteria
For example, the least-mean-p-th-power (LMP) family minimizes $E\{|e(n)|^p\}$.
Some remarks:
When p = 1, it becomes the least-mean-deviation (LMD) algorithm; when p = 2, it is the least-mean-square (LMS); and when p = 4, it becomes the least-mean-fourth (LMF) algorithm.
The LMS is optimum for Gaussian noise, but this may not be true for noise with other probability density functions (PDFs). For example, if the noise is impulsive, such as an α-stable process with 1 < α < 2, LMD performs better than LMS; if the noise has a uniform distribution or is a sinusoidal signal, then LMF outperforms LMS. Therefore, the optimum p depends on the signal/noise models.
The parameter p can be any real number, but the algorithm is then difficult to analyze, particularly for non-integer p.
Combinations of different norms can be used to achieve better performance.
Some suggest a mixed-norm criterion, e.g. $a\, E\{e^2(n)\} + b\, E\{e^4(n)\}$.
A median operation can be employed in the LMP algorithm for operation in the presence of impulsive noise. For example, the median LMS belongs to the family of order-statistics least-mean-square (OSLMS) adaptive filter algorithms.
3. Adaptive algorithms with fast convergence rate and small computation
For example, design of an optimal step size in LMS algorithms.
4. Adaptive IIR filters
Adaptive IIR filters have two advantages over adaptive FIR filters:
They generalize FIR filters and can model IIR systems more accurately
Fewer filter coefficients are generally required
However, the development of adaptive IIR filters is generally more difficult than that of FIR filters because
The performance surface is multimodal, so the algorithm may lock at an undesired local minimum
They may lead to biased solutions
They can be unstable
5. Unsupervised adaptive signal processing (blind signal processing)
What we have discussed previously refers to supervised adaptive signal processing, where there is always a desired signal, reference signal, or training signal.
In some applications, such signals are not available. Two important application areas of unsupervised adaptive signal processing are:
Blind source separation
e.g. speaker identification in the noisy environment of a cocktail party
e.g. separation of signals overlapped in time and frequency in wireless communications
Blind deconvolution (the inverse of convolution)
e.g. restoration of a source signal after propagation through an unknown wireless channel
6. New applications
For example, echo cancellation for hands-free telephone systems and signal estimation in wireless channels using space-time processing.
Questions for Discussion
1. The LMS algorithm is given by (4.23):
$$W(n+1) = W(n) + 2\mu\, e(n)\, X(n)$$
where
$$e(n) = d(n) - y(n), \qquad y(n) = \sum_{i=0}^{L-1} w_i(n)\, x(n-i) = W(n)^T X(n)$$
Based on the idea of the LMS algorithm, derive the adaptive algorithm that minimizes $E\{|e(n)|\}$.
(Hint: $\dfrac{\partial |v|}{\partial v} = \mathrm{sgn}(v)$ where $\mathrm{sgn}(v) = 1$ if $v > 0$ and $\mathrm{sgn}(v) = -1$ otherwise.)
2. For adaptive IIR filtering, there are basically two approaches, namely, output-error and equation-error. Let the unknown IIR system be
$$H(z) = \frac{B(z)}{A(z)} = \frac{\sum_{j=0}^{N-1} b_j z^{-j}}{1 + \sum_{i=1}^{M-1} a_i z^{-i}}$$
In the output-error approach, the error is
$$e(n) = d(n) - y(n)$$
with the adaptive filter output generated recursively from
$$\frac{Y(z)}{X(z)} = \frac{\hat{B}(z)}{\hat{A}(z)} = \frac{\sum_{j=0}^{N-1} \hat{b}_j z^{-j}}{1 + \sum_{i=1}^{M-1} \hat{a}_i z^{-i}}$$
$$y(n) = \sum_{j=0}^{N-1} \hat{b}_j\, x(n-j) - \sum_{i=1}^{M-1} \hat{a}_i\, y(n-i)$$
On the other hand, the equation-error approach is always stable and has a unimodal error surface. Its system block diagram is shown below.
[Figure: block diagram of the equation-error approach — the input s(k) passes through the unknown system H(z) and is corrupted by additive noise n(k) to give d(k); the adaptive filter consists of two FIR sections, B(z) operating on the input and A(z) operating on d(k), whose outputs are combined to form the equation error e(k)]