
Adaptive Filtering - Theory and Applications

Jose C. M. Bermudez
Department of Electrical Engineering
Federal University of Santa Catarina
Florianópolis, SC
Brazil
IRIT - INP-ENSEEIHT, Toulouse
May 2011
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 1 / 107
1. Introduction
2. Adaptive Filtering Applications
3. Adaptive Filtering Principles
4. Iterative Solutions for the Optimum Filtering Problem
5. Stochastic Gradient Algorithms
6. Deterministic Algorithms
7. Analysis
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 2 / 107
Introduction
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 3 / 107
Estimation Techniques
Several techniques to solve estimation problems.
Classical Estimation
Maximum Likelihood (ML), Least Squares (LS), Moments, etc.
Bayesian Estimation
Minimum MSE (MMSE), Maximum A Posteriori (MAP), etc.
Linear Estimation
Frequently used in practice when there are limitations in computational
complexity or a need for real-time operation
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 4 / 107
Linear Estimators
Simpler to determine: depend on the first two moments of the data
Statistical Approach: Optimal Linear Filters
  Minimum mean square error
  Require second-order statistics of the signals
Deterministic Approach: Least Squares Estimators
  Minimum least squares error
  Require handling of a data observation matrix
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 5 / 107
Limitations of Optimal Filters and LS Estimators
Statistics of signals may not be available or cannot be accurately
estimated
There may be no time available for statistical estimation (real-time)
Signals and systems may be non-stationary
Memory required may be prohibitive
Computational load may be prohibitive
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 6 / 107
Iterative Solutions
Search for the optimal solution starting from an initial guess
Iterative algorithms are based on classical optimization algorithms
Require reduced computational effort per iteration
Need several iterations to converge to the optimal solution
These methods form the basis for the development of adaptive
algorithms
Still require the knowledge of signal statistics
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 7 / 107
Adaptive Filters
Usually approximate iterative algorithms and:
Do not require previous knowledge of the signal statistics
Have a small computational complexity per iteration
Converge to a neighborhood of the optimal solution
Adaptive filters are good for:
Real-time applications, when there is no time for statistical estimation
Applications with nonstationary signals and/or systems
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 8 / 107
Properties of Adaptive Filters
They can operate satisfactorily in unknown and possibly time-varying
environments without user intervention
They improve their performance during operation by learning
statistical characteristics from current signal observations
They can track variations in the signal operating environment (SOE)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 9 / 107
Adaptive Filtering Applications
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 10 / 107
Basic Classes of Adaptive Filtering Applications
System Identification
Inverse System Modeling
Signal Prediction
Interference Cancelation
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 11 / 107
System Identification
[Figure: system identification block diagram. The input x(n) drives both the unknown system and the adaptive filter; the adaptive algorithm adjusts the filter using the error e(n) = d(n) − y(n), where d(n) is the unknown system output plus other signals (optimum error e_o(n)).]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 12 / 107
Applications: System Identification
Channel Estimation
Communications systems
Objective: model the channel to design distortion compensation
x(n): training sequence
Plant Identification
Control systems
Objective: model the plant to design a compensator
x(n): training sequence
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 13 / 107
Echo Cancellation
Telephone systems and VoIP
Echo caused by network impedance mismatches or acoustic
environment
Objective: model the echo path impulse response
x(n): transmitted signal
d(n): echo + noise
[Figure: Network Echo Cancellation. Hybrids (H), Tx/Rx paths and the echo canceller (EC); signals x(n), d(n) and e(n).]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 14 / 107
Inverse System Modeling
Adaptive filter attempts to estimate the unknown system's inverse
Adaptive filter input is usually corrupted by noise
Desired response d(n) may not be available
[Figure: inverse system modeling. s(n) drives the unknown system; its output, corrupted by noise z(n), is the adaptive filter input x(n). The desired response d(n) is a delayed version of s(n) (plus other signals), and e(n) = d(n) − y(n) drives the adaptive algorithm.]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 15 / 107
Applications: Inverse System Modeling
Channel Equalization
[Figure: channel equalization. The transmitted sequence s(n) passes through the channel and is corrupted by noise z(n) to form the equalizer input x(n); the desired response d(n) is a locally generated training sequence (or past decisions), and e(n) = d(n) − y(n) drives the adaptive algorithm.]
Objective: reduce intersymbol interference
Initially, a training sequence is used as d(n)
After training: d(n) generated from previous decisions
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 16 / 107
Signal Prediction
[Figure: forward prediction block diagram. A delay of n_o samples feeds the adaptive filter with x(n − n_o); the desired signal is d(n) = x(n), and e(n) = d(n) − y(n) drives the adaptive algorithm.]
Most widely used case: forward prediction
The signal x(n) is predicted from the samples {x(n − n_o), x(n − n_o − 1), ..., x(n − n_o − L)}
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 17 / 107
Application: Signal Prediction
DPCM Speech Quantizer - Linear Predictive Coding
Objective: reduce speech transmission bandwidth
Signal transmitted at all times: the quantized prediction error
Predictor coefficients are transmitted at a low rate
[Figure: DPCM quantizer loop with predictor. Speech signal d(n), prediction y(n), prediction error e(n), quantized error Q[e(n)] (the transmitted DPCM signal), and reconstruction y(n) + Q[e(n)].]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 18 / 107
Interference Cancelation
One or more sensor signals are used to remove interference and noise
Reference signals correlated with the interference should also be available
Applications:
  array processing for radar and communications
  biomedical sensing systems
  active noise control systems
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 19 / 107
Application: Interference Cancelation
Active Noise Control
Ref: D.G. Manolakis, V.K. Ingle and S.M. Kogon, Statistical and Adaptive Signal Processing, 2000.
Cancelation of acoustic noise using destructive interference
Secondary system between the adaptive filter and the cancelation
point is unavoidable
Cancelation is performed in the acoustic environment
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 20 / 107
Active Noise Control Block Diagram
[Figure: ANC block diagram. The adaptive filter w(n) (ideal solution w_o) is driven by x(n); its output y(n) passes through the secondary path S, possibly followed by a nonlinearity g(y_s), before combining with d(n) and z(n) to form e(n). The adaptive algorithm uses the filtered reference x_f(n) obtained through the estimate Ŝ.]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 21 / 107
Adaptive Filtering Principles
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 22 / 107
Adaptive Filter Features
Adaptive filters are composed of three basic modules:
Filtering structure
  Determines the output of the filter given its input samples
  Its weights are periodically adjusted by the adaptive algorithm
  Can be linear or nonlinear, depending on the application
  Linear filters can be FIR or IIR
Performance criterion
  Defined according to the application and to mathematical tractability
  Is used to derive the adaptive algorithm
  Its value at each iteration affects the adaptive weight updates
Adaptive algorithm
  Uses the performance criterion value and the current signals
  Modifies the adaptive weights to improve performance
  Its form and complexity are a function of the structure and of the performance criterion
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 23 / 107
Signal Operating Environment (SOE)
Comprises all information regarding the properties of the signals and systems
  Input signals
  Desired signal
  Unknown systems
If the SOE is nonstationary
  Acquisition or convergence mode: from start-up until close to the best performance
  Tracking mode: readjustment following the SOE's time variations
Adaptation can be
  Supervised: the desired signal is available, so e(n) can be evaluated
  Unsupervised: the desired signal is unavailable
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 24 / 107
Performance Evaluation
Convergence rate
Misadjustment
Tracking
Robustness (disturbances and numerical)
Computational requirements (operations and memory)
Structure
  facility of implementation
  performance surface
  stability
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 25 / 107
Optimum versus Adaptive Filters in Linear Estimation
Conditions for this study
  Stationary SOE
  Filter structure: transversal FIR
  All signals are real valued
  Performance criterion: mean-square error E[e²(n)]
The Linear Estimation Problem
[Figure: linear FIR filter w with input x(n), output y(n), desired signal d(n) and error e(n)]
J_ms = E[e²(n)]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 26 / 107
The Linear Estimation Problem
[Figure: linear FIR filter w with input x(n), output y(n), desired signal d(n) and error e(n)]
x(n) = [x(n), x(n−1), ..., x(n−N+1)]^T
y(n) = x^T(n) w
e(n) = d(n) − y(n) = d(n) − x^T(n) w
J_ms = E[e²(n)] = σ_d² − 2 p^T w + w^T R_xx w
where
p = E[x(n) d(n)]; R_xx = E[x(n) x^T(n)]
Normal Equations
R_xx w_o = p ⇒ w_o = R_xx^{−1} p for R_xx > 0
J_ms,min = σ_d² − p^T R_xx^{−1} p
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 27 / 107
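To make the normal equations concrete, here is a minimal numerical sketch (not part of the original slides): it estimates R_xx and p from data generated by an assumed FIR system and solves R_xx w_o = p. The filter length, signal model and noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4                                   # number of filter taps (assumed)
w_true = np.array([0.5, -0.3, 0.2, 0.1])

# Stationary SOE: white input, desired signal = unknown FIR output + small noise
x = rng.standard_normal(10000)
d = np.convolve(x, w_true)[: len(x)] + 1e-3 * rng.standard_normal(len(x))

# Regressors x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T and sample estimates of R_xx, p
X = np.column_stack([np.roll(x, k) for k in range(N)])[N - 1:]
D = d[N - 1:]
R = X.T @ X / len(D)                    # estimate of R_xx = E[x(n) x^T(n)]
p = X.T @ D / len(D)                    # estimate of p = E[x(n) d(n)]

w_o = np.linalg.solve(R, p)             # normal equations: R_xx w_o = p
J_min = np.var(D) - p @ w_o             # J_ms,min = sigma_d^2 - p^T R_xx^{-1} p
print(w_o, J_min)
```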
What if d(n) is nonstationary?
[Figure: linear FIR filter w with input x(n), output y(n), desired signal d(n) and error e(n)]
x(n) = [x(n), x(n−1), ..., x(n−N+1)]^T
y(n) = x^T(n) w(n)
e(n) = d(n) − y(n) = d(n) − x^T(n) w(n)
J_ms(n) = E[e²(n)] = σ_d²(n) − 2 p^T(n) w(n) + w^T(n) R_xx w(n)
where
p(n) = E[x(n) d(n)]; R_xx = E[x(n) x^T(n)]
Normal Equations
R_xx w_o(n) = p(n) ⇒ w_o(n) = R_xx^{−1} p(n) for R_xx > 0
J_ms,min(n) = σ_d²(n) − p^T(n) R_xx^{−1} p(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 28 / 107
Optimum Filters versus Adaptive Filters
Optimum Filters
  Compute p(n) = E[x(n) d(n)]
  Solve R_xx w_o(n) = p(n)
  Filter with w_o(n): y(n) = x^T(n) w_o(n)
  Nonstationary SOE: the optimum filter must be determined for each value of n
Adaptive Filters
  Filtering: y(n) = x^T(n) w(n)
  Evaluate the error: e(n) = d(n) − y(n)
  Adaptive algorithm: w(n+1) = w(n) + Δw[x(n), e(n)]
  Δw(n) is chosen so that w(n) is close to w_o(n) for n large
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 29 / 107
Characteristics of Adaptive Filters
Search for the optimum solution on the performance surface
Follow principles of optimization techniques
Implement a recursive optimization solution
Convergence speed may depend on initialization
Have stability regions
Steady-state solution fluctuates about the optimum
Can track time-varying SOEs better than optimum filters
Performance depends on the performance surface
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 30 / 107
Iterative Solutions for the
Optimum Filtering Problem
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 31 / 107
Performance (Cost) Functions
Mean-square error E[e²(n)] (most popular)
  Adaptive algorithms: Least-Mean-Square (LMS), Normalized LMS (NLMS), Affine Projection (AP), Recursive Least Squares (RLS), etc.
Regularized MSE
  J_rms = E[e²(n)] + γ ||w(n)||²
  Adaptive algorithm: leaky least-mean-square (leaky LMS)
ℓ1-norm criterion
  J_ℓ1 = E[|e(n)|]
  Adaptive algorithm: Sign-Error
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 32 / 107
Performance (Cost) Functions continued
Least-mean-fourth (LMF) criterion
  J_LMF = E[e⁴(n)]
  Adaptive algorithm: Least-Mean-Fourth (LMF)
Least-mean-mixed-norm (LMMN) criterion
  J_LMMN = E[δ e²(n) + (1/2)(1 − δ) e⁴(n)]
  Adaptive algorithm: Least-Mean-Mixed-Norm (LMMN)
Constant-modulus criterion
  J_CM = E[(|x^T(n) w(n)|² − γ)²]
  Adaptive algorithm: Constant-Modulus (CM)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 33 / 107
MSE Performance Surface Small Input Correlation
[Figure: 3-D plot of the MSE surface over (w_1, w_2) for a weakly correlated input]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 34 / 107
MSE Performance Surface Large Input Correlation
[Figure: 3-D plot of the MSE surface over (w_1, w_2) for a strongly correlated input]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 35 / 107
The Steepest Descent Algorithm: Stationary SOE
Cost Function
J_ms(n) = E[e²(n)] = σ_d² − 2 p^T w(n) + w^T(n) R_xx w(n)
Weight Update Equation
w(n+1) = w(n) + μ c(n)
μ: step size
c(n): correction term (determines the direction of Δw(n))
Steepest descent adjustment:
c(n) = −∇J_ms(n), so that J_ms(n+1) ≤ J_ms(n)
w(n+1) = w(n) + μ [p − R_xx w(n)]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 36 / 107
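A minimal sketch of this steepest descent recursion, assuming R_xx and p are known exactly; the AR(1) correlation model, filter length and step size below are illustrative assumptions, not values from the slides.

```python
import numpy as np

N, a, mu = 8, 0.7, 0.3                 # taps, AR(1) pole, step size (assumed values)
idx = np.arange(N)
R = a ** np.abs(idx[:, None] - idx[None, :])   # Toeplitz R_xx of a unit-power AR(1) input
w_o = np.ones(N) / N                   # hypothetical optimum
p = R @ w_o                            # p = R_xx w_o (noise-free desired signal)

assert mu < 2 / np.linalg.eigvalsh(R).max()    # stability: 0 < mu < 2 / lambda_max

w = np.zeros(N)
for n in range(200):
    w = w + mu * (p - R @ w)           # w(n+1) = w(n) + mu [p - R_xx w(n)]
print(np.linalg.norm(w - w_o))         # should approach 0
```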
Weight Update Equation About the Optimum Weights
Weight Error Update Equation
w(n+1) = w(n) + μ [p − R_xx w(n)]
Using p = R_xx w_o,
w(n+1) = (I − μ R_xx) w(n) + μ R_xx w_o
Weight error vector: v(n) = w(n) − w_o
v(n+1) = (I − μ R_xx) v(n)
The matrix I − μ R_xx must be stable for convergence (all its eigenvalues must satisfy |λ_i| < 1)
Assuming convergence, lim_{n→∞} v(n) = 0
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 37 / 107
Convergence Conditions
v(n+1) = (I − μ R_xx) v(n); R_xx positive definite
Eigen-decomposition of R_xx: R_xx = Q Λ Q^T
v(n+1) = (I − μ Q Λ Q^T) v(n)
Q^T v(n+1) = Q^T v(n) − μ Λ Q^T v(n)
Defining ṽ(n+1) = Q^T v(n+1),
ṽ(n+1) = (I − μ Λ) ṽ(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 38 / 107
Convergence Properties
ṽ(n+1) = (I − μ Λ) ṽ(n)
ṽ_k(n+1) = (1 − μ λ_k) ṽ_k(n), k = 1, ..., N
ṽ_k(n) = (1 − μ λ_k)^n ṽ_k(0)
Convergence modes
  monotonic if 0 < 1 − μ λ_k < 1
  oscillatory if −1 < 1 − μ λ_k < 0
Convergence if |1 − μ λ_k| < 1 ⇒ 0 < μ < 2/λ_max
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 39 / 107
Optimal Step-Size
ṽ_k(n) = (1 − μ λ_k)^n ṽ_k(0); convergence modes: 1 − μ λ_k
max_k |1 − μ λ_k|: slowest mode
min_k |1 − μ λ_k|: fastest mode
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 40 / 107
Optimal Step-Size continued
[Figure: |1 − μ λ_k| versus μ for λ_min and λ_max; the curves cross between 1/λ_max and 1/λ_min]
The optimal step size μ_o makes the slowest modes equal in magnitude:
1 − μ_o λ_min = −(1 − μ_o λ_max) ⇒ μ_o = 2/(λ_max + λ_min)
Optimal slowest modes: (ρ − 1)/(ρ + 1), with ρ = λ_max/λ_min
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 41 / 107
The Learning Curve J_ms(n)
J_ms(n) = J_ms,min + ṽ^T(n) Λ ṽ(n)   (the second term is the excess MSE)
        = J_ms,min + Σ_{k=1}^{N} λ_k ṽ_k²(n)
Since ṽ_k(n) = (1 − μ λ_k)^n ṽ_k(0),
J_ms(n) = J_ms,min + Σ_{k=1}^{N} λ_k (1 − μ λ_k)^{2n} ṽ_k²(0)
λ_k (1 − μ λ_k)² > 0 ⇒ monotonic convergence
Stability limit is again 0 < μ < 2/λ_max
J_ms(n) converges faster than w(n)
The algorithm converges faster as ρ = λ_max/λ_min → 1
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 42 / 107
Simulation Results
x(n) = a x(n−1) + v(n)
[Figure: steepest descent mean square error (dB) versus iteration, white-noise input (eigenvalue spread ρ = 1)]
[Figure: steepest descent mean square error (dB) versus iteration, AR(1) input with a = 0.7 (ρ ≈ 5.7)]
Linear system identification: FIR with 20 coefficients
Step size μ = 0.3
Noise power σ_v² = 10^−6
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 43 / 107
The Newton Algorithm
Steepest descent: linear approximation of J_ms about the operating point
Newton's method: quadratic approximation of J_ms
Expanding J_ms(w) in a Taylor series about w(n),
J_ms(w) ≈ J_ms[w(n)] + ∇^T J_ms [w − w(n)] + (1/2)[w − w(n)]^T H(n) [w − w(n)]
Differentiating w.r.t. w and equating to zero at w = w(n+1),
∇J_ms[w(n+1)] = ∇J_ms[w(n)] + H[w(n)][w(n+1) − w(n)] = 0
w(n+1) = w(n) − H^{−1}[w(n)] ∇J_ms[w(n)]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 44 / 107
The Newton Algorithm continued
∇J_ms[w(n)] = −2 p + 2 R_xx w(n)
H(w(n)) = 2 R_xx
Thus, adding a step-size control,
w(n+1) = w(n) + μ R_xx^{−1} [p − R_xx w(n)]
Quadratic surface ⇒ convergence in one iteration for μ = 1
Requires the determination of R_xx^{−1}
Can be used to derive simpler adaptive algorithms
When H(n) is close to singular ⇒ regularization: H(n) = 2 R_xx + 2 ε I
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 45 / 107
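For comparison with steepest descent, a sketch of the regularized Newton recursion under the same assumed statistics; with μ = 1 and ε = 0 it converges in one iteration on this quadratic surface. All numeric values are illustrative assumptions.

```python
import numpy as np

N, a, mu, eps = 8, 0.7, 1.0, 0.0       # assumed values; eps > 0 regularizes a near-singular Hessian
idx = np.arange(N)
R = a ** np.abs(idx[:, None] - idx[None, :])   # Toeplitz R_xx of a unit-power AR(1) input
w_o = np.ones(N) / N
p = R @ w_o

w = np.zeros(N)
for n in range(3):
    # w(n+1) = w(n) + mu (eps I + R_xx)^{-1} [p - R_xx w(n)]
    w = w + mu * np.linalg.solve(eps * np.eye(N) + R, p - R @ w)
print(np.linalg.norm(w - w_o))         # ~0 after the first iteration when mu = 1, eps = 0
```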
Basic Adaptive Algorithms
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 46 / 107
Least Mean Squares (LMS) Algorithm
Can be interpreted in different ways
Each interpretation helps understanding the algorithm behavior
Some of these interpretations are related to the steepest descent
algorithm
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 47 / 107
LMS as a Stochastic Gradient Algorithm
Suppose we use the estimate Ĵ_ms(n) = E[e²(n)] ≈ e²(n)
The estimated gradient vector becomes
∇̂J_ms(n) = ∂e²(n)/∂w(n) = 2 e(n) ∂e(n)/∂w(n)
Since e(n) = d(n) − x^T(n) w(n),
∇̂J_ms(n) = −2 e(n) x(n)   (stochastic gradient)
and, using the steepest descent weight update equation,
w(n+1) = w(n) + μ e(n) x(n)   (LMS weight update)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 48 / 107
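A minimal LMS sketch following the weight update above, in an assumed system identification setting; the filter length, step size and noise level are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
N, mu, n_iter, sigma_z = 16, 0.01, 5000, 1e-3    # assumed example values
w_o = rng.standard_normal(N) / np.sqrt(N)        # "unknown" system to identify

x = rng.standard_normal(n_iter + N)              # white input
w = np.zeros(N)
err = np.empty(n_iter)
for n in range(n_iter):
    xn = x[n : n + N][::-1]                      # regressor x(n) = [x(n), x(n-1), ..., x(n-N+1)]^T
    d = xn @ w_o + sigma_z * rng.standard_normal()
    e = d - xn @ w                               # a priori error e(n)
    w = w + mu * e * xn                          # LMS: w(n+1) = w(n) + mu e(n) x(n)
    err[n] = e
print(np.mean(err[-500:] ** 2))                  # steady-state MSE ~ sigma_z^2 plus excess MSE
```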
LMS as a Stochastic Estimation Algorithm
∇J_ms(n) = −2 p + 2 R_xx w(n)
Stochastic estimators
p̂ = d(n) x(n)
R̂_xx = x(n) x^T(n)
Then,
∇̂J_ms(n) = −2 d(n) x(n) + 2 x(n) x^T(n) w(n)
Using ∇̂J_ms(n) in the steepest descent weight update,
w(n+1) = w(n) + μ e(n) x(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 49 / 107
LMS: A Solution to a Local Optimization Problem
Error expressions
e(n) = d(n) − x^T(n) w(n)   (a priori error)
ε(n) = d(n) − x^T(n) w(n+1)   (a posteriori error)
We want to maximize |ε(n) − e(n)| with |ε(n)| < |e(n)|
ε(n) − e(n) = −x^T(n) Δw(n)
Δw(n) = w(n+1) − w(n)
Expressing Δw(n) as Δw(n) = μ w̃(n) e(n),
ε(n) − e(n) = −μ x^T(n) w̃(n) e(n)
For max |ε(n) − e(n)| ⇒ w̃(n) in the direction of x(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 50 / 107
Thus,
w̃(n) = x(n) and Δw(n) = μ x(n) e(n)
and
w(n+1) = w(n) + μ e(n) x(n)
As
ε(n) − e(n) = −μ x^T(n) x(n) e(n),
|ε(n)| < |e(n)| requires |1 − μ x^T(n) x(n)| < 1, or
0 < μ < 2/||x(n)||²   (stability region)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 51 / 107
Observations - LMS Algorithm
LMS is a noisy approximation of the steepest descent algorithm
The gradient estimate is unbiased
The errors in the gradient estimate lead to J_ms,ex(∞) ≠ 0
The vector w(n) is now random
Steepest descent properties are no longer guaranteed ⇒ LMS analysis required
The instantaneous estimates allow tracking without redesign
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 52 / 107
Some Research Results
J. C. M. Bermudez and N. J. Bershad, A nonlinear analytical model for the
quantized LMS algorithm - the arbitrary step size case, IEEE Transactions on
Signal Processing, vol.44, No. 5, pp. 1175-1183, May 1996.
J. C. M. Bermudez and N. J. Bershad, Transient and tracking performance
analysis of the quantized LMS algorithm for time-varying system identification,
IEEE Transactions on Signal Processing, vol.44, No. 8, pp. 1990-1997, August
1996.
N. J. Bershad and J. C. M. Bermudez, A nonlinear analytical model for the
quantized LMS algorithm - the power-of-two step size case, IEEE Transactions on
Signal Processing, vol.44, No. 11, pp. 2895- 2900, November 1996.
N. J. Bershad and J. C. M. Bermudez, Sinusoidal interference rejection analysis
of an LMS adaptive feedforward controller with a noisy periodic reference, IEEE
Transactions on Signal Processing, vol.46, No. 5, pp. 1298-1313, May 1998.
J. C. M. Bermudez and N. J. Bershad, Non-Wiener behavior of the Filtered-X
LMS algorithm, IEEE Trans. on Circuits and Systems II - Analog and Digital
Signal Processing, vol.46, No. 8, pp. 1110-1114, Aug 1999.
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 53 / 107
Some Research Results continued
O. J. Tobias, J. C. M. Bermudez and N. J. Bershad, Mean weight behavior of the
Filtered-X LMS algorithm, IEEE Transactions on Signal Processing, vol. 48, No.
4, pp. 1061-1075, April 2000.
M. H. Costa, J. C. M. Bermudez and N. J. Bershad, Stochastic analysis of the
LMS algorithm with a saturation nonlinearity following the adaptive filter output,
IEEE Transactions on Signal Processing, vol. 49, No. 7, pp. 1370-1387, July 2001.
M. H. Costa, J. C. M. Bermudez and N. J. Bershad, Stochastic analysis of the
Filtered-X LMS algorithm in systems with nonlinear secondary paths, IEEE
Transactions on Signal Processing, vol. 50, No. 6, pp. 1327-1342, June 2002.
M. H. Costa, J. C. M. Bermudez and N. J. Bershad, The performance surface in
filtered nonlinear mean square estimation, IEEE Transactions on Circuits and
Systems I, vol. 50, No. 3, p. 445-447, March 2003.
G. Barrault, J. C. M. Bermudez and A. Lenzi, New Analytical Model for the
Filtered-x Least Mean Squares Algorithm Verified Through Active Noise Control
Experiment, Mechanical Systems and Signal Processing, v. 21, p. 1839-1852,
2007.
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 54 / 107
Some Research Results continued
N. J. Bershad, J. C. M. Bermudez and J. Y. Tourneret, Stochastic Analysis of the
LMS Algorithm for System Identification with Subspace Inputs, IEEE Trans. on
Signal Process., v. 56, p. 1018-1027, 2008.
J. C. M. Bermudez, N. J. Bershad and J. Y. Tourneret, An Affine Combination of
Two LMS Adaptive Filters: Transient Mean-Square Analysis, IEEE Trans. on
Signal Process., v. 56, p. 1853-1864, 2008.
M. H. Costa, L. R. Ximenes and J. C. M. Bermudez, Statistical Analysis of the
LMS Adaptive Algorithm Subjected to a Symmetric Dead-Zone Nonlinearity at the
Adaptive Filter Output, Signal Processing, v. 88, p. 1485-1495, 2008.
M. H. Costa and J. C. M. Bermudez, A Noise Resilient Variable Step-Size LMS
Algorithm, Signal Processing, v. 88, no. 3, p. 733-748, March 2008.
P. Honeine, C. Richard, J. C. M. Bermudez, J. Chen and H. Snoussi, A
Decentralized Approach for Nonlinear Prediction of Time Series Data in Sensor
Networks, EURASIP Journal on Wireless Communications and Networking, v.
2010, p. 1-13, 2010. (KNLMS)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 55 / 107
The Normalized LMS Algorithm (NLMS)
Most employed adaptive algorithm in real-time applications
Like LMS, it has different interpretations (even more)
Alleviates a drawback of the LMS algorithm
w(n+1) = w(n) + μ e(n) x(n)
  If the amplitude of x(n) is large ⇒ gradient noise amplification
  Sub-optimal performance when σ_x² varies with time (for instance, speech)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 56 / 107
NLMS: A Solution to a Local Optimization Problem
Error expressions
e(n) = d(n) − x^T(n) w(n)   (a priori error)
ε(n) = d(n) − x^T(n) w(n+1)   (a posteriori error)
We want to maximize |ε(n) − e(n)| with |ε(n)| < |e(n)|
ε(n) − e(n) = −x^T(n) Δw(n)   (A)
Δw(n) = w(n+1) − w(n)
For |ε(n)| < |e(n)| we impose the restriction
ε(n) = (1 − μ) e(n), |1 − μ| < 1
ε(n) − e(n) = −μ e(n)   (B)
For max |ε(n) − e(n)| ⇒ Δw(n) in the direction of x(n): Δw(n) = k x(n)   (C)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 57 / 107
Using (A), (B) and (C),
k = μ e(n)/(x^T(n) x(n))
and
w(n+1) = w(n) + μ e(n) x(n)/(x^T(n) x(n))   (NLMS weight update)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 58 / 107
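A sketch of the NLMS update just derived, in the same kind of assumed system identification setting used earlier; a small ε is added in the denominator for numerical safety (regularization is discussed a few slides ahead), and all numeric values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
N, mu, eps, n_iter = 16, 0.5, 1e-8, 3000          # assumed example values
w_o = rng.standard_normal(N) / np.sqrt(N)

x = rng.standard_normal(n_iter + N)
w = np.zeros(N)
for n in range(n_iter):
    xn = x[n : n + N][::-1]                       # regressor x(n)
    d = xn @ w_o + 1e-3 * rng.standard_normal()
    e = d - xn @ w                                # a priori error
    w = w + mu * e * xn / (xn @ xn + eps)         # NLMS: step normalized by ||x(n)||^2
print(np.linalg.norm(w - w_o))
```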
NLMS: Solution to a Constrained Optimization Problem
Error sequence
y(n) = x^T(n) w(n)   (estimate of d(n))
e(n) = d(n) − y(n) = d(n) − x^T(n) w(n)   (estimation error)
Optimization (principle of minimal disturbance)
Minimize ||Δw(n)||² = ||w(n+1) − w(n)||²
subject to: x^T(n) w(n+1) = d(n)
This problem can be solved using the method of Lagrange multipliers
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 59 / 107
Using Lagrange multipliers, we minimize
f[w(n+1)] = ||w(n+1) − w(n)||² + λ [d(n) − x^T(n) w(n+1)]
Differentiating w.r.t. w(n+1) and equating the result to zero,
w(n+1) = w(n) + (λ/2) x(n)   (*)
Using this result in x^T(n) w(n+1) = d(n) yields
λ = 2 e(n)/(x^T(n) x(n))
Using this result in (*) yields
w(n+1) = w(n) + e(n) x(n)/(x^T(n) x(n))
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 60 / 107
NLMS as an Orthogonalization Process
Conditions for the analysis
  e(n) = d(n) − x^T(n) w(n)
  w_o is the optimal solution (Wiener solution)
  d(n) = x^T(n) w_o   (no noise)
  v(n) = w(n) − w_o   (weight error vector)
Error signal
e(n) = d(n) − x^T(n)[v(n) + w_o]
     = x^T(n) w_o − x^T(n)[v(n) + w_o]
     = −x^T(n) v(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 61 / 107
e(n) = −x^T(n) v(n)
Interpretation: to minimize the error, v(n) should be orthogonal to all input vectors
Restriction: we have only one vector, x(n)
Iterative solution:
  We can subtract from v(n) its component in the direction of x(n) at each iteration
  If there are N adaptive coefficients, v(n) could be reduced to zero after N orthogonal input vectors
Iterative projection extraction ⇒ Gram-Schmidt orthogonalization
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 62 / 107
Recursive orthogonalization
n = 0: v(1) = v(0) − proj. of v(0) onto x(0)
n = 1: v(2) = v(1) − proj. of v(1) onto x(1)
...
n + 1: v(n+1) = v(n) − proj. of v(n) onto x(n)
Projection of v(n) onto x(n)
P_x(n)[v(n)] = {x(n) [x^T(n) x(n)]^{−1} x^T(n)} v(n)
Weight update equation
v(n+1) = v(n) − x(n) [x^T(n) x(n)]^{−1} x^T(n) v(n)   (x^T(n) v(n) = −e(n))
       = v(n) + e(n) x(n)/(x^T(n) x(n))
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 63 / 107
NLMS has its own problems!
NLMS solves the LMS gradient noise amplification problem, but...
What happens if ||x(n)||² gets too small?
One needs to add some regularization:
w(n+1) = w(n) + μ e(n) x(n)/(x^T(n) x(n) + ε)   (ε-NLMS)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 64 / 107
ε-NLMS: Stochastic Approximation of Regularized Newton
Regularized Newton Algorithm
w(n+1) = w(n) + μ [ε I + R_xx]^{−1} [p − R_xx w(n)]
Instantaneous estimates
p̂ = x(n) d(n)
R̂_xx = x(n) x^T(n)
Using these estimates,
w(n+1) = w(n) + μ [ε I + x(n) x^T(n)]^{−1} x(n) [d(n) − x^T(n) w(n)]   (the bracketed term is e(n))
w(n+1) = w(n) + μ [ε I + x(n) x^T(n)]^{−1} x(n) e(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 65 / 107
w(n+1) = w(n) + μ [ε I + x(n) x^T(n)]^{−1} x(n) e(n)
Inversion of ε I + x(n) x^T(n)?
Matrix Inversion Formula
[A + BCD]^{−1} = A^{−1} − A^{−1} B [C^{−1} + D A^{−1} B]^{−1} D A^{−1}
Thus,
[ε I + x(n) x^T(n)]^{−1} = (1/ε) I − [(1/ε²)/(1 + (1/ε) x^T(n) x(n))] x(n) x^T(n)
Post-multiplying both sides by x(n) and rearranging,
[ε I + x(n) x^T(n)]^{−1} x(n) = x(n)/(ε + x^T(n) x(n))
and
w(n+1) = w(n) + μ e(n) x(n)/(x^T(n) x(n) + ε)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 66 / 107
Some Research Results
M. H. Costa and J. C. M. Bermudez, An improved model for the normalized LMS
algorithm with Gaussian inputs and large number of coefficients, in Proc. of the
2002 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP-2002), Orlando, Florida, pp. II- 1385-1388, May 13-17, 2002.
J. C. M. Bermudez and M. H. Costa, A Statistical Analysis of the ε-NLMS and
NLMS Algorithms for Correlated Gaussian Signals, Journal of the Brazilian
Telecommunications Society, v. 20, n. 2, p. 7-13, 2005.
G. Barrault, M. H. Costa, J. C. M. Bermudez and A. Lenzi, A new analytical
model for the NLMS algorithm, in Proc. 2005 IEEE International Conference on
Acoustics Speech and Signal Processing (ICASSP 2005) Pennsylvania, USA, vol.
IV, p. 41-44, 2005.
J. C. M. Bermudez, N. J. Bershad and J.-Y. Tourneret, An Affine Combination of
Two NLMS Adaptive Filters - Transient Mean-Square Analysis, In Proc.
Forty-Second Asilomar Conference on Asilomar Conference on Signals, Systems &
Computers, Pacific Grove, CA, USA, 2008.
P. Honeine, C. Richard, J. C. M. Bermudez and H. Snoussi, Distributed
prediction of time series data with kernels and adaptive filtering techniques in
sensor networks, In Proc. Forty-Second Asilomar Conference on Asilomar
Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 2008.
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 67 / 107
The Affine Projection Algorithm
Consider the NLMS algorithm, but using {x(n), x(n−1), ..., x(n−P)}
Minimize ||Δw(n)||² = ||w(n+1) − w(n)||²
subject to:
x^T(n) w(n+1) = d(n)
x^T(n−1) w(n+1) = d(n−1)
...
x^T(n−P) w(n+1) = d(n−P)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 68 / 107
Observation matrix
X(n) = [x(n), x(n−1), ..., x(n−P)]
Desired signal vector
d(n) = [d(n), d(n−1), ..., d(n−P)]^T
Error vector
e(n) = [e(n), e(n−1), ..., e(n−P)]^T = d(n) − X^T(n) w(n)
Vector of the constraint errors
e_c(n) = d(n) − X^T(n) w(n+1)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 69 / 107
Using Lagrange multipliers, we minimize
f[w(n+1)] = ||w(n+1) − w(n)||² + λ^T [d(n) − X^T(n) w(n+1)]
Differentiating w.r.t. w(n+1) and equating the result to zero,
w(n+1) = w(n) + (1/2) X(n) λ   (*)
Using this result in X^T(n) w(n+1) = d(n) yields
λ = 2 [X^T(n) X(n)]^{−1} e(n)
Using this result in (*) yields
w(n+1) = w(n) + X(n) [X^T(n) X(n)]^{−1} e(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 70 / 107
Affine Projection: Solution to the Underdetermined Least-Squares Problem
We want to minimize
||e_c(n)||² = ||d(n) − X^T(n) w(n+1)||²
where X^T(n) is (P+1) × N with (P+1) < N
Thus, we look for the least-squares solution of the underdetermined system
X^T(n) w(n+1) = d(n)
The solution is
w(n+1) = [X^T(n)]^+ d(n) = X(n) [X^T(n) X(n)]^{−1} d(n)
Using d(n) = e(n) + X^T(n) w(n) yields
w(n+1) = w(n) + X(n) [X^T(n) X(n)]^{−1} e(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 71 / 107
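A sketch of the AP update w(n+1) = w(n) + μ X(n)[X^T(n)X(n) + εI]^{−1} e(n); the AR(1) input, projection order, step size and noise level are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, mu, eps, n_iter = 32, 2, 0.5, 1e-6, 4000     # assumed example values
w_o = rng.standard_normal(N) / np.sqrt(N)

# AR(1) input makes the advantage of AP over NLMS visible
x = np.zeros(n_iter + N + P)
for n in range(1, len(x)):
    x[n] = 0.9 * x[n - 1] + rng.standard_normal()

w = np.zeros(N)
for n in range(P, n_iter):
    # X(n) = [x(n), x(n-1), ..., x(n-P)]  (N x (P+1) observation matrix)
    X = np.column_stack([x[n - k : n - k + N][::-1] for k in range(P + 1)])
    d = X.T @ w_o + 1e-3 * rng.standard_normal(P + 1)   # d(n) = X^T(n) w_o + noise
    e = d - X.T @ w                                      # error vector e(n)
    w = w + mu * X @ np.linalg.solve(X.T @ X + eps * np.eye(P + 1), e)
print(np.linalg.norm(w - w_o))
```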
Observations:
Order of the AP algorithm: P + 1 (P = 0 ⇒ NLMS)
Convergence speed increases with P (but not linearly)
Computational complexity increases with P (not linearly)
If X^T(n) X(n) is close to singular, use X^T(n) X(n) + ε I
The scalar error of NLMS becomes a vector error in AP (except for μ = 1)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 72 / 107
Affine Projection: Stochastic Approximation of Regularized Newton
Regularized Newton Algorithm
w(n+1) = w(n) + μ [ε I + R_xx]^{−1} [p − R_xx w(n)]
Estimates using time-window statistics
p̂ = (1/(P+1)) Σ_{k=n−P}^{n} x(k) d(k) = (1/(P+1)) X(n) d(n)
R̂_xx = (1/(P+1)) Σ_{k=n−P}^{n} x(k) x^T(k) = (1/(P+1)) X(n) X^T(n)
Using these estimates, with the regularization scaled as ε/(P+1),
w(n+1) = w(n) + μ [ε I + X(n) X^T(n)]^{−1} X(n) [d(n) − X^T(n) w(n)]   (the bracketed term is e(n))
w(n+1) = w(n) + μ [ε I + X(n) X^T(n)]^{−1} X(n) e(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 73 / 107
w(n+1) = w(n) + μ [ε I + X(n) X^T(n)]^{−1} X(n) e(n)
Using the Matrix Inversion Formula,
[ε I + X(n) X^T(n)]^{−1} X(n) = X(n) [ε I + X^T(n) X(n)]^{−1}
and
w(n+1) = w(n) + μ X(n) [ε I + X^T(n) X(n)]^{−1} e(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 74 / 107
Affine Projection Algorithm as a Projection onto an Affine Subspace
Conditions for the analysis
  e(n) = d(n) − X^T(n) w(n)
  d(n) = X^T(n) w_o defines the optimal solution in the least-squares sense
  v(n) = w(n) − w_o   (weight error vector)
Error vector
e(n) = d(n) − X^T(n)[v(n) + w_o]
     = X^T(n) w_o − X^T(n)[v(n) + w_o]
     = −X^T(n) v(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 75 / 107
e(n) = −X^T(n) v(n)
Interpretation: to minimize the error, v(n) should be orthogonal to all input vectors
Restriction: we are going to use only {x(n), x(n−1), ..., x(n−P)}
Iterative solution:
We can subtract from v(n) its projection onto the range of X(n) at each iteration
v(n+1) = v(n) − P_X(n) v(n)   (projection onto an affine subspace)
Using X^T(n) v(n) = −e(n),
v(n+1) = v(n) − {X(n) [X^T(n) X(n)]^{−1} X^T(n)} v(n)
       = v(n) + X(n) [X^T(n) X(n)]^{−1} e(n)   (AP algorithm)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 76 / 107
Pseudo Ane Projection Algorithm
Major problem with the AP algorithm
For μ ≠ 1, the scalar error of NLMS becomes the error vector e(n) ⇒ large computational complexity
The Pseudo-AP algorithm replaces the input x(n) with its P-th order autoregressive prediction
x̂(n) = Σ_{k=1}^{P} a_k x(n−k)
Using the least-squares solution for a = [a_1, a_2, ..., a_P]^T,
a = [X_p^T(n) X_p(n)]^{−1} X_p^T(n) x(n), X_p(n) = [x(n−1), ..., x(n−P)]
Now, we subtract from x(n) its projections onto the last P input vectors
φ(n) = x(n) − X_p(n) a(n)
     = x(n) − {X_p(n) [X_p^T(n) X_p(n)]^{−1} X_p^T(n)} x(n)
     = (I − P_P) x(n)   (P_P: projection matrix)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 77 / 107
Example: Pseudo-AP versus AP
[Figure: mean square error (dB) versus iterations for AP and Pseudo-AP (PAP). Settings: AR(8) input with coefficients [0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2], N = 64, P = 8, SNR = 80 dB, S = Null, μ = 0.7. Final levels indicated: AP (72.82 dB), PAP (75.82 dB).]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 78 / 107
Using φ(n) as the new input vector for the NLMS algorithm,
w(n+1) = w(n) + μ [φ(n)/(φ^T(n) φ(n))] e(n)
It can be shown that Pseudo-AP is identical to AP for an AR input and μ = 1
Otherwise, Pseudo-AP is different from AP
Simpler to implement than AP for μ ≠ 1
Can lead even to better steady-state results than AP for AR inputs
For AR inputs: NLMS with input orthogonalization
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 79 / 107
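A sketch of the pseudo-AP recursion described above: the last P regressors predict x(n), and the prediction error φ(n) replaces x(n) as the NLMS update direction. All settings are illustrative assumptions; μ = 1 is used, the case where pseudo-AP coincides with AP for AR inputs.

```python
import numpy as np

rng = np.random.default_rng(4)
N, P, mu, eps, n_iter = 32, 2, 1.0, 1e-8, 4000     # assumed example values
w_o = rng.standard_normal(N) / np.sqrt(N)

x = np.zeros(n_iter + N + P)
for n in range(1, len(x)):
    x[n] = 0.9 * x[n - 1] + rng.standard_normal()   # AR(1) input

w = np.zeros(N)
for n in range(P, n_iter):
    xn = x[n : n + N][::-1]                          # current regressor x(n)
    Xp = np.column_stack([x[n - k : n - k + N][::-1] for k in range(1, P + 1)])
    a, *_ = np.linalg.lstsq(Xp, xn, rcond=None)      # least-squares prediction coefficients a(n)
    phi = xn - Xp @ a                                # phi(n) = (I - P_P) x(n)
    d = xn @ w_o + 1e-3 * rng.standard_normal()
    e = d - xn @ w                                   # a priori error based on x(n)
    w = w + mu * e * phi / (phi @ phi + eps)         # NLMS step along the orthogonalized input
print(np.linalg.norm(w - w_o))
```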
Some Research Results
S. J. M. de Almeida, J. C. M. Bermudez, N. J. Bershad and M. H. Costa, A
statistical analysis of the affine projection algorithm for unity step size and
autoregressive inputs, IEEE Transactions on Circuits and Systems - I, vol. 52, pp.
1394-1405, July 2005.
S. J. M. Almeida, J. C. M. Bermudez and N. J. Bershad, A Stochastic Model for
a Pseudo Affine Projection Algorithm, IEEE Transactions on Signal Processing, v.
57, p. 107-118, 2009.
S. J. M. Almeida, M. H. Costa and J. C. M. Bermudez, A Stochastic Model for
the Deficient Length Pseudo Affine Projection Adaptive Algorithm, In Proc. 17th
European Signal Processing Conference (EUSIPCO), Aug. 24-28, Glasgow,
Scotland, 2009.
S. J. M. Almeida, M. H. Costa and J. C. M. Bermudez, A stochastic model for
the deficient order affine projection algorithm, in Proc. ISSPA 2010, Kuala
Lumpur, Malaysia, May 2010.
C. Richard, J. C. M. Bermudez and P. Honeine, Online Prediction of Time Series
Data With Kernels, IEEE Transactions on Signal Processing, v. 57, p.
1058-1067, 2009.
P. Honeine, C. Richard, J. C. M. Bermudez, H. Snoussi, M. Essoloh and F.
Vincent, Functional estimation in Hilbert space for distributed learning in wireless
sensor networks, In Proc. IEEE International Conference on Acoustics Speech and
Signal Processing (ICASSP), Taipei, Taiwan, p. 2861-2864, 2009.
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 80 / 107
Deterministic Algorithms
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 81 / 107
Recursive Least Squares Algorithm (RLS)
Based on a deterministic philosophy
Designed for the present realization of the input signals
Least squares method adapted for real-time processing of time series
Convergence speed is not strongly dependent on the input statistics
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 82 / 107
RLS: Problem Definition
Define
x_o(n) = [x(n), x(n−1), ..., x(0)]^T
x(n) = [x(n), x(n−1), ..., x(n−N+1)]^T: input to the N-tap filter
Desired signal d(n)
Estimator of d(n): y(n) = x^T(n) w(n)
Cost function: squared error
J_ls(n) = Σ_{k=0}^{n} e²(k) = Σ_{k=0}^{n} [d(k) − x^T(k) w(n)]²
        = e^T(n) e(n), e(n) = [e(n), e(n−1), ..., e(0)]^T
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 83 / 107
In vector form
d(n) = [d(n), d(n−1), ..., d(0)]^T
y(n) = [y(n), y(n−1), ..., y(0)]^T, y(k) = x^T(k) w(n)
y(n) = Σ_{k=0}^{N−1} w_k(n) x_o(n−k) = Φ(n) w(n)
Φ(n) = [x_o(n), x_o(n−1), ..., x_o(n−N+1)]
  (rows: [x(n), x(n−1), ..., x(n−N+1)]; [x(n−1), x(n−2), ..., x(n−N)]; ...; [x(0), x(−1), ..., x(−N+1)])
e(n) = d(n) − Φ(n) w(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 84 / 107
e(n) = d(n) − Φ(n) w(n)
and we minimize
J_ls(n) = ||e(n)||²
Normal Equations
Φ^T(n) Φ(n) w(n) = Φ^T(n) d(n) ⇒ R̂(n) w(n) = p̂(n)
Alternative representation for R̂ and p̂
R̂(n) = Φ^T(n) Φ(n) = Σ_{k=0}^{n} x(k) x^T(k)
p̂(n) = Φ^T(n) d(n) = Σ_{k=0}^{n} x(k) d(k)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 85 / 107
RLS: Observations
w(n) remains fixed for 0 ≤ k ≤ n to determine J_ls(n)
The optimum vector w(n) is w_o(n) = R̂^{−1}(n) p̂(n)
The algorithm has growing memory
In an adaptive implementation, w_o(n) = R̂^{−1}(n) p̂(n) must be solved at each iteration n
Problems for an adaptive implementation
  Difficulties with nonstationary signals (infinite data window)
  Computational complexity
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 86 / 107
RLS - Forgetting the Past
Modified cost function
J_ls(n) = Σ_{k=0}^{n} λ^{n−k} e²(k) = Σ_{k=0}^{n} λ^{n−k} [d(k) − x^T(k) w(n)]²
        = e^T(n) Λ e(n), Λ = diag[1, λ, λ², ..., λ^n]
Modified Normal Equations
Φ^T(n) Λ Φ(n) w(n) = Φ^T(n) Λ d(n) ⇒ R̂(n) w(n) = p̂(n)
where
R̂(n) = Φ^T(n) Λ Φ(n) = Σ_{k=0}^{n} λ^{n−k} x(k) x^T(k)
p̂(n) = Φ^T(n) Λ d(n) = Σ_{k=0}^{n} λ^{n−k} x(k) d(k)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 87 / 107
RLS: Recursive Updating
Correlation matrix
R̂(n) = Σ_{k=0}^{n} λ^{n−k} x(k) x^T(k)
     = λ Σ_{k=0}^{n−1} λ^{n−1−k} x(k) x^T(k) + x(n) x^T(n)
     = λ R̂(n−1) + x(n) x^T(n)
Cross-correlation vector
p̂(n) = Σ_{k=0}^{n} λ^{n−k} x(k) d(k)
     = λ Σ_{k=0}^{n−1} λ^{n−1−k} x(k) d(k) + x(n) d(n)
     = λ p̂(n−1) + x(n) d(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 88 / 107
We have recursive updating expressions for R̂(n) and p̂(n)
However, we need a recursive update for R̂^{−1}(n), as
w_o(n) = R̂^{−1}(n) p̂(n)
Applying the matrix inversion lemma to
R̂(n) = λ R̂(n−1) + x(n) x^T(n)
and defining P(n) = R̂^{−1}(n) yields
P(n) = λ^{−1} P(n−1) − [λ^{−2} P(n−1) x(n) x^T(n) P(n−1)] / [1 + λ^{−1} x^T(n) P(n−1) x(n)]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 89 / 107
The Gain Vector
Definition
k(n) = [λ^{−1} P(n−1) x(n)] / [1 + λ^{−1} x^T(n) P(n−1) x(n)]
Using this definition,
P(n) = λ^{−1} P(n−1) − λ^{−1} k(n) x^T(n) P(n−1)
k(n) can be written as
k(n) = [λ^{−1} P(n−1) − λ^{−1} k(n) x^T(n) P(n−1)] x(n)   (the bracketed term is P(n))
     = R̂^{−1}(n) x(n)
k(n) is the representation of x(n) on the column space of R̂(n)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 90 / 107
RLS: Recursive Weight Update
w(n+1) = w_o(n)
Using
w(n+1) = R̂^{−1}(n) p̂(n) = P(n) p̂(n)
p̂(n) = λ p̂(n−1) + x(n) d(n)
P(n) = λ^{−1} P(n−1) − λ^{−1} k(n) x^T(n) P(n−1)
k(n) = P(n) x(n)
e(n) = d(n) − x^T(n) w(n)
yields
w(n+1) = w(n) + k(n) e(n)   (RLS weight update)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 91 / 107
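A sketch of the RLS recursions above (gain k(n), inverse-correlation update P(n), weight update); the forgetting factor, the δ initialization and the test signal are assumed values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N, lam, delta, n_iter = 16, 0.999, 1e-2, 3000     # assumed example values
w_o = rng.standard_normal(N) / np.sqrt(N)

x = rng.standard_normal(n_iter + N)
w = np.zeros(N)
Pm = np.eye(N) / delta                            # P(0) = R_hat^{-1}(0) = (delta I)^{-1}
for n in range(n_iter):
    xn = x[n : n + N][::-1]                       # regressor x(n)
    d = xn @ w_o + 1e-3 * rng.standard_normal()
    Px = Pm @ xn
    k = Px / (lam + xn @ Px)                      # gain vector k(n)
    e = d - xn @ w                                # a priori error
    w = w + k * e                                 # w(n+1) = w(n) + k(n) e(n)
    Pm = (Pm - np.outer(k, Px)) / lam             # P(n) = [P(n-1) - k(n) x^T(n) P(n-1)] / lambda
print(np.linalg.norm(w - w_o))
```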
RLS: Observations
The RLS convergence speed is not affected by the eigenvalues of R̂(n)
Initialization: p̂(0) = 0, R̂(0) = δ I, with δ small (related to (1 − λ) σ_x²)
This results in R̂(n) changed to λ^n δ I + R̂(n)
Biased estimate of w_o(n) for small n (no problem for λ < 1 and large n)
Numerical problems in finite precision
  Unstable for λ = 1
  Loss of symmetry in P(n)
    Evaluate only the upper (lower) part and the diagonal of P(n)
    Replace P(n) with [P(n) + P^T(n)]/2 after updating from P(n−1)
  Numerical problems when x(n) ≈ 0 and λ < 1
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 92 / 107
Some Research Results
C. Ludovico and J. C. M. Bermudez, A recursive least squares
algorithm robust to low-power excitation, in Proc. 2004 IEEE
International Conference on Acoustics, Speech and Signal Processing
(ICASSP-2004), Montreal, Canada, vol. II, p. 673-677, 2004.
C. Ludovico and J. C. M. Bermudez, An improved recursive least
squares algorithm robust to input power variation, in Proc. IEEE
Statistical Signal Processing Workshop, Bordeaux, France, 2005.
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 93 / 107
Performance Comparison
System identication, N=100
Input AR(1) x(n) = 0.9x(n 1) + v(n)
AP algorithm with P = 2
Step sizes designed for equal LMS and NLMS performances
Impulse response to be estimated
0 10 20 30 40 50 60 70 80 90 100
0.1
0.05
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
sample
i
m
p
u
l
s
e

r
e
s
p
o
n
s
e
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 94 / 107
Excess Mean Square Error - LMS, NLMS and AP
[Figure: excess mean square error (dB) versus iteration for LMS, NLMS and AP(2)]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 95 / 107
Excess Mean Square Error - LMS, NLMS, AP and RLS
[Figure: excess mean square error (dB) versus iteration for LMS, NLMS, AP(2) and RLS]
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 96 / 107
Performance Comparisons: Computational Complexity
Adaptive filter with N real coefficients and real signals
For the AP algorithm, K = P + 1

Algorithm | ×                   | +                    | ÷
LMS       | 2N + 1              | 2N                   | -
NLMS      | 3N + 1              | 3N                   | 1
AP        | (K² + 2K)N + K³ + K | (K² + 2K)N + K³ + K² | -
RLS       | N² + 5N + 1         | N² + 3N              | 1

For N = 100, P = 2:

Algorithm | ×      | +      | ÷ | factor
LMS       | 201    | 200    | - | 1
NLMS      | 301    | 300    | 1 | 1.5
AP        | 1,530  | 1,536  | - | 7.5
RLS       | 10,501 | 10,300 | 1 | 52.5
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 97 / 107
Typical values for acoustic echo cancellation (N = 1024, P = 2):

Algorithm | ×         | +         | ÷ | factor
LMS       | 2,049     | 2,048     | - | 1
NLMS      | 3,073     | 3,072     | 1 | 1.5
AP        | 15,390    | 15,396    | - | 7.5
RLS       | 1,053,697 | 1,051,648 | 1 | 514
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 98 / 107
How to Deal with Computational Complexity?
Not an easy task!
There are fast versions for some algorithms (especially RLS)
What is usually not said is that speed can bring
  instability
  increased need for memory
Most applications rely on simple solutions
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 99 / 107
Understanding the Adaptive Filter Behavior
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 100 / 107
What are Adaptive Filters?
Adaptive filters are, by design, systems that are
  Time-variant (w(n) is time-variant)
  Nonlinear (y(n) is a nonlinear function of x(n))
  Stochastic (w(n) is random)
LMS:
w(n+1) = w(n) + μ e(n) x(n) = [I − μ x(n) x^T(n)] w(n) + μ d(n) x(n)
y(n) = x^T(n) w(n)
They are difficult to understand
Different applications and signals require different analyses
Simplifying assumptions are necessary
Good design requires good analytical models.
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 101 / 107
LMS Analysis
Basic equations:
d(n) = x^T(n) w_o + z(n)
e(n) = d(n) − x^T(n) w(n)
w(n+1) = w(n) + μ e(n) x(n)
v(n) = w(n) − w_o   (study about the optimum weight vector)
Mean Weight Behavior:
Neglecting the dependence between v(n) and x(n) x^T(n),
E[v(n+1)] = (I − μ R_xx) E[v(n)]
Follows the steepest descent trajectory
Convergence condition: 0 < μ < 2/λ_max
lim_{n→∞} E[w(n)] = w_o
However, w(n) is random
We need to study its fluctuations about E[w(n)] ⇒ MSE
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 102 / 107
The Mean-Square Estimation Error
e(n) = e_o(n) − x^T(n) v(n), e_o(n) = d(n) − x^T(n) w_o
Square the error equation
Take the expected value
Neglect the statistical dependence between v(n) and x(n) x^T(n)
J_ms(n) = E[e_o²(n)] + Tr{R_xx K(n)}, K(n) = E[v(n) v^T(n)]
This expression is independent of the adaptive algorithm
The effect of the algorithm on J_ms(n) is determined by K(n)
J_ms,min = E[e_o²(n)]
J_ms,ex = Tr{R_xx K(n)}
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 103 / 107
The Behavior of K(n) for LMS
Using the basic equations,
v(n+1) = [I − μ x(n) x^T(n)] v(n) + μ e_o(n) x(n)
Post-multiply by v^T(n+1)
Take the expected value assuming v(n) and x(n) x^T(n) independent
Evaluate the expectations
  Higher-order moments of x(n) require the input pdf
Assuming Gaussian inputs,
K(n+1) = K(n) − μ [R_xx K(n) + K(n) R_xx]
         + μ² {R_xx Tr[R_xx K(n)] + 2 R_xx K(n) R_xx}
         + μ² R_xx J_ms,min
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 104 / 107
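The K(n) recursion can be iterated numerically to predict the LMS learning curve; below is a sketch under the same independence and Gaussian-input assumptions, with illustrative values for R_xx, μ and J_ms,min (none taken from the slides).

```python
import numpy as np

N, a, mu, Jmin = 16, 0.8, 0.01, 1e-6           # assumed illustrative values
idx = np.arange(N)
R = a ** np.abs(idx[:, None] - idx[None, :])   # R_xx of a unit-power AR(1) input
w_o = np.ones(N) / N
K = np.outer(w_o, w_o)                         # K(0) = v(0) v^T(0) for w(0) = 0

emse = []
for n in range(2000):
    emse.append(np.trace(R @ K))               # J_ms,ex(n) = Tr{R_xx K(n)}
    K = (K - mu * (R @ K + K @ R)
           + mu**2 * (R * np.trace(R @ K) + 2 * R @ K @ R)
           + mu**2 * R * Jmin)                 # Gaussian-input K(n+1) recursion
print(emse[0], emse[-1])                       # transient versus steady-state excess MSE
```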
Design Guidelines Extracted from the Model
Stability
0 < μ < m / Tr[R_xx]; m = 2 or m = 2/3
J_ms(n) converges as a function of [(1 − μ λ_i)² + 2 μ² λ_i²]^n, i = 1, ..., N
Steady-State
J_ms,ex(∞) ≈ (μ/2) Tr[R_xx] J_ms,min
M(∞) = J_ms,ex(∞) / J_ms,min   (MSE misadjustment)
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 105 / 107
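These guidelines translate directly into numbers; a short sketch computing the step-size bound and the predicted misadjustment for an assumed R_xx and step size (all values illustrative).

```python
import numpy as np

N, a, mu, Jmin = 16, 0.8, 0.01, 1e-6              # assumed illustrative values
idx = np.arange(N)
R = a ** np.abs(idx[:, None] - idx[None, :])      # R_xx of a unit-power AR(1) input

mu_max = 2 / (3 * np.trace(R))                    # conservative bound: 0 < mu < (2/3)/Tr[R_xx]
Jex = 0.5 * mu * np.trace(R) * Jmin               # J_ms,ex(inf) ~ (mu/2) Tr[R_xx] J_ms,min
M = Jex / Jmin                                    # MSE misadjustment
print(mu_max, Jex, M)
```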
Model Evaluation
x(n) = a x(n−1) + v(n)
[Figure: LMS excess mean square error, simulation (blue) versus analytical model (red), white-noise input]
[Figure: LMS excess mean square error, simulation (blue) versus analytical model (red), AR(1) input with a = 0.8]
Linear system identification: FIR with 100 coefficients
Step size μ = 0.05
Noise power σ_v² = 10^−6
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 106 / 107
Thank you for your attention!!!
Jose Bermudez (UFSC) Adaptive Filtering IRIT - Toulouse, 2011 107 / 107