Lecture 7
Least Mean Square (LMS)
Adaptive Filtering
Dr. Tahir Zaidi

Steepest Descent

The update rule for SD is

w(n+1) = w(n) + (1/2) μ [−∇J(n)],   n = 0, 1, 2, ...

where

∇J(n) = −2p + 2R w(n),   p = E[u(n) d*(n)],   R = E[u(n) u^H(n)],

or

w(n+1) = w(n) + μ [p − R w(n)].

SD is a deterministic algorithm, in the sense that p and R are assumed to be exactly known.

In practice we can only estimate these quantities.


Basic Idea

The simplest estimate of the expectations is the instantaneous (single-sample) estimate, i.e. we remove the expectation terms and replace them with the instantaneous values:

R̂(n) = u(n) u^H(n),   p̂(n) = u(n) d*(n).

Then, the gradient becomes

∇̂J(n) = −2 u(n) d*(n) + 2 u(n) u^H(n) ŵ(n).

Eventually, the new update rule is

ŵ(n+1) = ŵ(n) + μ u(n) [d*(n) − u^H(n) ŵ(n)]

(no expectations, only instantaneous samples!).

Basic Idea

However, the term in the brackets is the (conjugated) error, i.e.

e(n) = d(n) − ŵ^H(n) u(n),

then

ŵ(n+1) = ŵ(n) + μ u(n) e*(n).

Note that ∇̂J(n) is the gradient of the instantaneous squared error |e(n)|², instead of the mean-square error E[|e(n)|²] as in SD.


Basic Idea

Filter weights are updated using instantaneous values


Update Equation for Method of Steepest Descent

w(n+1) = w(n) + μ E[u(n) e*(n)]

Update Equation for Least Mean-Square

ŵ(n+1) = ŵ(n) + μ u(n) e*(n)
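To make the contrast concrete, here is a minimal NumPy sketch (not from the lecture; the 3-tap system, step size, and noise level are made-up illustrative values). It shows that an SD step needs the exact statistics R and p, whereas an LMS step needs only the current tap-input vector u(n) and desired sample d(n).

```python
import numpy as np

def sd_update(w, R, p, mu):
    """One steepest-descent step: needs the exact statistics R and p."""
    return w + mu * (p - R @ w)

def lms_update(w, u, d, mu):
    """One LMS step: needs only the current tap-input vector u(n) and sample d(n)."""
    e = d - w @ u              # instantaneous error e(n)
    return w + mu * u * e      # w(n+1) = w(n) + mu * u(n) * e(n)

# Tiny demonstration on a synthetic 3-tap identification problem (illustrative values).
rng = np.random.default_rng(0)
M, mu, n_samples = 3, 0.05, 2000
w_true = np.array([0.8, -0.4, 0.2])
w = np.zeros(M)
x = rng.standard_normal(n_samples + M)
for n in range(n_samples):
    u = x[n:n + M][::-1]                           # tap-input vector u(n)
    d = w_true @ u + 0.01 * rng.standard_normal()  # noisy desired response d(n)
    w = lms_update(w, u, d, mu)
print(w)  # close to w_true
```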


LMS Algorithm
The instantaneous gradient estimate is unbiased, but since the expectations are omitted, the estimates have a high variance. Therefore, the recursive computation of each tap weight in the LMS algorithm suffers from gradient noise.

In contrast to SD, which is a deterministic algorithm, LMS is a member of the family of stochastic gradient descent algorithms.

LMS has a higher MSE (J(∞)) compared to SD (J_min, the Wiener solution) as n → ∞,
i.e., J(n) → J(∞) > J_min as n → ∞.
The difference is called the excess mean-square error, J_ex(∞) = J(∞) − J_min.
The ratio J_ex(∞)/J_min is called the misadjustment.
If J(∞) is a finite value, LMS is said to be stable in the mean-square sense.
LMS performs a random motion around the Wiener solution.

LMS Algorithm

The LMS recursion involves a feedback connection.

Although LMS might seem very difficult to work with due to the randomness, the feedback acts as a low-pass filter, or performs averaging, so that the randomness can be filtered out.
The time constant of this averaging is inversely proportional to μ.
Actually, if μ is chosen small enough, the adaptive process progresses slowly and the effects of the gradient noise on the tap weights are largely filtered out.

The computational complexity of LMS is very low, which makes it very attractive:
only 2M + 1 complex multiplications and 2M complex additions per iteration.


LMS Algorithm

[Signal-flow graph of the LMS algorithm, showing the feedback connection.]

Canonical Model

The LMS algorithm for complex signals / with complex coefficients can be represented in terms of four separate LMS algorithms for real signals, with cross-coupling between them.

Write the input / desired signal / tap weights / output / error in complex notation:

u(n) = u_I(n) + j u_Q(n)
d(n) = d_I(n) + j d_Q(n)
ŵ(n) = ŵ_I(n) + j ŵ_Q(n)
ŷ(n) = ŷ_I(n) + j ŷ_Q(n)
e(n) = e_I(n) + j e_Q(n)


Canonical Model

Then the relations between these expressions are

ŷ_I(n) = ŵ_I^T(n) u_I(n) + ŵ_Q^T(n) u_Q(n)
ŷ_Q(n) = ŵ_I^T(n) u_Q(n) − ŵ_Q^T(n) u_I(n)

e_I(n) = d_I(n) − ŷ_I(n),   e_Q(n) = d_Q(n) − ŷ_Q(n)

ŵ_I(n+1) = ŵ_I(n) + μ [u_I(n) e_I(n) + u_Q(n) e_Q(n)]
ŵ_Q(n+1) = ŵ_Q(n) + μ [u_Q(n) e_I(n) − u_I(n) e_Q(n)]
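As an illustration of the canonical model (not part of the lecture; the synthetic complex signal and all parameter values are assumptions made for this sketch), the NumPy snippet below runs the complex LMS recursion and its four cross-coupled real recursions side by side and checks that they produce identical weights.

```python
import numpy as np

rng = np.random.default_rng(1)
M, mu, N = 4, 0.02, 500

# Synthetic complex input and desired response (illustrative only).
u_sig = (rng.standard_normal(N + M) + 1j * rng.standard_normal(N + M)) / np.sqrt(2)
w_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)

w_c = np.zeros(M, dtype=complex)   # complex LMS weights
wI = np.zeros(M)                   # canonical-model (real) weights, in-phase part
wQ = np.zeros(M)                   # canonical-model (real) weights, quadrature part

for n in range(N):
    u = u_sig[n:n + M][::-1]       # tap-input vector u(n)
    d = np.conj(w_true) @ u        # d(n) = w_true^H u(n)

    # Complex LMS: w(n+1) = w(n) + mu * u(n) * e*(n), with e(n) = d(n) - w^H(n) u(n)
    e = d - np.conj(w_c) @ u
    w_c = w_c + mu * u * np.conj(e)

    # Canonical model: four cross-coupled real LMS updates
    uI, uQ, dI, dQ = u.real, u.imag, d.real, d.imag
    yI = wI @ uI + wQ @ uQ
    yQ = wI @ uQ - wQ @ uI
    eI, eQ = dI - yI, dQ - yQ
    wI = wI + mu * (uI * eI + uQ * eQ)
    wQ = wQ + mu * (uQ * eI - uI * eQ)

print(np.allclose(w_c, wI + 1j * wQ))  # True: the two formulations coincide
```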

Canonical Model

[Signal-flow graphs of the canonical model: the complex LMS algorithm realised as four cross-coupled real LMS filters.]

Analysis of the LMS Algorithm

Although the filter is a linear combiner, the algorithm is highly nonlinear and violates superposition and homogeneity.

Assume the initial condition ŵ(0) = 0.

The analysis will continue using the weight-error vector

ε(n) = w_o − ŵ(n),

then, in terms of the input u(n) and the output error e(n),

ε(n+1) = ε(n) − μ u(n) e*(n),

and its autocorrelation (correlation matrix)

K(n) = E[ε(n) ε^H(n)].

(Here we use the expectation; however, it actually stands for the ensemble average.)


Analysis of the LMS Algorithm

We have

ε(n+1) = ε(n) − μ u(n) e*(n).

Let

e(n) = e_o(n) + ε^H(n) u(n),   e_o(n) = d(n) − w_o^H u(n),

where e_o(n) is the estimation error of the optimum (Wiener) filter. Then the update eqn. can be written as

ε(n+1) = [I − μ u(n) u^H(n)] ε(n) − μ u(n) e_o*(n).

Analyse convergence in an average sense:
the algorithm is run many times → study the ensemble-average behaviour.

Small Step Size Analysis

Assumption I: the step size μ is small (how small?), so that the LMS filter acts like a low-pass filter with a very low cut-off frequency.

Assumption II: the desired response is described by a linear multiple regression model that is matched exactly by the optimum Wiener filter,

d(n) = w_o^H u(n) + e_o(n),

where e_o(n) is the irreducible estimation error, with E[|e_o(n)|²] = J_min and e_o(n) uncorrelated with the input u(n).

Assumption III: the input and the desired response are jointly Gaussian.


Small Step Size Analysis

Applying the similarity transformation resulting from the eigendecomposition R = Q Λ Q^H to the small-step-size weight-error recursion

ε(n+1) ≈ (I − μ R) ε(n) − μ u(n) e_o*(n)

(we do not have the stochastic driving term −μ u(n) e_o*(n) in Wiener filtering!), i.e. defining

v(n) = Q^H ε(n),

then we have

v(n+1) = (I − μ Λ) v(n) + φ(n),

where

φ(n) = −μ Q^H u(n) e_o*(n).

Components of v(n) are uncorrelated!

Small Step Size Analysis

Components of v(n) are uncorrelated, so each one obeys a scalar recursion

v_k(n+1) = (1 − μ λ_k) v_k(n) + φ_k(n),   k = 1, ..., M,

a first-order difference equation driven by the stochastic force φ_k(n) (Brownian motion, thermodynamics).

Solution: iterating from n = 0,

v_k(n) = (1 − μ λ_k)^n v_k(0) + Σ_{i=0}^{n−1} (1 − μ λ_k)^{n−1−i} φ_k(i),

where the first term is the natural component of v(n) and the second term is the forced component of v(n).
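A minimal sketch of one such decoupled mode is given below; modelling the stochastic force φ_k(n) as white noise, and the particular numbers used, are assumptions made only for this illustration.

```python
import numpy as np

# One decoupled mode: v_k(n+1) = (1 - mu*lam) v_k(n) + phi_k(n),
# with phi_k(n) modelled here as white noise (an assumption for the sketch).
rng = np.random.default_rng(2)
mu, lam, N = 0.05, 1.0, 400
a = 1 - mu * lam                      # mode decay factor, |a| < 1 for stability
phi = 0.05 * rng.standard_normal(N)   # stochastic force

v = np.zeros(N + 1)
v[0] = 1.0                            # initial weight error in this mode
for n in range(N):
    v[n + 1] = a * v[n] + phi[n]

natural = a ** np.arange(N + 1) * v[0]  # decaying transient (natural component)
forced = v - natural                    # remaining driven, fluctuating part (forced component)
print(v[-1], natural[-1], forced[-1])
```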


Learning Curves

Two kinds of learning curves:

Mean-square error (MSE) learning curve:  J(n) = E[|e(n)|²]
Mean-square deviation (MSD) learning curve:  D(n) = E[||ε(n)||²]

Ensemble averaging: the results of many (→ ∞) independent realizations are averaged.

What is the relation between MSE and MSD? For small μ,

J_ex(n) = J(n) − J_min = Σ_k λ_k E[|v_k(n)|²]   and   D(n) = Σ_k E[|v_k(n)|²],

so the excess MSE and the MSD are built from the same mode powers E[|v_k(n)|²].
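One possible way to generate both learning curves by ensemble averaging is sketched below (illustrative only: the filter length, step size, noise level, and number of runs are assumptions, not values from the lecture).

```python
import numpy as np

# Ensemble-averaged MSE and MSD learning curves for a short LMS filter.
rng = np.random.default_rng(3)
M, mu, N, runs, Jmin = 4, 0.05, 1000, 200, 1e-3
w_o = np.array([1.0, 0.5, -0.3, 0.1])   # "true" system (illustrative)

mse = np.zeros(N)   # J(n): ensemble-averaged squared error
msd = np.zeros(N)   # D(n): ensemble-averaged squared weight-error norm

for _ in range(runs):
    w = np.zeros(M)
    x = rng.standard_normal(N + M)
    for n in range(N):
        u = x[n:n + M][::-1]
        d = w_o @ u + np.sqrt(Jmin) * rng.standard_normal()
        e = d - w @ u
        mse[n] += e ** 2
        msd[n] += np.sum((w_o - w) ** 2)
        w = w + mu * u * e

mse /= runs
msd /= runs
print(mse[-1], msd[-1])   # J(n) levels off slightly above Jmin; D(n) stays small
```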

Learning Curves
For small μ, under the small-step-size assumptions above,

J(n) ≈ J_min + Σ_k λ_k E[|v_k(n)|²].

Excess MSE:

J_ex(n) = J(n) − J_min = Σ_k λ_k E[|v_k(n)|²] ≥ 0,

so LMS performs worse than SD: there is always an excess MSE. To evaluate E[|v_k(n)|²], use the solution for v_k(n) iterated from n = 0 above.


Learning Curves

or

(1/λ_max) J_ex(n) ≤ D(n) ≤ (1/λ_min) J_ex(n).

The mean-square deviation D is lower- and upper-bounded by scaled versions of the excess MSE.

They have a similar response: both decay as n grows.

Convergence

For small μ, the modes decay as (1 − μ λ_k)^{2n}, so the learning curve converges provided that

|1 − μ λ_k| < 1   for all k.

Hence, for convergence,

0 < μ < 2 / λ_max,

or, more conservatively (since λ_max ≤ tr(R) = Σ_k λ_k), 0 < μ < 2 / tr(R).

The ensemble-average learning curve of an LMS filter does not exhibit oscillations; rather, it decays exponentially to the constant value

J(∞) = J_min + J_ex(∞).
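The bound can be checked numerically for a given correlation matrix. The sketch below uses an invented Toeplitz matrix with r(k) = ρ^|k| (an assumption for the example), computes 2/λ_max, and verifies that all modes decay for a chosen μ.

```python
import numpy as np

# Step-size bound check for an illustrative correlation matrix R with r(k) = rho**|k|.
M, rho = 5, 0.8
R = np.array([[rho ** abs(i - j) for j in range(M)] for i in range(M)])

eigvals = np.linalg.eigvalsh(R)        # R is symmetric, eigenvalues are real
lam_max, lam_sum = eigvals.max(), eigvals.sum()

mu_bound = 2.0 / lam_max               # convergence bound from the analysis
mu_conservative = 2.0 / lam_sum        # safer bound using tr(R) >= lambda_max
print(mu_bound, mu_conservative)

mu = 0.1
print(np.all(np.abs(1 - mu * eigvals) < 1))  # True -> every mode decays
```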


Misadjustment

Misadjustment: define

ℳ = J_ex(∞) / J_min.

For small μ, the steady state of the mode recursions gives

J_ex(∞) ≈ (μ/2) J_min Σ_k λ_k,

or equivalently

ℳ ≈ (μ/2) Σ_k λ_k = (μ/2) tr(R),

but

tr(R) = M r(0) = M E[|u(n)|²],

then

ℳ ≈ (μ/2) M E[|u(n)|²].

Average Time Constant

From SD we know that the k-th natural mode of the MSE decays with time constant

τ_{k,mse} ≈ 1 / (2 μ λ_k),

but, averaging over the modes,

λ_av = (1/M) Σ_k λ_k,

then the average time constant is

τ_{mse,av} ≈ 1 / (2 μ λ_av).

Combining this with the misadjustment: ℳ ≈ (μ/2) M λ_av = M / (4 τ_{mse,av}).
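A small numeric sketch tying the last two slides together (the correlation matrix, filter length, and step size are illustrative assumptions): it evaluates ℳ ≈ (μ/2) tr(R) and τ_mse,av ≈ 1/(2 μ λ_av), and confirms that ℳ = M / (4 τ_mse,av).

```python
import numpy as np

# Misadjustment and average time constant for an illustrative 5-tap example.
M, rho, mu = 5, 0.8, 0.02
R = np.array([[rho ** abs(i - j) for j in range(M)] for i in range(M)])
lam = np.linalg.eigvalsh(R)

misadjustment = 0.5 * mu * lam.sum()     # (mu/2) * tr(R)
lam_av = lam.mean()
tau_mse_av = 1.0 / (2.0 * mu * lam_av)   # average MSE time constant

print(misadjustment, tau_mse_av)
print(M / (4 * tau_mse_av))              # equals the misadjustment above
```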


Observations

The misadjustment is
directly proportional to the filter length M, for a fixed τ_{mse,av};
inversely proportional to the time constant τ_{mse,av}:
slower convergence results in lower misadjustment;
directly proportional to the step size μ:
smaller step size results in lower misadjustment.

The time constant is
inversely proportional to the step size μ:
smaller step size results in slower convergence,
so there is a trade-off between misadjustment and convergence speed.

Large μ requires the inclusion of higher-order terms into the analysis:
difficult to analyse, the small-step analysis is no longer valid, and the learning curve becomes more noisy.

LMS vs. SD

The main goal is to minimise the mean-square error (MSE).

The optimum solution is found by the Wiener-Hopf equations:
requires the auto-/cross-correlations (R and p);
achieves the minimum value of the MSE, J_min.

LMS and SD are iterative algorithms designed to find w_o.

SD has direct access to the auto-/cross-correlations (exact measurements)
→ it can approach the Wiener solution w_o and can go down to J_min.

LMS uses instantaneous estimates instead (noisy measurements)
→ it fluctuates around w_o in a Brownian-motion manner, and its MSE settles at J(∞) > J_min.


LMS vs. SD

Learning curves:
SD has a well-defined curve composed of decaying exponentials.
For LMS, the curve is composed of noisy decaying exponentials.

LMS Limits

As the filter length M increases, it imposes a tighter limit on the step size needed to avoid instability:

0 < μ < 2 / (M S_max),

where S_max is the maximum value of the power spectral density S(ω) of the tap inputs u(n).

If this upper bound is exceeded, instability is observed.
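One way to use this bound in practice is to estimate S_max from data. The sketch below does so with a crude segment-averaged periodogram of a synthetic AR(1) input; the signal model, segment length, and filter length are assumptions made for the illustration, and a better PSD estimator would normally be preferred.

```python
import numpy as np

# Estimate the step-size upper bound 2 / (M * S_max) from data.
rng = np.random.default_rng(4)
M = 16

# Synthetic coloured input: AR(1) process u(n) = 0.9 u(n-1) + v(n).
v = rng.standard_normal(10000)
u = np.zeros_like(v)
for n in range(1, len(v)):
    u[n] = 0.9 * u[n - 1] + v[n]

# Segment-averaged periodogram as a rough estimate of the PSD S(omega).
segs = u.reshape(20, -1)                                    # 20 segments of 500 samples
psd = (np.abs(np.fft.rfft(segs, axis=1)) ** 2 / segs.shape[1]).mean(axis=0)

S_max = psd.max()
mu_max = 2.0 / (M * S_max)
print(S_max, mu_max)
```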


LMS Example

One-tap predictor of an order-one AR process. Let

u(n) = −a u(n−1) + v(n),

where v(n) is white noise. The one-tap LMS predictor is

û(n) = ŵ(n) u(n−1),   f(n) = u(n) − ŵ(n) u(n−1),   ŵ(n+1) = ŵ(n) + μ u(n−1) f(n),

and the optimum weight is w_o = −a.
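A runnable sketch of this example follows; the AR coefficient, noise variance, step size, and number of runs are illustrative choices for the sketch, not necessarily the values behind the lecture's plots. It also produces the ensemble-averaged learning curve discussed on the next slide.

```python
import numpy as np

# One-tap LMS predictor of an AR(1) process u(n) = -a*u(n-1) + v(n).
rng = np.random.default_rng(5)
a, sigma_v, mu, N, runs = 0.9, 0.1, 0.1, 2000, 100

w_hist = np.zeros(N)   # ensemble-averaged weight trajectory
mse = np.zeros(N)      # ensemble-averaged squared prediction error f(n)**2

for _ in range(runs):
    u_prev, w = 0.0, 0.0
    for n in range(N):
        u = -a * u_prev + sigma_v * rng.standard_normal()
        f = u - w * u_prev           # prediction error f(n)
        w = w + mu * u_prev * f      # LMS update of the single tap
        w_hist[n] += w
        mse[n] += f ** 2
        u_prev = u

w_hist /= runs
mse /= runs
print(w_hist[-1])   # approaches the optimum weight w_o = -a = -0.9
print(mse[-1])      # levels off slightly above the noise variance sigma_v**2
```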


LMS Example: Learning Curves



H∞ Optimality of LMS

A single realisation of LMS is not optimum in the MSE sense; the ensemble average is.

The previous derivation is heuristic (replacing the auto-/cross-correlations with their instantaneous estimates).

In what sense is LMS optimum?

It can be shown that LMS minimises the maximum energy gain of the filter, i.e. the worst-case ratio of error energy to disturbance energy, under a constraint on the step size.

Minimising the maximum of something → a minimax problem, i.e. the optimisation of an H∞ criterion.

H∞ Optimality of LMS

Provided that the step-size parameter satisfies the limits on the previous slide, then, no matter how different the initial weight vector ŵ(0) is from the unknown parameter vector w_o of the multiple regression model, and irrespective of the value of the additive disturbance e_o(n), the error energy produced at the output of the LMS filter will never exceed a certain level.


Limits on the Step Size

