
Chapter 1

Brief Review of Discrete-Time Signal Processing

Brief Review of Random Processes


References:
A.V.Oppenheim and A.S.Willsky, Signals and Systems, Prentice Hall,
1996
J.G.Proakis and D.G.Manolakis, Introduction to Digital Signal
Processing, Macmillan, 1988
A.V.Oppenheim and R.W.Schafer, Discrete-Time Signal Processing,
Prentice Hall, 1998
P.M.Clarkson, Optimal and Adaptive Signal Processing, CRC, 1993
P.Stoica and R.Moses, Introduction to Spectral Analysis, Prentice Hall,
1997

1
Brief Review of Discrete-Time Signal Processing
There are 3 types of signals that are functions of time:
continuous-time (analog) : defined on a continuous range of time
discrete-time : defined only at discrete instants of time (..., (n-1)T, nT, (n+1)T, ...)
digital (quantized) : both time and amplitude are discrete

[Block diagram: analog signal x(t) → sampler with interval T → sampled signal x(nT) → quantizer → quantized signal xQ(nT) → digital signal processor. The analog signal is continuous in both time and amplitude, the sampled signal is discrete in time but continuous in amplitude, and the quantized signal is discrete in both time and amplitude.]

2
Digital Signal Processing Applications
Speech
Coding (compression)
Synthesis (production of speech signals, e.g., the speech development kit by Microsoft)
Recognition (e.g., PCCW's 1083 telephone number enquiry system and many applications for disabled persons as well as security)
Animal sound analysis
Music
Generation of music by different musical instruments such as piano, cello, guitar and flute using a computer
Synthesis of songs with low-cost electronic piano keyboard quality

3
Image
Compression
Recognition such as face, palm and fingerprint
Construction of 3D objects from 2D images
Animation, e.g., Toy Story
Special effects, such as adding Forrest Gump to archive footage of President Nixon, or removing some objects from a photograph or movie

Digital Communications
Encryption
Transmission and Reception (coding / decoding, modulation /
demodulation, equalization)

Biometrics and Bioinformatics

Digital Control

4
Transform from Time to Frequency

x(t) ⇌ X(ω): the forward transform maps the time-domain signal to the frequency domain, and the inverse transform maps it back

Fourier Series
express periodic signals using harmonically related sinusoids
different definitions for continuous-time & discrete-time signals
frequency takes discrete values: ω₀, 2ω₀, 3ω₀, ...

Fourier Transform
frequency analysis tool for aperiodic signals
defined on a continuous range of ω
different definitions for continuous-time & discrete-time signals
the fast Fourier transform (FFT) is a computationally efficient method for computing the Fourier transform of discrete signals

5
6
Transform / Time Domain / Frequency Domain

Fourier Series: time domain periodic & continuous; frequency domain aperiodic & discrete
x(t) = Σ_{k=−∞}^{∞} c_k e^{jkω₀t},   c_k = (1/T_P) ∫_{−T_P/2}^{T_P/2} x(t) e^{−jω₀kt} dt,   ω₀ = 2π/T_P, T_P is the period

Fourier Transform: time domain aperiodic & continuous; frequency domain aperiodic & continuous
x(t) = (1/2π) ∫_{−∞}^{∞} X(ω) e^{jωt} dω,   X(ω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt

Discrete-Time Fourier Transform: time domain aperiodic & discrete; frequency domain periodic & continuous
x(nT) = (T/2π) ∫_{2π/T} X(ω) e^{jωnT} dω,   X(ω) = Σ_{n=−∞}^{∞} x(nT) e^{−jωnT},   T is the sampling interval

Discrete(-Time) Fourier Series: time domain periodic & discrete; frequency domain periodic & discrete
x(n) = Σ_{k=0}^{N−1} c_k e^{j2πkn/N},   c_k = (1/N) Σ_{n=0}^{N−1} x(n) e^{−j2πkn/N},   T_P = N and T = 1
7
Fourier Series
Fourier series are used to represent the frequency contents of a periodic
and continuous-time signal. A continuous-time function x(t ) is said to be
periodic if there exists TP > 0 such that

x(t) = x(t + T_P),   t ∈ (−∞, ∞)    (I.1)

The smallest T_P for which (I.1) holds is called the fundamental period.
Every periodic function can be expanded into a Fourier series as

x(t) = Σ_{k=−∞}^{∞} c_k e^{jkω₀t},   t ∈ (−∞, ∞)    (I.2)

where

c_k = (1/T_P) ∫_{−T_P/2}^{T_P/2} x(t) e^{−jω₀kt} dt    (I.3)

and ω₀ = 2π/T_P is called the fundamental frequency.

8
Example 1.1
The signal x(t) = cos(100πt) + cos(200πt) is a periodic and continuous-time signal.

The fundamental frequency is ω₀ = 100π. The fundamental period is then T_P = 2π/(100π) = 1/50:

x(t + 1/50) = cos(100π(t + 1/50)) + cos(200π(t + 1/50))
            = cos(100πt + 2π) + cos(200πt + 4π)
            = cos(100πt) + cos(200πt) = x(t)

Since x(t) = cos(100πt) + cos(200πt) = (e^{jω₀t} + e^{−jω₀t})/2 + (e^{j2ω₀t} + e^{−j2ω₀t})/2,

by inspection and using (I.2), we have c₁ = 1/2, c₋₁ = 1/2, c₂ = 1/2, c₋₂ = 1/2, while all other Fourier series coefficients are equal to zero.

9
Fourier Transform

The Fourier transform is used to represent the frequency contents of an aperiodic and continuous-time signal x(t):

Forward transform:   X(ω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt    (I.4)

and

Inverse transform:   x(t) = (1/2π) ∫_{−∞}^{∞} X(ω) e^{jωt} dω    (I.5)

Some points to note:

The Fourier spectrum (both magnitude and phase) is continuous in frequency and aperiodic
Convolution in the time domain corresponds to multiplication in the Fourier transform domain, i.e., x(t) ∗ y(t) ↔ X(ω)·Y(ω)

10
Example 1.2
Find the Fourier transform of the following rectangular pulse:

x(t) = 1 for |t| < T₁, and 0 for |t| > T₁

Using (I.4),

X(ω) = ∫_{−T₁}^{T₁} e^{−jωt} dt = 2 sin(ωT₁)/ω

11
Example 1.3
Find the inverse Fourier transform of

X(ω) = 1 for |ω| < W, and 0 for |ω| > W

Using (I.5),

x(t) = (1/2π) ∫_{−W}^{W} e^{jωt} dω = sin(Wt)/(πt)

12
Discrete-Time Fourier Transform (DTFT)

The DTFT is a frequency analysis tool for aperiodic and discrete-time signals.

If we sample an aperiodic and continuous-time function x(t) with a sampling interval T, the sampled output x_s(t) is expressed as

x_s(t) = x(t) Σ_{n=−∞}^{∞} δ(t − nT)    (I.6)

The DTFT can be obtained by substituting x_s(t) into the Fourier transform equation of (I.4):

X(ω) = ∫_{−∞}^{∞} x_s(t) e^{−jωt} dt
     = ∫_{−∞}^{∞} x(t) Σ_{n=−∞}^{∞} δ(t − nT) e^{−jωt} dt
     = Σ_{n=−∞}^{∞} ∫_{−∞}^{∞} x(t) δ(t − nT) e^{−jωt} dt
     = Σ_{n=−∞}^{∞} x(nT) e^{−jωnT}    (I.7)

where the sifting property of the unit-impulse function is employed to obtain (I.7):

∫_{−∞}^{∞} f(t) δ(t − t₀) dt = f(t₀)

14
Some points to note:

The DTFT spectrum (both magnitude and phase) is continuous in frequency and periodic with period 2π/T
When the sampling interval is normalized to 1, we have

Forward Transform:   X(ω) = Σ_{n=−∞}^{∞} x(n) e^{−jωn}    (I.8)

and

Inverse Transform:   x(n) = (1/2π) ∫_{2π} X(ω) e^{jωn} dω    (I.9)

Discrete-Time Fourier Series (DTFS)

DTFS is used for analyzing discrete-time periodic signals. It can be


derived from the Fourier series.

15
Example 1.4
Find the DTFT of the following discrete-time signal (the slide illustrates the case N₁ = 2):

x[n] = 1 for |n| ≤ N₁, and 0 for |n| > N₁

Using (I.8),

X(ω) = Σ_{n=−N₁}^{N₁} e^{−jωn} = e^{jωN₁} (1 + e^{−jω} + e^{−j2ω} + ... + e^{−j2ωN₁}) = sin(ω(N₁ + 1/2)) / sin(ω/2)
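As a quick numerical check of this closed form, the sketch below evaluates the DTFT sum (I.8) directly in MATLAB and compares it with sin(ω(N₁ + 1/2))/sin(ω/2); the choices N₁ = 2 and the 500-point frequency grid are assumptions for illustration only.

N1 = 2;
w = linspace(-pi, pi, 500);          % frequency grid
w(w == 0) = 1e-12;                   % avoid 0/0 in the closed form
X_sum = zeros(size(w));
for n = -N1:N1                       % direct evaluation of the DTFT sum
    X_sum = X_sum + exp(-1i*w*n);
end
X_closed = sin(w*(N1+0.5))./sin(w/2); % closed-form (Dirichlet kernel)
max(abs(X_sum - X_closed))           % should be at the level of rounding error
plot(w/pi, real(X_sum));             % X(w) is real for this even sequence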

16
z-Transform

The z-transform is a useful tool for processing discrete-time signals. In fact, it is a generalization of the DTFT for discrete-time signals:

X(z) = Z{x[n]} = Σ_{n=−∞}^{∞} x[n] z^{−n}    (I.10)

where z is a complex variable. Substituting z = e^{jω} yields the DTFT.

Moreover, substituting z = r e^{jω} gives

X(z) = Σ_{n=−∞}^{∞} x[n] r^{−n} e^{−jωn} = F{x[n] r^{−n}}    (I.11)

17
Advantages of using the z-transform over the DTFT:
it can encompass a broader class of signals, since the Fourier transform does not converge for all sequences:
A sufficient condition for convergence of the DTFT is

|X(ω)| ≤ Σ_{n=−∞}^{∞} |x(n)| |e^{−jωn}| = Σ_{n=−∞}^{∞} |x(n)| < ∞    (I.12)

Therefore, if x(n) is absolutely summable, then X(ω) exists.
On the other hand, by representing z = r e^{jω}, the z-transform exists if

|X(z)| = |X(r e^{jω})| ≤ Σ_{n=−∞}^{∞} |x(n) r^{−n}| |e^{−jωn}| = Σ_{n=−∞}^{∞} |x(n) r^{−n}| < ∞    (I.13)

so we can choose a region of convergence (ROC) for z such that the z-transform converges
notational convenience: z ↔ e^{jω}
it can solve problems in discrete-time signals and systems, e.g., difference equations

18
Example 1.5
Determine the z-transform of x[n] = aⁿ u[n].

X(z) = Σ_{n=−∞}^{∞} aⁿ u[n] z^{−n} = Σ_{n=0}^{∞} (a z^{−1})ⁿ

X(z) converges if Σ_{n=0}^{∞} |a z^{−1}|ⁿ < ∞. This requires |a z^{−1}| < 1 or |z| > |a|, and

X(z) = 1 / (1 − a z^{−1})

Notice that for another signal, x[n] = −aⁿ u[−n − 1],

X(z) = Σ_{n=−∞}^{−1} (−aⁿ) z^{−n} = −Σ_{m=1}^{∞} a^{−m} z^{m} = 1 − Σ_{m=0}^{∞} (a^{−1} z)^{m}

In this case, X(z) converges if |a^{−1} z| < 1 or |z| < |a|, and

X(z) = 1 / (1 − a z^{−1})

[Figure: ROC of x[n] = aⁿu[n] is |z| > |a| (exterior of a circle); ROC of x[n] = −aⁿu[−n−1] is |z| < |a| (interior of a circle)]

Some points to note:

Different signals can give the same z-transform, although the ROCs differ
When x[n] = aⁿ u[n] with |a| > 1, its DTFT does not exist

20
21
Transfer Function and Difference Equation
A linear time-invariant (LTI) system with input sequence x(n) and output sequence y(n) is related via an Nth-order linear constant-coefficient difference equation of the form:

Σ_{k=0}^{N} a_k y(n−k) = Σ_{k=0}^{M} b_k x(n−k),   a₀ ≠ 0, b₀ ≠ 0    (I.14)

Applying the z-transform to both sides with the use of the linearity property and time-shifting property, we have

Σ_{k=0}^{N} a_k z^{−k} Y(z) = Σ_{k=0}^{M} b_k z^{−k} X(z)    (I.15)

The system (or filter) transfer function is expressed as

H(z) = Y(z)/X(z) = (Σ_{k=0}^{M} b_k z^{−k}) / (Σ_{k=0}^{N} a_k z^{−k}) = (b₀/a₀) Π_{k=1}^{M} (1 − c_k z^{−1}) / Π_{k=1}^{N} (1 − d_k z^{−1})    (I.16)

where each (1 − c_k z^{−1}) contributes a zero at z = c_k and a pole at z = 0, while each (1 − d_k z^{−1}) contributes a pole at z = d_k and a zero at z = 0.

22
The frequency response of the system or filter can be computed as

H(ω) = H(z)|_{z = e^{jω}}    (I.17)

From (I.14), the output y(n) is expressed as

y(n) = (1/a₀) [ Σ_{k=0}^{M} b_k x(n−k) − Σ_{k=1}^{N} a_k y(n−k) ]    (I.18)

When at least one of {a₁, a₂, ..., a_N} is non-zero, y(n) depends on its past samples as well as the input signal x(n). The system or filter in this case is known as an infinite impulse response (IIR) system. Applying the inverse DTFT or z-transform to the transfer function, it can be shown that the system impulse response is of infinite duration.
When all {a₁, a₂, ..., a_N} are equal to zero, y(n) depends on x(n) only. It is known as a finite impulse response (FIR) system because the impulse response is of finite duration.

23
Example 1.6
Consider an LTI system whose input x[n] and output y[n] satisfy the following linear constant-coefficient difference equation:

y[n] − (1/2) y[n−1] = x[n] + (1/3) x[n−1]

Find the system function and frequency response.

Taking the z-transform on both sides,

Y(z) − (1/2) z^{−1} Y(z) = X(z) + (1/3) z^{−1} X(z)

Thus,

H(z) = Y(z)/X(z) = (1 + (1/3) z^{−1}) / (1 − (1/2) z^{−1})   and   H(ω) = H(z)|_{z=e^{jω}} = (1 + (1/3) e^{−jω}) / (1 − (1/2) e^{−jω})
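The frequency response found in Example 1.6 can be checked numerically with MATLAB's freqz function, which evaluates H(z) on the unit circle; this is a minimal sketch (the 512-point frequency grid is an arbitrary choice).

b = [1 1/3];               % numerator coefficients of H(z)
a = [1 -1/2];              % denominator coefficients of H(z)
[H, w] = freqz(b, a, 512); % evaluate H(e^{jw}) at 512 frequencies in [0, pi)
subplot(2,1,1); plot(w/pi, abs(H));   ylabel('|H(\omega)|');
subplot(2,1,2); plot(w/pi, angle(H)); ylabel('phase'); xlabel('\omega/\pi');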

24
25
Example 1.7
Suppose you need to high-pass filter the signal x[n] with the high-pass filter having the following transfer function:

H(z) = 1 / (1 + 0.99 z^{−1})

How can the filtered signal y[n] be obtained?

H(z) = Y(z)/X(z) = 1/(1 + 0.99 z^{−1})   ⇒   Y(z) + 0.99 z^{−1} Y(z) = X(z)

Taking the inverse z-transform,

y[n] + 0.99 y[n−1] = x[n]

⇒   y[n] = −0.99 y[n−1] + x[n]   (with y[−1] = 0 for initialization)
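In MATLAB, the same recursion can be run with the built-in filter function, which implements linear constant-coefficient difference equations directly. A minimal sketch, assuming a white-noise test input:

x = randn(1, 1000);          % assumed test input
y = filter(1, [1 0.99], x);  % y[n] + 0.99 y[n-1] = x[n], zero initial conditions
% Equivalent explicit recursion:
y2 = zeros(1, 1000);
y2(1) = x(1);                % uses y[-1] = 0
for n = 2:1000
    y2(n) = -0.99*y2(n-1) + x(n);
end
max(abs(y - y2))             % should be 0 (up to rounding)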

26
27
Causality, Stability and ROC:

Causality condition: h[n] = 0 for all n < 0
⇔ h[n] is right-sided
⇔ The ROC for H(z) is the exterior of an origin-centered circle (including z = ∞)
⇔ If H(z) is rational, the ROC for H(z) is the exterior outside the outermost pole

Stability condition: Σ_{n=−∞}^{∞} |h[n]| < ∞
⇔ H(e^{jω}), i.e., the Fourier transform of h[n], converges
⇔ The ROC for H(z) includes the unit circle |z| = 1

28
Example 1.8
Verify whether the system with impulse response h[n] = 0.5ⁿ u[n] is causal and stable.

It is obvious that h[n] is causal because h[n] = 0 for all n < 0. On the other hand,

H(z) = Σ_{n=−∞}^{∞} 0.5ⁿ u[n] z^{−n} = Σ_{n=0}^{∞} (0.5 z^{−1})ⁿ = 1 / (1 − 0.5 z^{−1})

H(z) converges if Σ_{n=0}^{∞} |0.5 z^{−1}|ⁿ < ∞. This requires |0.5 z^{−1}| < 1 or |z| > 0.5, i.e., the ROC for H(z) is the exterior outside the pole at 0.5.

(Notice that another impulse response, h[n] = −0.5ⁿ u[−n − 1], corresponds to an unstable system because the ROC for its H(z) is |z| < 0.5.)

29
The z-transform for h[n] is

H(z) = 1 / (1 − 0.5 z^{−1}),   |z| > 0.5

Hence it is stable, because the ROC for H(z) includes the unit circle |z| = 1.

On the other hand, its stability can also be shown using:

Σ_{n=−∞}^{∞} |h[n]| = Σ_{n=0}^{∞} 0.5ⁿ = 1 + 0.5 + 0.5² + 0.5³ + ... = 1/(1 − 0.5) = 2 < ∞

30
Brief Review of Random Processes
Basically there are two types of signals:

Deterministic Signals

exactly specified according to some mathematical formulae
characterized by finite parameters
e.g., exponential signal, sinusoidal signal, ramp signal, etc.
a simple mathematical model of a musical signal is

x(t) = a(t) Σ_{m=1}^{∞} c_m cos(2πmf₀t + φ_m)

where:
f₀ is the fundamental frequency or pitch
c_m is the amplitude and φ_m is the phase of the mth harmonic
a(t) is the envelope

32
33
Random Signals

cannot be directly generated by any formulae and their values cannot


be predicted
characterized by probability density function (PDF), mean, variance,
power spectrum, etc.
e.g., thermal noise , stock values, autoregressive (AR) process,
moving average (MA) process, etc.
a simple voiced discrete-time speech model is

x[n] = Σ_{i=1}^{P} a_i x[n−i] + w[n]

where
{a_i} are called the AR parameters
w[n] is a noise-like process
P is the order of the AR process

34
Definitions and Notations
1. Mean Value
The mean value of a real random variable x(n) at time n is defined as

μ(n) = E{x(n)} = ∫_{−∞}^{∞} x(n) f(x(n)) d(x(n))    (I.19)

where f(x(n)) is the PDF of x(n), such that

∫_{−∞}^{∞} f(x(n)) d(x(n)) = 1   and   f(x(n)) ≥ 0

Note that, in general,

μ(m) ≠ μ(n),   m ≠ n    (I.20)

and

μ(m) ≠ (1/N) Σ_{n=0}^{N−1} x(n)    (I.21)

The mean value is also called the expected value and ensemble mean.

35
2. Moment

The moment is a generalization of the mean value:

E{(x(n))^m} = ∫_{−∞}^{∞} (x(n))^m f(x(n)) d(x(n))    (I.22)

When m = 1 it is the mean, while when m = 2 it is called the mean square value of x(n).

3. Variance
The variance of a real random variable x(n) at time n is defined as

σ²(n) = E{(x(n) − μ(n))²} = ∫_{−∞}^{∞} (x(n) − μ(n))² f(x(n)) d(x(n))    (I.23)

It is also called the second central moment.

36
Example 1.9
Determine the mean, second-order moment and variance of a quantization error x with PDF f(x) = 1/(2a) for −a ≤ x ≤ a (and 0 otherwise):

μ = ∫_{−∞}^{∞} x f(x) dx = ∫_{−a}^{a} x (1/(2a)) dx = (1/(2a)) [x²/2]_{−a}^{a} = 0

E{x²} = ∫_{−∞}^{∞} x² f(x) dx = ∫_{−a}^{a} x² (1/(2a)) dx = (1/(2a)) [x³/3]_{−a}^{a} = a²/3

σ² = E{(x − μ)²} = E{x²} = a²/3
37
4. Autocorrelation
The autocorrelation of a real random signal x(n) is defined as

R_xx(m, n) = E{x(m) x(n)} = ∫∫ x(m) x(n) f(x(m), x(n)) d(x(m)) d(x(n))    (I.24)

where f(x(m), x(n)) is the joint PDF of x(m) and x(n). It measures the degree of association or dependence between x at time index n and at time index m.
In particular,

R_xx(n, n) = E{x²(n)}    (I.25)

is the mean square value or average power of x(n). Moreover, when x(n) has zero mean, then

σ²(n) = R_xx(n, n) = E{x²(n)}    (I.26)

That is, the power of x(n) is equal to the variance of x(n).

38
5. Covariance
The covariance of a real random signal x(n) is defined as

C_xx(m, n) = E{(x(m) − μ(m))(x(n) − μ(n))}    (I.27)

Expanding (I.27) gives

C_xx(m, n) = E{x(m) x(n)} − μ(m) μ(n)

In particular,

C_xx(n, n) = E{(x(n) − μ(n))²} = σ²(n)

is the variance, and for zero-mean x(n), we have

C_xx(m, n) = R_xx(m, n)

39
6. Crosscorrelation
The crosscorrelation of two real random signals x(n) and y(n) is defined as

R_xy(m, n) = E{x(m) y(n)} = ∫∫ x(m) y(n) f(x(m), y(n)) d(x(m)) d(y(n))    (I.28)

where f(x(m), y(n)) is the joint PDF of x(m) and y(n). It measures the correlation of x(n) and y(n). The signals x(m) and y(n) are uncorrelated if R_xy(m, n) = E{x(m)} E{y(n)}.

7. Independence
Two real random variables x(n) and y(n) are said to be independent if

f(x(n), y(n)) = f(x(n)) f(y(n))   ⇒   E{x(n) y(n)} = E{x(n)} E{y(n)}    (I.29)

Q.: Does uncorrelated imply independent, or vice versa?

40
8. Stationarity
A discrete random signal is said to be strictly stationary if its k-th order PDF f(x(n₁), x(n₂), ..., x(n_k)) is shift-invariant for any set of n₁, n₂, ..., n_k and for any k. That is,

f(x(n₁), x(n₂), ..., x(n_k)) = f(x(n₁ + n₀), x(n₂ + n₀), ..., x(n_k + n₀))    (I.30)

where n₀ is an arbitrary shift, for all k. In particular, a real random signal is said to be wide-sense stationary (WSS) if the first and second order moments, viz., its mean and autocorrelation, are shift-invariant. This means

μ = E{x(n)} = E{x(m)},   m ≠ n    (I.31)

and

R_xx(i) = R_xx(m − n) = R_xx(m, n) = E{x(m) x(n)}    (I.32)

where i = m − n is called the correlation lag.

41
Three important properties of R_xx(i):
(i) R_xx(i) is an even sequence, i.e.,

R_xx(i) = R_xx(−i)    (I.33)

and hence is symmetric about the origin.
Q.: Why is it an even sequence?
(ii) The mean square value or power is greater than or equal to the magnitude of the correlation at any other lag, i.e.,

E{x²(n)} = R_xx(0) ≥ |R_xx(i)|,   i ≠ 0    (I.34)

which can be proved by the Cauchy-Schwarz inequality:

|E{ab}| ≤ sqrt(E{a²} E{b²})

(iii) When x(n) has zero mean, then

σ² = E{x²(n)} = R_xx(0)    (I.35)

42
9. Ergodicity
A stationary process is said to be ergodic if its time average using infinite samples equals its ensemble average. That is, the statistical properties of the process can be determined by time averaging over a single sample function of the process. For example, the process is

Ergodic in the mean if

μ = E{x(n)} = lim_{N→∞} (1/N) Σ_{n=−N/2}^{N/2−1} x(n)

Ergodic in the autocorrelation function if

R_xx(i) = E{x(n) x(n−i)} = lim_{N→∞} (1/N) Σ_{n=−N/2}^{N/2−1} x(n) x(n−i)

Unless stated otherwise, we assume that random signals are ergodic (and thus stationary) in this course.
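Ergodicity is what justifies replacing ensemble averages by time averages throughout the course. A minimal sketch, assuming a zero-mean, unit-variance white Gaussian process and a finite record of 10000 samples:

N = 10000;
x = randn(1, N);                       % one sample function of the process
mean_est = mean(x)                     % time-average estimate of E{x(n)}, near 0
R_est = zeros(1, 4);
for i = 0:3                            % time-average estimates of R_xx(i)
    R_est(i+1) = mean(x(1:N-i).*x(1+i:N));
end
R_est                                  % approximately [1 0 0 0] for this white process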

43
Example 1.10
Consider an ergodic stationary process {x[n]}, n = ..., −1, 0, 1, ..., which is uniformly distributed between 0 and 1.

The ensemble average or mean of x[n] at time m is

μ[m] = ∫ x[m] f(x[m]) dx[m] = ∫₀¹ x[m] dx[m] = [x²[m]/2]₀¹ = 1/2

It is clear that the mean of x[n] is also μ = 0.5 for all n.

Because of ergodicity, the time average is

lim_{N→∞} (1/N) Σ_{n=−N/2}^{N/2−1} x[n] = μ = 1/2

44
10. Power Spectrum
For random signals, the power spectrum or power spectral density (PSD) is used to describe the frequency spectrum.

Q.: Can we use the DTFT to analyze the spectrum of a random signal? Why?

The PSD is defined as:

Φ_xx(ω) = Σ_{i=−∞}^{∞} R_xx(i) e^{−jωi} = Z[R_xx(i)]|_{z=e^{jω}}    (I.36)

Given Φ_xx(ω), we can get R_xx(i) using

R_xx(i) = (1/2π) ∫_{2π} Φ_xx(ω) e^{jωi} dω    (I.37)

Q.: Why?

45
Under a mild assumption:

lim_{N→∞} (1/N) Σ_{k=−N}^{N} |k| R_xx(k) = 0

it can be proved that (I.36) is equivalent to

Φ_xx(ω) = lim_{N→∞} (1/N) E{ |Σ_{n=0}^{N−1} x(n) e^{−jωn}|² }    (I.38)

Since Σ_{n=0}^{N−1} x(n) e^{−jωn} corresponds to the DTFT of x(n), we can consider the PSD as the time average of |X(ω)|² based on infinite samples.

(I.38) also implies that the PSD is a measure of the mean value of |X(ω)|², the squared magnitude of the DTFT of x(n).

46
Common Random Signal Models
1. White Process
A discrete-time zero-mean signal w(n) is said to be white if

R_ww(m − n) = E{w(n) w(m)} = σ_w² for m = n, and 0 otherwise    (I.39)

Moreover, the PSD of w(n) is flat for all frequencies:

Φ_ww(ω) = Σ_{i=−∞}^{∞} R_ww(i) e^{−jωi} = R_ww(0) e^{−jω·0} = σ_w²

Notice that a white process does not specify its PDF: it can be Gaussian-distributed, uniformly distributed, etc.

47
2. Autoregressive Process
An autoregressive (AR) process of order M is defined as

x(n) = a₁ x(n−1) + a₂ x(n−2) + ... + a_M x(n−M) + w(n)    (I.40)

where w(n) is a white process.

Taking the z-transform of (I.40) yields

H(z) = X(z)/W(z) = 1 / (1 − a₁ z^{−1} − a₂ z^{−2} − ... − a_M z^{−M})

Let h(n) = Z^{−1}{H(z)}; we can write

x(n) = h(n) ∗ w(n) = Σ_{k=−∞}^{∞} h(n−k) w(k) = Σ_{k=−∞}^{∞} w(n−k) h(k)

Q.: What is the mean value of x(n)?

48
The input-output relationship for random signals is:

R_xx(m) = E{x(n) x(n+m)}
        = E{ Σ_{k₁=−∞}^{∞} h(k₁) w(n−k₁) · Σ_{k₂=−∞}^{∞} h(k₂) w(n+m−k₂) }
        = Σ_{k₁} Σ_{k₂} h(k₁) h(k₂) E{w(n−k₁) w(n+m−k₂)}
        = Σ_{k₁} Σ_{k₂} h(k₁) h(k₂) R_ww(m + k₁ − k₂)
        = Σ_{k} R_ww(m − k) Σ_{k₁} h(k₁) h(k + k₁),   k = k₂ − k₁

⇒ R_xx(m) = R_ww(m) ∗ g(m),   g(k) = Σ_{k₁} h(k₁) h(k + k₁) = h(k) ∗ h(−k)

⇒ Φ_xx(ω) = Φ_ww(ω) G(ω),   G(ω) = |H(ω)|²

⇒ Φ_xx(ω) = Φ_ww(ω) |H(ω)|²    (I.41)

49
Note that (I.41) applies for all stationary input processes and impulse responses.
In particular, for the AR process, we have

Φ_xx(ω) = σ_w² / |1 − a₁ e^{−jω} − a₂ e^{−j2ω} − ... − a_M e^{−jMω}|²    (I.42)
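Since (I.42) is just |H(ω)|²σ_w² with H(z) = 1/(1 − a₁z^{−1} − ... − a_M z^{−M}), it can be evaluated numerically with freqz. A minimal sketch for an assumed first-order AR process with a₁ = 0.9 and σ_w² = 1:

a1 = 0.9; sigw2 = 1;                 % assumed AR(1) parameter and noise power
[H, w] = freqz(1, [1 -a1], 512);     % H(w) = 1/(1 - a1*e^{-jw})
psd = sigw2 * abs(H).^2;             % (I.42)
plot(w/pi, psd); xlabel('\omega/\pi'); ylabel('\Phi_{xx}(\omega)');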

3. Moving Average Process

A moving average (MA) process of order N is defined as

x(n) = b₀ w(n) + b₁ w(n−1) + ... + b_N w(n−N)    (I.43)

Applying (I.41) gives

Φ_xx(ω) = |b₀ + b₁ e^{−jω} + ... + b_N e^{−jNω}|² σ_w²    (I.44)

50
4. Autoregressive Moving Average Process

An autoregressive moving average (ARMA) process is defined as

x(n) = a₁ x(n−1) + a₂ x(n−2) + ... + a_M x(n−M) + b₀ w(n) + b₁ w(n−1) + ... + b_N w(n−N)    (I.45)

Applying (I.41) gives

Φ_xx(ω) = ( |b₀ + b₁ e^{−jω} + ... + b_N e^{−jNω}|² / |1 − a₁ e^{−jω} − a₂ e^{−j2ω} − ... − a_M e^{−jMω}|² ) σ_w²    (I.46)

51
Questions for Discussion
1. Consider a signal x(n) and a stable system with transfer function H(z) = B(z)/A(z). Let the system output with input x(n) be y(n).
Can we always recover x(n) from y(n)? Why? You may consider the simple cases of B(z) = 1 + 2z^{−1} and A(z) = 1, as well as B(z) = 1 + 0.5z^{−1} and A(z) = 1.
2. Given a random variable x with mean μ_x and variance σ_x², determine the mean, variance and mean square value of

y = ax + b

where a and b are finite constants.
3. Is an AR process really stationary? You can answer this question by examining the autocorrelation function of a first-order AR process, say,

x(n) = a x(n−1) + w(n)

52
Chapter 2
Simulation Techniques
References:

S.M.Kay, Fundamentals of Statistical Signal Processing: Estimation


Theory, Prentice Hall, 1993
C.L.Nikias and M.Shao, Signal Processing with Alpha-Stable Distribution
and Applications, John Wiley & Sons, 1995
V.K.Ingle and J.G.Proakis, Digital Signal Processing Using MATLAB
V.4, PWS Publishing Company, 1997
E.Part-Enander, A.Sjoberg, B.Melin and P.Isaksson, The MATLAB
Handbook, Addison-Wesley, 1996
S.K.Park and K.W.Miller, Random number generators: good ones are
hard to find, Communications of the ACM, vol.31, no.10, Oct. 1988

1
Simulation Techniques
Signal Generation
1. Deterministic Signals
It is trivial to generate deterministic signals given the synthesis formula,
e.g., for a single real tone, it is generated by
x(n) = A cos(ωn + φ),   n = 0, 1, ..., N−1
MATLAB code:

N=10; % number of samples is 10


A=1; % tone amplitude is 1
w=0.2; % frequency is 0.2
p=1; % phase is 1

for n=1:N
x(n)=A*cos(w*(n-1)+p); % note that index should be > 0
end

2
An alternative approach is

n=0:N-1; % define a vector of size N


x = A.*cos(w.*n+p); % the first time index is also 1
% .* is used in vector multiplication
Both give

x=

Columns 1 through 7

0.5403 0.3624 0.1700 -0.0292 -0.2272 -0.4161 -0.5885

Columns 8 through 10

-0.7374 -0.8569 -0.9422

Q.: Which approach is better? Why?

3
Example 2.1
Recall the simple mathematical model of a musical signal:


x(t ) = a (t ) cm cos(2mf 0t + m )
m =1

A further simplified form is


x(t ) = cos(2f 0t )

where each music note has a distinct f 0 .

Let's consider the following piece of music:

AA EE F# F# EE
DD C#C# BB AA
EE DD C# C# BB (repeat once)

(repeat first two lines once)

4
The American Standard pitch for each of these notes is:

A: 440.00 Hz
B: 493.88 Hz
C#: 554.37 Hz
D: 587.33 Hz
E: 659.26 Hz
F#: 739.99 Hz
Assuming that each note lasts for 0.5 second and a sampling frequency of
8000 Hz, the MATLAB code for producing this piece of music is:

a=sin(2*pi*440*(0:0.000125:0.5)); % frequency for A


b=sin(2*pi*493.88*(0:0.000125:0.5)); % frequency for B
cs=sin(2*pi*554.37*(0:0.000125:0.5)); % frequency for C#
d=sin(2*pi*587.33*(0:0.000125:0.5)); % frequency for D
e=sin(2*pi*659.26*(0:0.000125:0.5)); % frequency for E
fs=sin(2*pi*739.99*(0:0.000125:0.5)); % frequency for F#

5
line1=[a,a,e,e,fs,fs,e,e]; % first line of song
line2=[d,d,cs,cs,b,b,a,a]; % second line of song
line3=[e,e,d,d,cs,cs,b,b]; % third line of song
song=[line1,line2,line3,line3,line1,line2]; % composite song
sound(song,8000); % play sound with 8kHz sampling frequency
wavwrite(song,'song.wav'); % save song as a wav file

Note that in order to attain better music quality (e.g., flute, violin), we
should use the more general model:


x(t ) = a (t ) cm cos(2mf 0t + m )
m =1

Q.: How many discrete-time samples in the 0.5 second note with 8000
Hz sampling frequency?

Q.: How to change the sampling frequency to 16000 Hz?

6
2. Random Signals
Uniform Variable
A uniform random sequence can be generated by

x(n) = seedₙ = (a · seedₙ₋₁) mod m,   n = 1, 2, ...

where seed₀, a and m are positive integers. The numbers generated should be (approximately) uniformly distributed between 0 and (m − 1).
A choice of a and m which generates good uniform variables is

a = 16807 and m = 2147483647

This uniform PDF can be changed easily by scaling and shifting the generation formula. For example, a random number which is uniformly distributed between −0.5 and 0.5 is given by

seedₙ = (a · seedₙ₋₁) mod m
x(n) = seedₙ/m − 0.5
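The recursion above can be coded directly. A minimal sketch of this generator with the stated a and m (the initial seed 12345 is an arbitrary choice), producing numbers that are approximately uniform on [−0.5, 0.5]:

a = 16807; m = 2147483647;    % Park-Miller constants from the text
seed = 12345;                 % assumed initial seed
N = 5000;
x = zeros(1, N);
for n = 1:N
    seed = mod(a*seed, m);    % seed_n = (a*seed_{n-1}) mod m
    x(n) = seed/m - 0.5;      % scale/shift to be uniform on [-0.5, 0.5]
end
mean(x), var(x)               % should be close to 0 and 1/12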

7
The power of x(n) is

var(x) = ∫_{−0.5}^{0.5} x² p(x) dx = ∫_{−0.5}^{0.5} x² dx = 1/12

Note that x(n) is independent (white).

To generate a white uniform number with variance σ_x²:

seedₙ = (a · seedₙ₋₁) mod m
x(n) = (seedₙ/m − 0.5) · sqrt(12 σ_x²)
MATLAB code for generating zero-mean uniform numbers with power 2:
N=5000; % number of samples is 5000
power = 2; % signal power is 2
u = (rand([1,N])-0.5).*sqrt(12*power); % rand give a uniform number
% in [0,1]

8
Evaluation of MATLAB uniform random numbers:

m = mean(u) % * mean computes the time average

m = 0.0172

p = mean(u.*u) % compute power

p = 2.0225

y = mean((u-m).*(u-m)) % compute variance

v = 2.0222

9
plot(u); % plot the signal
[Figure: time plot of the 5000 uniform samples, fluctuating between about −2.5 and 2.5]

10
hist(u,20) % plot the histogram for u
% with 20 bars
[Figure: 20-bin histogram of u, approximately flat with roughly 250 counts per bin]

Q.: Is the random generator acceptable? Does ergodicity hold?

11
a = xcorr(u); % compute the autocorrelation
plot(a) % plot the autocorrelation
[Figure: autocorrelation sequence of u, essentially zero everywhere except a sharp peak of about 10000 at the centre lag]

12
axis([4990, 5010, -500, 12000]) % change the axis
[Figure: zoomed view of the autocorrelation around lag index 5000, showing the single peak]

The index 5000 corresponds to R_uu(0)

⇒ the sequence is (approximately) white

13
Gaussian Variable
Given a pair of independent uniform numbers which are uniformly distributed on [0,1], say (u₁, u₂), a pair of independent Gaussian numbers, which have zero mean and unit variance, can be generated from:

w₁ = sqrt(−2 ln(u₁)) cos(2πu₂)
w₂ = sqrt(−2 ln(u₁)) sin(2πu₂)

This is known as the Box-Muller transformation. Note that the Gaussian numbers are white.
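A minimal sketch of the Box-Muller transformation built only from rand (the number of pairs is an arbitrary choice); it can be compared with the randn-based code below:

N = 5000;                          % assumed number of pairs
u1 = rand(1, N); u2 = rand(1, N);  % independent uniforms on (0,1)
w1 = sqrt(-2*log(u1)).*cos(2*pi*u2);
w2 = sqrt(-2*log(u1)).*sin(2*pi*u2);
g = [w1 w2];                       % 2N approximately N(0,1) samples
mean(g), var(g)                    % should be close to 0 and 1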

MATLAB code for generating zero-mean Gaussian numbers with power 2:


N=5000; % number of samples is 5000
power = 2; % signal power is 2
w = randn([1,N]).*sqrt(power); % randn give Gaussian number

14
% with mean 0 and variance 1
Evaluation of MATLAB Gaussian random numbers:

m = mean(w) % * mean computes the time average

m = 0.0123

p = mean(w.*w) % compute power

p = 2.0158

y = mean((w-m).*(w-m)) % compute variance

v = 2.0157

15
plot(w); % plot the signal
[Figure: time plot of the 5000 Gaussian samples, mostly within about ±4]

16
hist(w,20) % plot the histogram for w
% with 20 bars
[Figure: 20-bin histogram of w, bell-shaped with a peak of about 700 counts near zero]

17
a = xcorr(w); % compute the autocorrelation
plot(a) % plot the autocorrelation
[Figure: autocorrelation sequence of w, essentially zero except a sharp peak of about 10000 at the centre lag]

18
axis([4990, 5010, -500, 12000]) % change the axis
[Figure: zoomed view of the autocorrelation around lag index 5000, showing the single peak]

The time index at 5000 corresponds to Rww (0)

19
Impulsive Variable

The main feature of an impulsive (or impulse) process is that its value can be very large. A mathematical model for impulsive noise is the α-stable process, where 0 < α ≤ 2.

The α-stable process is a generalization of the Gaussian process (α = 2) and the Cauchy process (α = 1)

The variable is more impulsive for a smaller α

An α-stable variable is generated using two independent variables: Φ, which is uniform on (−0.5π, 0.5π), and W, which is exponentially distributed with unity mean, where W is produced from

W = −ln(u)

where u is a uniform variable distributed on [0,1]

20
MATLAB code for 0 < α < 2 and α ≠ 1

alpha = 1.8; % alpha is set to 1.8


beta = 0; % beta is a symmetric parameter
N=5000;
phi = (rand(1,N)-0.5)*pi;
w = -log(rand(1,N));
k_alpha = 1 - abs(1-alpha);
beta_a = 2*atan(beta*tan(pi*alpha/2.0))/(pi*k_alpha);
phi_0 = -0.5*pi*beta_a*k_alpha/alpha;
epsilon = 1 - alpha;
tau = -epsilon*tan(alpha*phi_0);
a = tan(0.5.*phi);
B = tan(0.5.*epsilon.*phi)./(0.5.*epsilon.*phi);
b = tan(0.5.*epsilon.*phi);
z = (cos(epsilon.*phi)-tan(alpha.*phi_0).*sin(epsilon.*phi))./(w.*cos(phi));
d = (z.^(epsilon./alpha) - 1)./epsilon;
i = (2.*(a-b).*(1+a.*b) - phi.*tau.*B.*(b.*(1-a.^2)-2.*a)).*(1+epsilon.*d)./((1-a.^2).*(1+b.^2))+tau.*d;

21
plot(i);
[Figure: time plot of the 5000 α-stable samples (α = 1.8); most values are small but occasional spikes reach magnitudes above 100]

22
MATLAB code for α = 1
N=5000;
phi = (rand(1,N)-0.5)*pi;
a = tan((0.5.*phi));
i = 2.*a./(1-a.^2);
plot(i)
[Figure: time plot of the 5000 Cauchy (α = 1) samples; the spikes are much larger, reaching magnitudes of about 2000]

23
[Figure: PDFs of the α-stable distribution for different values of α]

The impulsiveness is due to the heavier tails, i.e., the PDF goes to zero slowly

25
AR, MA and ARMA Processes
An MA process is generated from

x(n) = b₀ w(n) + b₁ w(n−1) + ... + b_N w(n−N)

where {w(n)} is a white noise sequence. Only the transient samples need to be removed.
e.g., for the MA process with two coefficients

x(n) = b₀ w(n) + b₁ w(n−1)

Since w(n) = 0 for n < 0:
x(0) = b₀ w(0) + b₁ w(−1) = b₀ w(0)
x(1) = b₀ w(1) + b₁ w(0)
x(2) = b₀ w(2) + b₁ w(1)
...
The transient sample is x(0). We should keep {x(1), x(2), ...}

26
MATLAB code for generating 50 samples of MA process with b0 = 1, b1 = 2 :
b0=1;
b1=2;
N=50;
w=randn(1,N+1); % generate N+1 white noise samples
for n=1:N
x(n) = b0*w(n+1)+b1*w(n); % shift w by one sample
end

Alternatively, we can use the convolution function in MATLAB:


b0=1;
b1=2;
N=50;
w=randn(1,N+1); % generate N+1 white noise samples
b= [b0 b1]; % b is an vector
y=conv(b,w); % signal length is N+1+2-1
x=y(2:N+1); % remove the transient signals

27
From (I.44), the PSD for the MA process is

Φ_xx(ω) = |1 + 2e^{−jω}|² σ_w² = |1 + 2e^{−jω}|²
It can be plotted using the freqz command in MATLAB:

b0=1;
b1=2;
b= [b0 b1];
a=1;
[H,W] = freqz(b,a); % H is complex frequency response
PSD = abs(H.*H);
plot(W/pi,PSD);

28
[Figure: PSD of the MA process versus normalized frequency ω/π, decreasing from 9 at ω = 0 to 1 at ω = π]

29
To evaluate the MA process generated by MATLAB, we use (I.38):

Φ_xx(ω) = lim_{N→∞} (1/N) E{ |Σ_{n=0}^{N−1} x(n) e^{−jωn}|² }

In the simulation, N → N = 100 and E{·} → the average of 100 independent runs.
MATLAB code:
N=100;
b= [1 2]; % b is a vector
for m=1:100 % perform 100 independent runs
w=randn(1,N+1); % generate N+1 white noise samples
y=conv(b,w); % signal length is N+1+2-1
x=y(2:N+1); % remove the transient signals
p(m,:) = abs(fft(x).*fft(x));
end
psd = mean(p)./100;
index = 1/50:1/50:2;
plot(index,psd);
axis([0, 1, 0 10]);

30
31
With N = 10000 and E{·} → the average of 10000 independent runs:

[Figure: averaged periodogram estimate of the PSD, now much closer to the true curve]

32
Transient samples also need to be removed in AR & ARMA processes because of the non-stationarity due to the poles:

x(n) = a₁ x(n−1) + a₂ x(n−2) + ... + a_M x(n−M) + w(n)
x(n) = a₁ x(n−1) + a₂ x(n−2) + ... + a_M x(n−M) + b₀ w(n) + b₁ w(n−1) + ... + b_N w(n−N)

e.g., for a first-order AR process x(n) = a x(n−1) + w(n):

R_xx(n, n+m) = a^{|m|} (1 − a^{2(n+1)}) σ_w² / (1 − a²)

⇒ nonstationary, because R_xx(n, n+m) depends on n

for sufficiently large n, say when a^{2(n+1)} << 1, we can consider it stationary.

since a is the pole, the extension to general AR and ARMA processes is:

|p_i|^{2(n+1)} << 1, for all poles {p_i}

33
Suppose a^{2(n+1)} ≤ 0.0001 is required and the AR parameter is a = −0.9. The required n is calculated as

(−0.9)^{2(n+1)} = 0.9^{2(n+1)} ≤ 0.0001

⇒ n ≥ 43

MATLAB code for generating 50 samples of the AR process:

M = 43;
N = 50;
a = -0.9;
y(1) = 0;
for n=2:M+N
y(n) = a*y(n-1)+randn;
end
x=y(M+1:M+N);
plot(x);

34
[Figure: 50 samples of the generated AR(1) process]

35
Digital Filtering

Given an input signal x(n) and the transfer function H(z), it is easy to generate the corresponding output signal, say y(n).

For an FIR system, we can follow the MA process, while for an IIR system, we can follow the ARMA process. The transient samples can be removed if necessary, as in the MA, AR and/or ARMA processes.

Given H(z), the impulse response can be computed via the inverse DTFT:

h(n) = (1/2π) ∫_{2π} H(ω) e^{jωn} dω

Frequency spectrum for H(z) → impulse response {h(n)} →

y(n) = h(n) ∗ x(n) = Σ_{k=−∞}^{∞} h(n−k) x(k) = Σ_{k=−∞}^{∞} x(n−k) h(k)

36
Example 2.2
Compute the impulse response for H_d(z) with the following DTFT spectrum, where ω_o = 0.2π and ω_c = 0.4π.

[Figure: desired spectrum H_d(ω), equal to 1 for |ω| < ω_o, 0.5 for ω_o < |ω| < ω_c, and 0 for |ω| > ω_c]

h_d(n) = (1/2π) ∫_{2π} H_d(ω) e^{jωn} dω
       = (1/2π) ∫_{−ω_c}^{ω_c} 0.5 e^{jωn} dω + (1/2π) ∫_{−ω_o}^{ω_o} 0.5 e^{jωn} dω
       = sin(ω_c n)/(2πn) + sin(ω_o n)/(2πn)

37
h_d(n) = (sin(0.2πn) + sin(0.4πn)) / (2πn),   n = ..., −1, 0, 1, ...

Note that h_d(0) can be obtained by using L'Hospital's rule or:

h_d(0) = (1/2π) ∫_{2π} H_d(ω) e^{j0} dω = (1/2π) ∫_{2π} H_d(ω) dω
       = (1/2π) ∫_{−ω_c}^{ω_c} 0.5 dω + (1/2π) ∫_{−ω_o}^{ω_o} 0.5 dω = (ω_c + ω_o)/(2π) = 0.3

Combining the results:

h_d(n) = 0.3 for n = 0, and (sin(0.2πn) + sin(0.4πn))/(2πn) otherwise

y(n) = h_d(n) ∗ x(n) = Σ_{k=−∞}^{∞} x(n−k) h_d(k) ≈ Σ_{k=−M}^{M} x(n−k) h_d(k)

38
Example 2.3
Compute the impulse response of a time-shift function which time-shifts a signal by a non-integer delay D.

y(n) = x(n − D)   ⇒   Y(ω) = e^{−jωD} X(ω),   H(ω) = exp(−jωD)

h(n) = (1/2π) ∫_{−π}^{π} H(ω) e^{jωn} dω = (1/2π) ∫_{−π}^{π} e^{−jωD} e^{jωn} dω = (1/2π) ∫_{−π}^{π} e^{jω(n−D)} dω = sinc(n − D)

where

sinc(x) = sin(πx) / (πx)

⇒ y(n) = x(n) ∗ sinc(n − D) = Σ_{k=−∞}^{∞} x(n−k) sinc(k − D) ≈ Σ_{k=−M}^{M} x(n−k) sinc(k − D)
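A minimal sketch of this truncated-sinc fractional delay (the delay D = 0.3, the truncation M = 10 and the sinusoidal test signal are assumed for illustration); note that MATLAB's sinc uses the same sin(πx)/(πx) definition as above.

D = 0.3; M = 10;                     % assumed delay and truncation length
k = -M:M;
h = sinc(k - D);                     % truncated fractional-delay kernel
n = 0:199;
x = cos(0.1*pi*n);                   % assumed test signal
y = conv(x, h);                      % overall delay is about M + D samples
y = y(M+1:M+length(n));              % remove the bulk delay of M samples
plot(n, x, n, y);                    % y approximates x(n - 0.3)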

39
Questions for Discussion
1. Observe that the following signal:

y(n) = Σ_{k=−10}^{10} x(n−k) sinc(k − D) ≈ x(n − D)

depends on future data {x(n+1), x(n+2), ..., x(n+10)}.
This is referred to as a non-causal system. How can the output of the non-causal system be generated in practice?

2. The spectrum of the Hilbert transform is

H(ω) = −j for 0 < ω ≤ π, and j for −π ≤ ω < 0

Use an FIR filter with 15 coefficients to perform the Hilbert transform of a discrete-time signal x[n]. Let the resultant signal be y[n].

40
Chapter 3
Optimal Filter Theory and Applications
References:

B.Widrow and S.D.Stearns, Adaptive Signal Processing, Prentice-Hall,


1985
S.M.Stearns and D.R.Hush Digital Signal Analysis, Prentice-Hall, 1990
P.M.Clarkson, Optimal and Adaptive Signal Processing, CRC Press,
1993
S.Haykin, Adaptive Filter Theory, Prentice-Hall, 2002

1
Optimal Signal Processing is concerned with the design, analysis, and
implementation of processing system that extracts information from
sampled data in a manner that is best or optimal in some sense. Such
processing systems can be referred to as optimal filters.

Basic Classes of Optmal Filtering Applications


1.Prediction: use previous samples to predict current samples

2
Speech Modeling using Linear Predictive Coding (LPC)
Since speech signals are highly correlated, a speech signal s(n) can be accurately modeled by a linear combination of its past samples:

s(n) ≈ ŝ(n) = Σ_{i=1}^{P} w_i s(n−i)

where {w_i} are known as the LPC coefficients. Techniques of optimal signal processing can be used to determine {w_i} in an optimal way; a small numerical sketch is given below.
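As a concrete illustration (not the derivation used later in this chapter), the sketch below estimates P = 2 LPC coefficients by solving the normal equations built from time-averaged correlations; the AR(2) test signal and its parameters are assumptions.

N = 5000; P = 2;
w = randn(1, N);
s = filter(1, [1 -1.5 0.7], w);      % assumed test signal: s(n)=1.5s(n-1)-0.7s(n-2)+w(n)
r = xcorr(s, P, 'biased');           % correlation estimates at lags -P..P
r = r(:);                            % force column orientation
R = toeplitz(r(P+1:2*P));            % P x P matrix [R(0) R(1); R(1) R(0)]
p = r(P+2:2*P+1);                    % right-hand side [R(1); R(2)]
w_lpc = R \ p                        % should be close to [1.5; -0.7]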

2. Identification

3
System Identification

[Block diagram: the input s(k) passes through the unknown system H(z) to give d(k), which is corrupted by output noise n_o(k) to give the observed r(k); s(k) is also corrupted by input noise n_i(k) to give x(k), which drives the optimal filter W(z); the filter output is subtracted from the noisy output to form the error e(k).]

Given the noisy input x(k) and/or noisy output r(k), our aim is to determine the impulse response of the unknown system H(z) using W(z).

4
3. Inverse Filtering: find the inverse of the system

Signal Recovery

Given a noisy discrete-time signal:


x(k) = s(k) ∗ h(k) + w(k)
where s (k ) , h(k ) and w(k ) represent the signal of interest, unknown
impulse response and noise, respectively. Optimal signal processing
can be used to recover s (k ) in an optimal way.

5
4. Interference Canceling: Remove noise using an external reference

Interference Cancellation in Electrocardiogram (ECG) Recording


In biomedical engineering, the measured ECG signal r (n) is corrupted
by the 50Hz power line interference:
r ( n) = s ( n) + i ( n)
where s (n) is the noise-free ECG and i (n) represents the 50Hz
interference. An external reference for i (n) is another 50Hz signal.

6
Problem Statement for Optimal Filters

Input Output Desired


y(n) Response
Signal
d(n)
x(n) W(z)
- +

Estimation
Error
e(n)
Given the input x(n) and the desired response d (n) , we want to find the
transfer function W ( z ) or its impulse response such that a statistical
criterion or a cost function is optimized.

7
Some common optimization criteria in the literature are:

1. Least Squares: find W(z) that minimizes Σ_{n=0}^{N−1} e²(n), where N is the number of samples available. This corresponds to least-squares filter design.
2. Minimum Mean Square Error: find W(z) that minimizes E{e²(n)}. This corresponds to the Wiener filtering problem.
3. Least Absolute Sum: find W(z) that minimizes Σ_{n=0}^{N−1} |e(n)|.
4. Minimum Mean Absolute Error: find W(z) that minimizes E{|e(n)|}.
5. Least Mean Fourth: find W(z) that minimizes E{e⁴(n)}.

The first and second are the two most commonly used criteria because of their relatively small computation, ease of analysis and robust performance. In later sections it is shown that both viewpoints give rise to similar mathematical expressions for W(z).
Q.: An absolute (universally best) optimization criterion does not exist. Why?

8
Least Squares Filtering

For simplicity, we assume W(z) is a causal FIR filter of length L so that

W(z) = Σ_{i=0}^{L−1} w_i z^{−i}    (3.1)

The error function e(n) is thus given by

e(n) = d(n) − y(n)    (3.2)

where

y(n) = Σ_{i=0}^{L−1} w_i x(n−i) = Wᵀ X(n),
W = [w₀ w₁ ... w_{L−2} w_{L−1}]ᵀ
X(n) = [x(n) x(n−1) ... x(n−L+2) x(n−L+1)]ᵀ
9
The cost function is

J_LS(W) = Σ_{n=0}^{N−1} e²(n) = Σ_{n=0}^{N−1} ( d(n) − Σ_{i=0}^{L−1} w_i x(n−i) )²    (3.3)

which is a function of the filter coefficients {w_i}, where N is the number of samples of x(n) (and d(n)).
The minimum of the least squares function can be found by differentiating J_LS(W) with respect to w₀, w₁, ..., w_{L−1} and then setting the resultant expressions to zero as follows:

∂J_LS(W)/∂w_j = ∂/∂w_j Σ_{n=0}^{N−1} ( d(n) − Σ_{i=0}^{L−1} w_i x(n−i) )² = 0,   j = 0, 1, ..., L−1

⇒ −2 Σ_{n=0}^{N−1} ( d(n) − Σ_{i=0}^{L−1} w_i x(n−i) ) x(n−j) = 0    (3.4)

⇒ Σ_{n=0}^{N−1} d(n) x(n−j) = Σ_{n=0}^{N−1} Σ_{i=0}^{L−1} w_i x(n−i) x(n−j) = Σ_{i=0}^{L−1} w_i Σ_{n=0}^{N−1} x(n−i) x(n−j)

10
Denote

R_dx = [R_dx(0) R_dx(1) ... R_dx(L−2) R_dx(L−1)]ᵀ    (3.5)

where

R_dx(j) = Σ_{n=0}^{N−1} d(n) x(n−j),   j = 0, 1, ..., L−1

and R_xx is the L×L matrix whose (j, i) entry is

R_xx(i, j) = Σ_{n=0}^{N−1} x(n−i) x(n−j)

11
In practice, for stationary signals, we use the symmetric Toeplitz matrix

        [ R_xx(0)     R_xx(1)    ...  R_xx(L−2)   R_xx(L−1) ]
        [ R_xx(1)     R_xx(0)    ...              R_xx(L−2) ]
R_xx =  [   ...                   ...                ...    ]    (3.6)
        [ R_xx(L−2)               ...              R_xx(1)  ]
        [ R_xx(L−1)   R_xx(L−2)  ...  R_xx(1)     R_xx(0)   ]

where

R_xx(i) = Σ_{n=0}^{N−1−i} x(n) x(n+i) = R_xx(−i)

As a result, we have

R_dx = R_xx W_LS   ⇒   W_LS = (R_xx)^{−1} R_dx    (3.7)

provided that R_xx is nonsingular.
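Equivalently, (3.7) can be obtained by stacking the vectors X(n) into a data matrix and solving the resulting overdetermined system in the least squares sense with MATLAB's backslash operator; this is a minimal sketch with an assumed length-3 system and noisy desired response (the handling of the first few samples uses zero initial conditions).

N = 1000; L = 3;
x = randn(1, N);
d = filter([1 2 3], 1, x) + 0.1*randn(1, N);   % assumed desired response
A = zeros(N, L);                               % row n is X(n)' = [x(n) x(n-1) x(n-2)]
for i = 1:L
    A(i:N, i) = x(1:N-i+1)';                   % delayed copies of x (zeros before n=1)
end
w_ls = A \ d'                                  % least squares solution, near [1; 2; 3]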

12
Example 3.1

[Block diagram: x(n) drives the unknown FIR system Σ_{i=0}^{4} h_i z^{−i}, whose output plus measurement noise q(n) gives d(n); x(n) also drives the least squares filter Σ_{i=0}^{4} w_i z^{−i}, whose output y(n) is subtracted from d(n) to give e(n).]

In this example least squares filtering is applied in determining the impulse


response of an unknown system. Assume that the unknown impulse
response is causal and { hi }={1,2,3,2,1}. Given N samples of x(n) and d (n)
where q (n) is a measurement noise.

13
We can use MATLAB to simulate the least squares filter for impulse
response estimation. The MATLAB source code is as follows,
%define the number of samples
N=50;
%define the noise and signal powers
noise_power = 0.0;
signal_power = 5.0;
%define the unknown system impulse response
h=[1 2 3 2 1];
%generate the input signal which is a Gaussian white noise with power 5
x=sqrt(signal_power).*randn(1,N);

14
%generate R_xx
corr_xx=xcorr(x);
for i=0:4
for j=0:4
R_xx(i+1,j+1)= corr_xx(N+i-j);
end
end
%generate the desired output plus noise
d=conv(x,h);
d=d(1:N)+sqrt(noise_power).*randn(1,N);
%generate R_dx
corr_xd = xcorr(d,x);
for i=0:4
R_dx(i+1) = corr_xd(N-i);
end
%compute the estimate channel response
W_ls = inv(R_xx)*(R_dx)'

15
R_xx = [ 251.6413   24.4044   36.4998   11.8182    4.5115
          24.4044  251.6413   24.4044   36.4998   11.8182
          36.4998   24.4044  251.6413   24.4044   36.4998
          11.8182   36.4998   24.4044  251.6413   24.4044
           4.5115   11.8182   36.4998   24.4044  251.6413 ]

R_dx = [390.8245 658.8827 873.3125 616.0732 334.9508]ᵀ

W_LS = [1.1095 2.0151 2.8291 1.8501 0.8176]ᵀ

When N is increased to 500, we have


W LS = [0.9975 1.9943 2.9927 1.9942 0.9828]T
When N is increased to 5000, we have
W LS = [1.0000 1.9981 2.9976 1.9972 0.9984]T

16
When N = 500 and the noise power is 0.5 (SNR=10 dB), we have

W LS = [1.0158 1.9826 2.9728 1.9773 0.9925]T

When N = 500 and the noise power is 5.0 (SNR=0 dB), we have

W LS = [1.0900 2.0138 2.9484 2.0249 1.0591]T

It is observed that

1. The estimation accuracy improves as N increases. It is reasonable


because as N increases, the accuracy of R xx and R dx increases due to
more samples are involved in their computation.

2. The estimation accuracy improves as the noise power decreases.

17
Example 3.2

Find the least squares filter of the following one-step predictor system:
[Block diagram: s(n) is delayed by one sample to form x(n), which passes through the predictor b₀ + b₁z^{−1}; the predictor output is subtracted from d(n) = s(n) to give the error e(n).]

where

s(n) = √2 sin(2πn/12)
x(n) = s(n−1) = √2 sin(2π(n−1)/12)
d(n) = s(n)

18
Given d (n) , the aim is to find b0 and b1 in least squares sense.

The MATLAB source code is as follows,

N=50; %define the number of samples


n=0:N-1;
d=sin(2.*pi.*n./12); %generate d(n)
x= d(2:N); %generate x(n) from d(n)
d=d(1:N-1); %keep lengths of x(n) and d(n) equal

corr_xx=xcorr(x,'unbiased'); %unbiased estimate of correlation


for i=0:1
for j=0:1
R_xx(i+1,j+1)= corr_xx(N-1+i-j);
end
end

19
corr_xd = xcorr(d,x,'unbiased');
for i=0:1
R_dx(i+1) = corr_xd(N-1-i);
end
W_ls = inv(R_xx)*(R_dx)'
The result is: W_ls = [1.7705 -1.0440]

N = 5000   ⇒   W_ls = [1.7324 -1.0004]

N = 500000   ⇒   W_ls = [1.7321 -1.0000]

The optimal b₀ and b₁ can be shown to be [√3  −1]:

∵ for a real tone:   s(n) = 2cos(ω) s(n−1) − s(n−2)

⇒ s(n) = 2cos(2π/12) s(n−1) − s(n−2) = √3 s(n−1) + (−1) s(n−2)

20
Wiener Filtering

The cost function to be minimized is

J_MMSE(W) = E{e²(n)}    (3.8)

Following the derivation in the least squares filter, the minimum of J_MMSE(W) is found by

∂J_MMSE(W)/∂w_j = ∂/∂w_j E{ ( d(n) − Σ_{i=0}^{L−1} w_i x(n−i) )² } = 0,   j = 0, 1, ..., L−1

⇒ −2 E{ ( d(n) − Σ_{i=0}^{L−1} w_i x(n−i) ) x(n−j) } = 0    (3.9)

⇒ E{d(n) x(n−j)} = E{ Σ_{i=0}^{L−1} w_i x(n−i) x(n−j) } = Σ_{i=0}^{L−1} w_i E{x(n−i) x(n−j)}

21
Assuming d(n) and x(n) are jointly stationary, we have

R_dx(j) = Σ_{i=0}^{L−1} w_i R_xx(i − j),   j = 0, 1, ..., L−1    (3.10)

Define

R_dx = [R_dx(0) R_dx(−1) ... R_dx(−L+2) R_dx(−L+1)]ᵀ    (3.11)

and

        [ R_xx(0)     R_xx(1)    ...  R_xx(L−2)   R_xx(L−1) ]
        [ R_xx(1)     R_xx(0)    ...              R_xx(L−2) ]
R_xx =  [   ...                   ...                ...    ]    (3.12)
        [ R_xx(L−2)               ...              R_xx(1)  ]
        [ R_xx(L−1)   R_xx(L−2)  ...  R_xx(1)     R_xx(0)   ]

As a result,

R_dx = R_xx W_MMSE   ⇒   W_MMSE = (R_xx)^{−1} R_dx    (3.13)

provided that R_xx is nonsingular.

22
Relationship between Least Squares Filter & Wiener Filter
When the number of samples N and if ergodicity holds, i.e.,

1 1 N 1
lim { Rdx ( j )} = lim { d (n) x(n j )} = Rdx ( j ) (3.14)
N N N N n =0

and
1 1 N 1
lim { R xx (i, j )} = lim { x(n i ) x(n j )}
N N N N n = 0 (3.15)
= R xx (i j ) = R xx ( j i )

the least squares filter is equivalent to the Wiener filter, i.e.,

W MMSE = W LS (3.16)

23
Properties of the Mean Square Error (MSE) Function
The MSE function E{e 2 (n)}is also known as performance surface and it
can be written in a matrix form:

( )
2
2 L 1 T 2
E{e (n)} = E d (n) wi x(n i ) = E d ( n) W X (n)
i =0

{ } {(
2 T
)} T
= E d (n) 2 E W X ( n) d (n) + E W X ( n) W X (n)

T
( T
) (3.17)


= E {d 2 (n)} 2W T E{( X (n)d (n) )} + W T E {(X ( n) X ( n)T )}W
= E {d 2 (n)} 2W T R dx + W T R xx W

1.The elements of wi in E{e 2 (n)} appear in first degree and second


degree only. This means that E{e 2 (n)}is a quadratic error function and
thus it is unimodal, i.e., there is only a unique (global) minimum and no
local minima exist. (However, it is only true for FIR filter but not true for
IIR filter).

24
An example for L = 2 is shown below:

25
2.The minimum of E{e 2 ( n)} is obtained by substituting W = W MMSE :

{ }
min = E d 2 (n) 2W MMSE T R dx + W MMSE T R xx W MMSE

= E {d (n)} 2(R xx R dx ) R dx + (R xx R dx ) R xx (R xx1 R dx )


2 1 T 1 T
(3.18)
= E {d 2 ( n)} 2 R Tdx ( R xx1 ) T R dx + R Tdx ( R xx1 ) T R dx
= E {d 2 ( n)} R Tdx R xx1 R dx = E {d 2 (n)} R Tdx W MMSE

As a result, the E{e 2 ( n)} can be written as


{ }
E{e 2 (n)} = min min + E d 2 ( n) 2W T R dx + W T R xx W
(3.19)
= min + (W W MMSE ) R xx (W W MMSE )
T

3.When d ( n) is exactly a linear combination of x( n), x(n 1), L , x(n L + 1) ,


L 1
i.e., d (n) = hi x(n i ) , the Wiener solution is wi = hi for i = 0,1, L L 1
i =0
and min =0.

26
Example 3.3

Determine the performance surface and the Wiener filter coefficients of the following system,

[Block diagram: x(n) and its one-sample delay x(n−1) are weighted by w₀ and w₁; their sum is subtracted from d(n) to form the error]

where

E{d²(n)} = 42,   R_xx = [2 1; 1 2],   R_dx = [7; 8]
27
The performance surface is calculated as

E{e²(n)} = E{d²(n)} − 2WᵀR_dx + WᵀR_xxW
         = 42 − 2[7 8][w₀; w₁] + [w₀ w₁][2 1; 1 2][w₀; w₁]
         = 2w₀² + 2w₁² + 2w₀w₁ − 14w₀ − 16w₁ + 42

while the Wiener filter weight is

W_MMSE = [2 1; 1 2]^{−1} [7; 8] = [2; 3]

Notice that the inverse of any nonsingular two-by-two matrix is

[a b; c d]^{−1} = (1/(ad − bc)) [d −b; −c a]

In practice, when E{d²(n)}, R_xx and R_dx are not available, we can estimate them from x(n) and d(n) using the least squares filtering method. The resultant filter coefficients are then the least squares filter coefficients.
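The quadratic surface of Example 3.3 can be visualized directly; this sketch (the grid range is an arbitrary choice) evaluates E{e²(n)} on a grid of (w₀, w₁) and marks the minimum at W_MMSE = [2 3]ᵀ.

Rxx = [2 1; 1 2]; Rdx = [7; 8]; Ed2 = 42;
[w0, w1] = meshgrid(-2:0.1:6, -2:0.1:8);                     % assumed grid range
J = Ed2 - 14*w0 - 16*w1 + 2*w0.^2 + 2*w1.^2 + 2*w0.*w1;      % expanded quadratic form
contour(w0, w1, J, 30); hold on;
Wmmse = Rxx \ Rdx;                                           % [2; 3]
plot(Wmmse(1), Wmmse(2), 'x');                               % minimum of the surface
xlabel('w_0'); ylabel('w_1');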

28
Example 3.4
Find the Wiener filter of the following system:
s(n)
d(n)
+
z -1 b0+ b1 z-1 -

x(n) e(n)
where s (n) = 2 sin( 2n / 12) .
2(n 1) 2n
It can be seen that x(n) = 2 sin and d ( n ) = s ( n ) = 2 sin .
12 12
The required statistics R xx (0) , R xx (1) , Rdx (0) and Rdx (1) are computed as
follows.

29
Using
2 1 cos(2 A)
sin ( A) =
2
and
1
sin( A) sin( B) = (cos( A B ) cos( A + B ) )
2
then
2(n 1) 2(n 1)
R xx (0) = E 2 sin 2 sin
12 12
1 cos(2(2(n 1)) )
= 2E
2
= 1 + E{cos(4(n 1) )} = 1

30
2(n 1) 2n
R xx (1) = E 2 sin 2 sin
12 12
2(n 1) 2n 2(n 1) 2n
= E cos cos +
12 12 12 12
2 3
= cos =
12 2
2(n 1) 2n 3
Rdx (0) = E 2 sin 2 sin =
12 12 2
2 ( n 2 ) 2n 4 1
Rdx (1) = E 2 sin 2 sin = cos =
12 12 12 2
As a result,
1
3 3
b~0 1 2
~ = 2 = 3
1
b1 3 1 1
2 2

31
The performance surface is given by
{ }
E{e 2 (n)} = E d 2 (n) 2W T R dx + W T R xx W
3
3 1 b0 1 2 b0
= 1 2 + [b0 b1 ] b
2 2 b1 3 1 1
2
= b02 + b12 + 3b0 b1 3b0 b1 + 1

Notice that E{d 2 (n)} = E{x 2 (n)} = 1. Moreover, the minimum MSE is
computed as
{ }
min = E d 2 (n) R Tdx W MMSE
3 1 3 3 1
= 1 = 1 + =0
2 2 1 2 2
This means that the optimal predictor is able to shift the phase of the
delayed sine wave and achieve exact cancellation, resulting in min = 0

32
Questions for Discussion
1. A real sinusoid s (n) = A cos(n + ) obeys

s (n) = a1`s (n 1) + a 2 s (n 2)

where a1 = 2 cos() and a 2 = 1. Is s (n) a 2nd order AR process?

2. Can we extend the least squares filter or Wiener filter to the general IIR
system model? Try to answer this question by investigating the Wiener
filter using a simple IIR model:

b0
W ( z) =
1 a1 z 1

That is, given d (k ) and x(k ) . What are the optimal b0 and a1 in mean
square error sense? Assume that x(k ) is white for simplicity.

33
Input Output Desired
y(k) Response
Signal
d(k)
x(k) W(z)
- +

Estimation
Error
e(k)

Steps:
(i) develop e(k )
(ii) compute E{e 2 (k )} in terms of R xx and Rdx only.
(iii) differentiate E{e 2 (k )} w.r.t. b0 and a1

34
3. Suppose you have a ECG signal corrupted by 50Hz interference:

Suggest methods to eliminate/reduce the 50Hz interference.

35
Chapter 4
Adaptive Filter Theory and Applications
References:
B.Widrow and M.E.Hoff, Adaptive switching circuits, Proc. Of
WESCON Conv. Rec., part 4, pp.96-140, 1960
B.Widrow and S.D.Stearns, Adaptive Signal Processing, Prentice-Hall,
1985
O.Macchi, Adaptive Processing: The Least Mean Squares Approach
with Applications in Transmission, Wiley, 1995
P.M.Clarkson, Optimal and Adaptive Signal Processing, CRC Press,
1993
S.Haykin, Adaptive Filter Theory, Prentice-Hall, 2002
D.F.Marshall, W.K.Jenkins and J.J.Murphy, "The use of orthogonal
transforms for improving performance of adaptive filters", IEEE Trans.
Circuits & Systems, vol.36, April 1989, pp.474-483

1
Adaptive Signal Processing is concerned with the design, analysis, and
implementation of systems whose structure changes in response to the
incoming data.

Application areas are similar to those of optimal signal processing but now
the environment is changing, the signals are nonstationary and/or the
parameters to be estimated are time-varying. For example,

Echo cancellation for Hand-Free Telephones (The speech echo is a


nonstationary signal)
Equalization of Data Communication Channels (The channel impulse
response is changing, particularly in mobile communications)
Time-Varying System Identification (the system transfer function to be
estimated is non-stationary in some control applications)

2
Adaptive Filter Development

Year Application Developer(s)

1959 Adaptive pattern recognition Widrow et al


system
1960 Adaptive waveform recognition Jacowatz

1965 Adaptive equalizer for telephone Lucky


channel
1967 Adaptive antenna system Widrow et al
1970 Linear prediction for speech Atal
analysis
Present numerous applications, structures,
algorithms

3
Adaptive Filter Definition
An adaptive filter is a time-variant filter whose coefficients are adjusted in
a way to optimize a cost function or to satisfy some predetermined
optimization criterion.
Characteristics of adaptive filters:
They can automatically adapt (self-optimize) in the face of changing
environments and changing system requirements
They can be trained to perform specific filtering and decision-making
tasks according to some updating equations (training rules)
Why adaptive?
It can automatically operate in
changing environments (e.g. signal detection in wireless channel)
nonstationary signal/noise conditions (e.g. LPC of a speech signal)
time-varying parameter estimation (e.g. position tracking of a moving
source)

4
Block diagram of a typical adaptive filter is shown below:

Adaptive y(k)
x(k) d(k)
Filter - +

{h(k)}

Adaptive e(k)

Algorithm
x(k) : input signal y(k) : filtered output
d(k) : desired response
h(k) : impulse response of adaptive filter
The cost function may be E{e²(k)} or Σ_{k=0}^{N−1} e²(k)

FIR or IIR adaptive filter


filter can be realized in various structures
adaptive algorithm depends on the optimization criterion

5
Basic Classes of Adaptive Filtering Applications

1.Prediction : signal encoding, linear prediction coding, spectral analysis

6
2.Identification : adaptive control, layered earth modeling, vibration
studies of mechanical system

7
3.Inverse Filtering : adaptive equalization for communication channel,
deconvolution

8
4.Interference Canceling : adaptive noise canceling, echo cancellation

9
Design Considerations

1. Cost Function
choice of cost functions depends on the approach used and the
application of interest
some commonly used cost functions are

mean square error (MSE) criterion : minimizes E{e 2 (k )}


where E denotes expectation operation, e(k ) = d (k ) y (k ) is the
estimation error, d (k ) is the desired response and y (k ) is the actual filter
output

exponentially weighted least squares criterion : minimizes Σ_{k=0}^{N−1} λ^{N−1−k} e²(k)
where N is the total number of samples and λ denotes the exponential weighting factor, whose value is positive and close to 1.

10
2. Algorithm
depends on the cost function used

convergence of the algorithm : Will the coefficients of the adaptive filter


converge to the desired values? Is the algorithm stable? Global
convergence or local convergence?

rate of convergence : This corresponds to the time required for the


algorithm to converge to the optimum least squares/Wiener solution.
misadjustment : excess mean square error (MSE) over the minimum
MSE produced by the Wiener filter, mathematically it is defined as

M = ( lim_{k→∞} E{e²(k)} − ξ_min ) / ξ_min    (4.1)

(This is a performance measure for algorithms that use the minimum MSE
criterion)

11
tracking capability : This refers to the ability of the algorithm to track
statistical variations in a nonstationary environment.
computational requirement : number of operations, memory size,
investment required to program the algorithm on a computer.
robustness : This refers to the ability of the algorithm to operate
satisfactorily with ill-conditioned data, e.g. very noisy environment,
change in signal and/or noise models

3. Structure
structure and algorithm are inter-related, choice of structures is based on
quantization errors, ease of implementation, computational complexity,
etc.
four commonly used structures are direct form, cascade form, parallel
form, and lattice structure. Advantages of lattice structures include
simple test for filter stability, modular structure and low sensitivity to
quantization effects.

12
e.g.,

H(z) = B₂z² / (z² + A₁z + A₀) = (C₁z/(z − p₁)) · (C₂z/(z − p₂)) = (D₁z + E₁)/(z − p₁) + (D₂z + E₂)/(z − p₂)

Q. Can you see an advantage of using cascade or parallel form?

13
Commonly Used Methods for Minimizing MSE
For simplicity, it is assumed that the adaptive filter is of causal FIR type
and is implemented in direct form. Therefore, its system block diagram is

x(n) z -1 z -1 ... z -1

w0(n) w1(n) wL-2(n) wL-1(n)

+ ... + +
y(n) -
d(n)
e(n) +
+

14
The error signal at time n is given by
e( n ) = d ( n ) y ( n ) (4.2)
where
L 1
y (n) = wi (n) x(n i ) = W (n)T X (n) ,
i =0
W (n) = [ w0 (n) w1 (n) L wL 2 (n) wL 1 (n)]T
X (n) = [ x(n) x(n 1) L x(n L + 2) x(n L + 1)]T

Recall that minimizing the E{e 2 (n)}will give the Wiener solution in optimal
filtering, it is desired that
lim W ( n) = W MMSE = (R xx )1 R dx (4.3)
n

In adaptive filtering, the Wiener solution is found through an iterative


procedure,
W (n + 1) = W ( n) + W (n) (4.4)
where W (n) is an incrementing vector.

15
Two common gradient searching approaches for obtaining the Wiener
filter are
1. Newton Method
1 E{e (n)}
2
W (n) = R xx
(4.5)
W ( n )

where is called the step size. It is a positive number that controls the
convergence rate and stability of the algorithm. The adaptive algorithm
becomes

E{e 2 (n)}
W (n + 1) = W (n) R xx1
W (n)
= W (n) R xx1 2(R xx W (n) R dx ) (4.6)
= (1 2)W (n) + 2 R xx1 R dx
= (1 2)W (n) + 2W MMSE

16
Solving the equation, we have

W (n) = W MMSE + (1 2) n (W (0) W MMSE ) (4.7)

where W (0) is the initial value of W (n) . To ensure

lim W (n) = W MMSE (4.8)


n
the choice of should be

1 <| 1 2 |< 1 0 < < 1 (4.9)

In particular, when = 0.5 , we have

W (1) = W MMSE + (1 2 0.5)1 (W (0) W MMSE ) = W MMSE (4.10)

The weights jump form any initial W (0) to the optimum setting W MMSE in a
single step.

17
An example of the Newton method with μ = 0.5 and 2 weights is illustrated below.

18
2. Steepest Descent Method

ΔW(n) = −μ ∂E{e²(n)}/∂W(n)    (4.11)

Thus

W(n+1) = W(n) − μ ∂E{e²(n)}/∂W(n)
       = W(n) − 2μ (R_xx W(n) − R_dx)    (4.12)
       = (I − 2μR_xx) W(n) + 2μ R_xx W_MMSE
       = (I − 2μR_xx)(W(n) − W_MMSE) + W_MMSE

where I is the L×L identity matrix. Denote

V(n) = W(n) − W_MMSE    (4.13)

We have

V(n+1) = (I − 2μR_xx) V(n)    (4.14)
19
Using the fact that R xx is symmetric and real, it can be shown that

R xx = Q Q 1 = Q Q T (4.15)

where the modal matrix Q is orthonormal. The columns of Q , which are


the L eigenvectors of R xx , are mutually orthogonal and normalized. Notice
that Q 1 = Q T . While is the so-called spectral matrix and all its elements
are zero except for the main diagonal, whose elements are the set of
eigenvalues of R xx , 1 , 2 , L , L . It has the form

1 0 L 0
0
2
=M O M (4.16)
0

0 L 0 L

20
It can be proved that the eigenvalues of R xx are all real and greater or
equal to zero. Using these results and let
V ( n) = Q U ( n) (4.17)
We have
Q U (n + 1) = ( I 2 R xx )Q U (n)
U (n + 1) = Q 1 ( I 2 R xx )Q U (n)
(4.18)
= Q( 1
I Q 2Q 1
)
R xx Q U (n)
= ( I 2 )U (n)
The solution is
U (n) = ( I 2 )n U (0) (4.19)
where U (0) is the initial value of U (n) . Thus the steepest descent
algorithm is stable and convergent if

lim ( I 2 )n = 0
n

21
or

lim_{n→∞} diag( (1 − 2μλ₁)ⁿ, (1 − 2μλ₂)ⁿ, ..., (1 − 2μλ_L)ⁿ ) = 0    (4.20)

which implies

|1 − 2μλ_max| < 1   ⇒   0 < μ < 1/λ_max    (4.21)

where λ_max is the largest eigenvalue of R_xx.
If this condition is satisfied, it follows that

lim_{n→∞} U(n) = 0   ⇒   lim_{n→∞} Q^{−1} V(n) = lim_{n→∞} Q^{−1} (W(n) − W_MMSE) = 0
⇒ lim_{n→∞} W(n) = W_MMSE    (4.22)
22
An illustration of the steepest descent method with two weights and μ = 0.3 is given below.

23
Remarks:

Steepest descent method is simpler than the Newton method since no


matrix inversion is required.
The convergence rate of Newton method is much faster than that of the
steepest descent method.
When the performance surface is unimodal, W (0) can be arbitrarily
chosen. If it is multimodal, good initial values of W (0) is necessary in
order for global minimization.
However, both methods require exact values of R xx and R dx which are
not commonly available in practical applications.

24
Widrows Least Mean Square (LMS) Algorithm

A. Optimization Criterion

To minimize the mean square error E{e 2 (n)}

B. Adaptation Procedure

It is an approximation of the steepest descent method where the


expectation operator is ignored, i.e.,

E{e 2 (n)} e 2 (n)


is replaced by
W (n) W (n)

25
The LMS algorithm is therefore:

W(n+1) = W(n) − μ ∂e²(n)/∂W(n)
       = W(n) − μ (∂e²(n)/∂e(n)) (∂e(n)/∂W(n))
       = W(n) − 2μ e(n) ∂[d(n) − Wᵀ(n)X(n)]/∂W(n)     (using ∂(AᵀB)/∂A = B)
       = W(n) + 2μ e(n) X(n)

or

w_i(n+1) = w_i(n) + 2μ e(n) x(n−i),   i = 0, 1, ..., L−1    (4.23)

26
C. Advantages

low computational complexity


simple to implement
allow real-time operation
does not need statistics of signals, i.e., R xx and R dx

D. Performance Surface

The mean square error function or performance surface is identical to that


in the Wiener filtering:

E{e 2 (n)} = min + (W (n) W MMSE )T R xx (W (n) W MMSE ) (4.24)

where W (n) is the adaptive filter coefficient vector at time n .

27
E. Performance Analysis
Two important performance measures in LMS algorithms are rate of
convergence & misadjustment (relates to steady state filter weight
variance).
1. Convergence Analysis
For ease of analysis, it is assumed that W (n) is independent of X (n) .
Taking expectation on both sides of the LMS algorithm, we have

    E{W(n+1)} = E{W(n)} + 2μ E{e(n) X(n)}
              = E{W(n)} + 2μ E{d(n) X(n) − X(n) Xᵀ(n) W(n)}
              = E{W(n)} + 2μ R_dx − 2μ R_xx E{W(n)}                     (4.25)
              = (I − 2μ R_xx) E{W(n)} + 2μ R_xx W_MMSE

which is very similar to the adaptation equation (4.12) in the steepest
descent method.

28
Following the previous derivation, W(n) will converge to the Wiener filter
weights in the mean sense if

    lim_{n→∞} diag( (1 − 2μλ₁)ⁿ, (1 − 2μλ₂)ⁿ, …, (1 − 2μλ_L)ⁿ ) = 0

    ⇒  |1 − 2μλᵢ| < 1,  i = 1, 2, …, L
    ⇒  0 < μ < 1/λ_max                                                  (4.26)

Define the geometric ratio of the p th term as

    r_p = 1 − 2μλ_p,   p = 1, 2, …, L                                   (4.27)

It is observed that each term on the main diagonal forms a geometric
series {1, r_p, r_p², …, r_p^(n−1), r_p^n, r_p^(n+1), …}.

29
An exponential function can be fitted to approximate each geometric series:

    r_p ≈ exp(−1/τ_p)   ⇒   r_p^n ≈ exp(−n/τ_p)                         (4.28)

where τ_p is called the p th time constant.
For slow adaptation, i.e., 2μλ_p << 1, τ_p is approximated as

    1/τ_p = −ln(1 − 2μλ_p) = 2μλ_p + (2μλ_p)²/2 + (2μλ_p)³/3 + … ≈ 2μλ_p
                                                                        (4.29)
    ⇒  τ_p ≈ 1/(2μλ_p)

Notice that the smaller the time constant, the faster the convergence rate.
Moreover, the overall convergence is limited by the slowest mode of
convergence, which in turn stems from the smallest eigenvalue of R_xx,
λ_min.

30
That is,

    τ_max ≈ 1/(2μλ_min)                                                 (4.30)

In general, the rate of convergence depends on two factors:

the step size μ: the larger the μ, the faster the convergence rate
the eigenvalue spread of R_xx, χ(R_xx): the smaller χ(R_xx), the faster the
convergence rate. χ(R_xx) is defined as

    χ(R_xx) = λ_max / λ_min                                             (4.31)

Notice that 1 ≤ χ(R_xx) < ∞. It is worth noting that although χ(R_xx)
cannot be changed, the rate of convergence can be increased if we
transform x(n) into another sequence, say y(n), such that χ(R_yy) is close
to 1.

31
Example 4.1
An illustration of eigenvalue spread for the LMS algorithm is shown as follows.

[Block diagram: the unknown system Σ_{i=0}^{1} hᵢ z⁻ⁱ plus additive noise q(n)
produces d(n) from x(n); the adaptive filter Σ_{i=0}^{1} wᵢ z⁻ⁱ produces y(n),
and e(n) = d(n) − y(n).]

    d(n) = h₀ x(n) + h₁ x(n−1) + q(n)
    y(n) = w₀(n) x(n) + w₁(n) x(n−1)
    e(n) = d(n) − y(n) = d(n) − w₀(n) x(n) − w₁(n) x(n−1)

    w₀(n+1) = w₀(n) + 2μ e(n) x(n)
    w₁(n+1) = w₁(n) + 2μ e(n) x(n−1)

32
% file name is es.m
clear all
N=1000; % number of sample is 1000
np = 0.01; % noise power is 0.01
sp = 1; % signal power is 1 which implies SNR = 20dB
h=[1 2]; % unknown impulse response
x = sqrt(sp).*randn(1,N);
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);

w0(1) = 0; % initial filter weights are 0


w1(1) = 0;

mu = 0.005; % step size is fixed at 0.005

y(1) = w0(1)*x(1); % iteration at n=0


e(1) = d(1) - y(1); % separate because x(0) is not defined
w0(2) = w0(1) + 2*mu*e(1)*x(1);
w1(2) = w1(1);

33
for n=2:N % the LMS algorithm
y(n) = w0(n)*x(n) + w1(n)*x(n-1);
e(n) = d(n) - y(n);
w0(n+1) = w0(n) + 2*mu*e(n)*x(n);
w1(n+1) = w1(n) + 2*mu*e(n)*x(n-1);
end

n = 1:N+1;
subplot(2,1,1)
plot(n,w0) % plot filter weight estimate versus time
axis([1 1000 0 1.2])
subplot(2,1,2)
plot(n,w1)
axis([1 1000 0 2.2])
figure(2)
subplot(1,1,1)
n = 1:N;
semilogy(n,e.*e); % plot square error versus time

34
35
36
Note that both filter weights converge at a similar speed because the
eigenvalues of R_xx are identical.

Recall

    R_xx = [ R_xx(0)  R_xx(1) ]
           [ R_xx(1)  R_xx(0) ]

For a white process with unit power, we have

    R_xx(0) = E{x(n)·x(n)} = 1
    R_xx(1) = E{x(n)·x(n−1)} = 0

As a result,

    R_xx = [ 1  0 ]
           [ 0  1 ]

    ⇒  χ(R_xx) = 1

37
% file name is es1.m
clear all
N=1000;
np = 0.01;
sp = 1;
h=[1 2];
u = sqrt(sp/2).*randn(1,N+1);
x = u(1:N) + u(2:N+1); % x(n) is now a MA process with power 1
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);

w0(1) = 0;
w1(1) = 0;

mu = 0.005;

y(1) = w0(1)*x(1);
e(1) = d(1) - y(1);
w0(2) = w0(1) + 2*mu*e(1)*x(1);
w1(2) = w1(1);

38
for n=2:N
y(n) = w0(n)*x(n) + w1(n)*x(n-1);
e(n) = d(n) - y(n);
w0(n+1) = w0(n) + 2*mu*e(n)*x(n);
w1(n+1) = w1(n) + 2*mu*e(n)*x(n-1);
end

n = 1:N+1;
subplot(2,1,1)
plot(n,w0)
axis([1 1000 0 1.2])
subplot(2,1,2)
plot(n,w1)
axis([1 1000 0 2.2])
figure(2)
subplot(1,1,1)
n = 1:N;
semilogy(n,e.*e);

39
40
41
Note that the convergence speed of w₀(n) is slower than that of w₁(n).

Investigating R_xx:

    R_xx(0) = E{x(n)·x(n)}
            = E{(u(n) + u(n−1))(u(n) + u(n−1))}
            = E{u²(n) + u²(n−1)}
            = 0.5 + 0.5 = 1
    R_xx(1) = E{x(n)·x(n−1)}
            = E{(u(n) + u(n−1))(u(n−1) + u(n−2))}
            = E{u²(n−1)} = 0.5

As a result,

    R_xx = [ 1    0.5 ]
           [ 0.5  1   ]

    ⇒  λ_min = 0.5 and λ_max = 1.5  ⇒  χ(R_xx) = 3   (MATLAB command: eig)

42
% file name is es2.m
clear all
N=1000;
np = 0.01;
sp = 1;
h=[1 2];
u = sqrt(sp/5).*randn(1,N+4);
x = u(1:N) + u(2:N+1) + u(3:N+2) + u(4:N+3) +u(5:N+4); % x(n) is 5th order MA process
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);

w0(1) = 0;
w1(1) = 0;

mu = 0.005;

y(1) = w0(1)*x(1);
e(1) = d(1) - y(1);
w0(2) = w0(1) + 2*mu*e(1)*x(1);
w1(2) = w1(1);

43
for n=2:N
y(n) = w0(n)*x(n) + w1(n)*x(n-1);
e(n) = d(n) - y(n);
w0(n+1) = w0(n) + 2*mu*e(n)*x(n);
w1(n+1) = w1(n) + 2*mu*e(n)*x(n-1);
end

n = 1:N+1;
subplot(2,1,1)
plot(n,w0)
axis([1 1000 0 1.5])
subplot(2,1,2)
plot(n,w1)
axis([1 1000 0 2.5])
figure(2)
subplot(1,1,1)
n = 1:N;
semilogy(n,e.*e);

44
45
46
We see that the convergence speeds of both weights are very slow,
although that of w₁(n) is faster.

Investigating R_xx:

    R_xx(0) = E{x(n)·x(n)}
            = E{u²(n) + u²(n−1) + u²(n−2) + u²(n−3) + u²(n−4)}
            = 0.2 + 0.2 + 0.2 + 0.2 + 0.2 = 1
    R_xx(1) = E{x(n)·x(n−1)}
            = E{u²(n−1)} + E{u²(n−2)} + E{u²(n−3)} + E{u²(n−4)}
            = 0.8

As a result,

    R_xx = [ 1    0.8 ]
           [ 0.8  1   ]

    ⇒  λ_min = 0.2 and λ_max = 1.8  ⇒  χ(R_xx) = 9

47
2. Misadjustment

Upon convergence, if lim_{n→∞} W(n) = W_MMSE, then the minimum MSE will be
equal to

    ξ_min = E{d²(n)} − R_dxᵀ W_MMSE                                     (4.32)

However, this will not occur in practice due to random noise in the weight
vector W(n). Notice that we have lim_{n→∞} E{W(n)} = W_MMSE but not
lim_{n→∞} W(n) = W_MMSE. The MSE of the LMS algorithm is computed as

    E{e²(n)} = E{(d(n) − Wᵀ(n)X(n))²}
             = ξ_min + E{(W(n) − W_MMSE)ᵀ X(n)Xᵀ(n) (W(n) − W_MMSE)}    (4.33)
             = ξ_min + E{(W(n) − W_MMSE)ᵀ R_xx (W(n) − W_MMSE)}

48
The second term on the right-hand side as n → ∞ is known as the excess
MSE and is given by

    excess MSE = lim_{n→∞} E{(W(n) − W_MMSE)ᵀ R_xx (W(n) − W_MMSE)}
               = lim_{n→∞} E{Vᵀ(n) R_xx V(n)}
               = lim_{n→∞} E{Uᵀ(n) Λ U(n)}                              (4.34)
               = μ ξ_min Σ_{i=0}^{L−1} λᵢ = μ ξ_min tr[R_xx]

where tr[R_xx] is the trace of R_xx, which is equal to the sum of all elements
on the principal diagonal:

    tr[R_xx] = L · R_xx(0) = L · E{x²(n)}                               (4.35)

49
As a result, the misadjustment M is given by

    M = (lim_{k→∞} E{e²(k)} − ξ_min) / ξ_min
      = (μ ξ_min L E{x²(n)}) / ξ_min                                    (4.36)
      = μ L E{x²(n)}

which is proportional to the step size, the filter length and the signal power.

Remarks:
1. There is a tradeoff between a fast convergence rate and a small mean
square error or misadjustment. When μ increases, both the convergence
rate and M increase; if μ decreases, both the convergence rate and M
decrease.

50
2. The bound for μ is

    0 < μ < 1/λ_max                                                     (4.37)

In practice, the signal power of x(n) can generally be estimated more
easily than the eigenvalues of R_xx. We also note that

    λ_max ≤ Σ_{i=1}^{L} λᵢ = tr[R_xx] = L · E{x²(n)}                     (4.38)

A more restrictive bound for μ, which is much easier to apply, is thus

    0 < μ < 1 / (L · E{x²(n)})                                           (4.39)

Moreover, instead of a fixed value of μ, we can make it time-varying as
μ(n). A design idea for a good μ(n) is

    μ(n) = large value initially, to ensure a fast initial convergence rate;
           small value finally, to ensure a small misadjustment upon convergence.

51
LMS Variants

1. Normalized LMS (NLMS) algorithm
the product vector e(n)X(n) is modified with respect to the squared
Euclidean norm of the tap-input vector X(n):

    W(n+1) = W(n) + [2μ / (c + Xᵀ(n)X(n))] e(n) X(n)                    (4.40)

where c is a small positive constant to avoid division by zero.
it can also be considered as an LMS algorithm with a time-varying step size:

    μ(n) = μ / (c + Xᵀ(n)X(n))                                          (4.41)

substituting c = 0, it can be shown that the NLMS algorithm converges if
0 < μ < 0.5 ⇒ selection of the step size in the NLMS is much easier than
in the LMS algorithm (a short sketch follows)
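A minimal MATLAB sketch of the NLMS update on synthetic data is given below; the unknown system, noise level, μ and c are assumed values chosen only for illustration.

% Minimal NLMS sketch on synthetic data (assumed setup, not from the notes)
N = 1000; L = 2; mu = 0.5; c = 1e-6;
x = randn(1,N);                         % input signal
d = filter([1 2],1,x) + 0.1*randn(1,N); % desired signal from an assumed unknown system
W = zeros(L,1);
for n = L:N
    X = x(n:-1:n-L+1).';                % tap-input vector X(n)
    e = d(n) - W.'*X;                   % a priori error
    W = W + (2*mu/(c + X.'*X))*e*X;     % normalized update (4.40)
end
disp(W.')                               % should be close to [1 2]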

52
2. Sign algorithms

pilot LMS or signed error or sign algorithm:

    W(n+1) = W(n) + 2μ sgn[e(n)] X(n)                                   (4.42)

clipped LMS or signed regressor:

    W(n+1) = W(n) + 2μ e(n) sgn[X(n)]                                   (4.43)

zero-forcing LMS or sign-sign:

    W(n+1) = W(n) + 2μ sgn[e(n)] sgn[X(n)]                              (4.44)

their computational complexity is lower than that of the LMS algorithm, but
they are relatively difficult to analyze (a short sketch follows)
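The following MATLAB sketch contrasts the three sign updates inside one loop (the data, system and step size are assumed; only the update lines differ from the standard LMS).

% Sign-algorithm updates (hypothetical data and step size)
N = 1000; L = 2; mu = 0.005;
x = randn(1,N);  d = filter([1 2],1,x) + 0.1*randn(1,N);
W1 = zeros(L,1); W2 = zeros(L,1); W3 = zeros(L,1);
for n = L:N
    X = x(n:-1:n-L+1).';
    e1 = d(n) - W1.'*X;  W1 = W1 + 2*mu*sign(e1)*X;        % signed error (4.42)
    e2 = d(n) - W2.'*X;  W2 = W2 + 2*mu*e2*sign(X);        % signed regressor (4.43)
    e3 = d(n) - W3.'*X;  W3 = W3 + 2*mu*sign(e3)*sign(X);  % sign-sign (4.44)
end
disp([W1 W2 W3])                     % all three approach [1 2] at different rates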

53
3. Leaky LMS algorithm
the LMS update is modified by the presence of a constant leakage factor γ:

    W(n+1) = γ W(n) + 2μ e(n) X(n)                                      (4.45)

where 0 < γ < 1.
it still operates when R_xx has zero eigenvalues.

4. Least mean fourth (LMF) algorithm
instead of minimizing E{e²(n)}, E{e⁴(n)} is minimized based on the LMS
approach:

    W(n+1) = W(n) + 4μ e³(n) X(n)                                       (4.46)

it can outperform the LMS algorithm in non-Gaussian signal and noise
conditions.

54
Application Examples
Example 4.2
1. Linear Prediction
Suppose a signal x(n) is a second-order autoregressive (AR) process that
satisfies the following difference equation:

    x(n) = 1.558 x(n−1) − 0.81 x(n−2) + v(n)

where v(n) is a white noise process such that

    R_vv(m) = E{v(n)v(n+m)} = σ_v²  for m = 0,  and 0 otherwise

We want to use a two-coefficient LMS filter to predict x(n) by

    x̂(n) = Σ_{i=1}^{2} wᵢ(n) x(n−i) = w₁(n) x(n−1) + w₂(n) x(n−2)

55
Upon convergence, we desire

    E{w₁(n)} → 1.558   and   E{w₂(n)} → −0.81

[Block diagram: x(n) is passed through two unit delays z⁻¹; the delayed samples
are weighted by w₁(n) and w₂(n) and summed to give the prediction x̂(n); the
prediction error is e(n) = d(n) − x̂(n) with d(n) = x(n).]

56
The error function or prediction error e(n) is given by

    e(n) = d(n) − Σ_{i=1}^{2} wᵢ(n) x(n−i)
         = x(n) − w₁(n) x(n−1) − w₂(n) x(n−2)

Thus the LMS algorithm for this problem is

    w₁(n+1) = w₁(n) − (μ/2) ∂e²(n)/∂w₁(n) = w₁(n) − (μ/2) [∂e²(n)/∂e(n)] [∂e(n)/∂w₁(n)]
            = w₁(n) + μ e(n) x(n−1)
and
    w₂(n+1) = w₂(n) − (μ/2) ∂e²(n)/∂w₂(n)
            = w₂(n) + μ e(n) x(n−2)

The computational requirement for each sampling interval is
multiplications: 5
additions/subtractions: 4

57
Two values of μ, 0.02 and 0.004, are investigated:

Convergence characteristics for the LMS predictor with μ = 0.02

58
Convergence characteristics for the LMS predictor with μ = 0.004

59
Observations:

1. When μ = 0.02, we had a fast convergence rate (the parameters
converged to the desired values in approximately 200 iterations), but
large fluctuations existed in w₁(n) and w₂(n).
2. When μ = 0.004, the fluctuations in w₁(n) and w₂(n) were small, but the filter
coefficients had not yet converged to the desired values of 1.558 and −0.81
by the 300th iteration.
3. The learning behaviours of w₁(n) and w₂(n) agreed with those of
E{w₁(n)} and E{w₂(n)}. Notice that E{w₁(n)} and E{w₂(n)} can be
derived by taking expectation on the LMS algorithm.
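The learning curves above can be reproduced with a short MATLAB sketch along the following lines; the driving-noise power is an assumed value and the code itself is not from the notes.

% LMS linear prediction of a second-order AR process (sketch)
N = 300; mu = 0.02;                  % try mu = 0.004 as well
v = sqrt(0.25)*randn(1,N);           % assumed driving-noise power
x = filter(1,[1 -1.558 0.81],v);     % x(n) = 1.558x(n-1) - 0.81x(n-2) + v(n)
w1 = zeros(1,N+1); w2 = zeros(1,N+1);
for n = 3:N
    e = x(n) - w1(n)*x(n-1) - w2(n)*x(n-2);   % prediction error
    w1(n+1) = w1(n) + mu*e*x(n-1);            % LMS updates
    w2(n+1) = w2(n) + mu*e*x(n-2);
end
plot(w1); hold on; plot(w2);         % w1 -> 1.558, w2 -> -0.81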

60
Example 4.3

2. System Identification

Given the input signal x(n) and output signal d (n) , we can estimate the
impulse response of the system or plant using the LMS algorithm.

Suppose the transfer function of the plant is Σ_{i=0}^{2} hᵢ z⁻ⁱ, which is a causal FIR
unknown system; then d(n) can be represented as

    d(n) = Σ_{i=0}^{2} hᵢ x(n−i)

61
Assuming that the order of the transfer function is unknown, we use a 2-
coefficient LMS filter to model the system as follows.

[Block diagram: the plant Σ_{i=0}^{2} hᵢ z⁻ⁱ produces d(n) from x(n); the adaptive
filter Σ_{i=0}^{1} wᵢ z⁻ⁱ produces y(n), and e(n) = d(n) − y(n).]

The error function is computed as

    e(n) = d(n) − y(n) = d(n) − Σ_{i=0}^{1} wᵢ(n) x(n−i)
         = d(n) − w₀(n) x(n) − w₁(n) x(n−1)

62
Thus the LMS algorithm for this problem is

    w₀(n+1) = w₀(n) − (μ/2) ∂e²(n)/∂w₀(n) = w₀(n) − (μ/2) [∂e²(n)/∂e(n)] [∂e(n)/∂w₀(n)]
            = w₀(n) + μ e(n) x(n)
and
    w₁(n+1) = w₁(n) − (μ/2) ∂e²(n)/∂w₁(n)
            = w₁(n) + μ e(n) x(n−1)

The learning behaviours of the filter weights w₀(n) and w₁(n) can be
obtained by taking expectation on the LMS algorithm. To simplify the
analysis, we assume that x(n) is a stationary white noise process such that

    R_xx(m) = E{x(n)x(n+m)} = σ_x²  for m = 0,  and 0 otherwise

63
Assume the filter weights are independent of x(n); applying expectation
to the first updating rule gives

    E{w₀(n+1)} − E{w₀(n)}
      = μ E{e(n) x(n)}
      = μ E{(d(n) − y(n)) x(n)}
      = μ E{( Σ_{i=0}^{2} hᵢ x(n−i) − Σ_{i=0}^{1} wᵢ(n) x(n−i) ) x(n)}
      = μ E{(h₀x(n) + h₁x(n−1) + h₂x(n−2) − w₀(n)x(n) − w₁(n)x(n−1)) x(n)}
      = μ [ h₀E{x²(n)} + h₁E{x(n−1)x(n)} + h₂E{x(n−2)x(n)}
            − E{w₀(n)}E{x²(n)} − E{w₁(n)}E{x(n−1)x(n)} ]
      = μ h₀ σ_x² − μ E{w₀(n)} σ_x²

Hence

    E{w₀(n+1)}  = E{w₀(n)}(1 − μσ_x²)   + μ h₀ σ_x²
    E{w₀(n)}    = E{w₀(n−1)}(1 − μσ_x²) + μ h₀ σ_x²
    E{w₀(n−1)}  = E{w₀(n−2)}(1 − μσ_x²) + μ h₀ σ_x²
    ……
    E{w₀(1)}    = E{w₀(0)}(1 − μσ_x²)   + μ h₀ σ_x²

Multiplying the second equation by (1 − μσ_x²) on both sides, the third
equation by (1 − μσ_x²)², etc., and summing all the resultant equations, we
have

    E{w₀(n+1)} = E{w₀(0)}(1 − μσ_x²)ⁿ⁺¹ + μ h₀ σ_x² (1 + (1 − μσ_x²) + … + (1 − μσ_x²)ⁿ)
               = E{w₀(0)}(1 − μσ_x²)ⁿ⁺¹ + μ h₀ σ_x² · [1 − (1 − μσ_x²)ⁿ⁺¹] / [1 − (1 − μσ_x²)]
               = E{w₀(0)}(1 − μσ_x²)ⁿ⁺¹ + h₀ (1 − (1 − μσ_x²)ⁿ⁺¹)
               = (E{w₀(0)} − h₀)(1 − μσ_x²)ⁿ⁺¹ + h₀

65
Hence

    lim_{n→∞} E{w₀(n)} = h₀

provided that

    |1 − μσ_x²| < 1  ⇔  −1 < 1 − μσ_x² < 1  ⇔  0 < μ < 2/σ_x²

Similarly, we can show that the expected value of w₁(n) is

    E{w₁(n)} = (E{w₁(0)} − h₁)(1 − μσ_x²)ⁿ + h₁

provided that

    |1 − μσ_x²| < 1  ⇔  −1 < 1 − μσ_x² < 1  ⇔  0 < μ < 2/σ_x²

It is worth noting that the choice of the initial filter weights E{w₀(0)} and
E{w₁(0)} does not affect the convergence of the LMS algorithm because the
performance surface is unimodal.

66
Discussion:

Since the LMS filter consists of two weights but the actual transfer function
comprises three coefficients, the plant cannot be exactly modeled in this
case. This is referred to as under-modeling. If we use a 3-weight LMS filter with
transfer function Σ_{i=0}^{2} wᵢ z⁻ⁱ, then the plant can be modeled exactly. If we use
more than 3 coefficients in the LMS filter, we can still estimate the transfer
function accurately. However, in this case, the misadjustment increases
with the filter length used.

Notice that we could also use the Wiener filter to find the impulse response
of the plant if the signal statistics R_xx(0), R_xx(1), R_dx(0) and R_dx(1) were
available. However, we do not have R_dx(0) and R_dx(1), although
R_xx(0) = σ_x² and R_xx(1) = 0 are known. Therefore, the LMS adaptive filter can
be considered as an adaptive realization of the Wiener filter, and it is used
when the signal statistics are not (completely) known.

67
Example 4.4

3. Interference Cancellation

Given a received signal r(k) which consists of a source signal s(k) and a
sinusoidal interference with known frequency, the task is to extract s(k)
from r(k). Notice that the amplitude and phase of the sinusoid are unknown.
A well-known application is to remove the 50/60 Hz power-line interference in
the recording of the electrocardiogram (ECG).

[Block diagram: the primary input is r(k) = s(k) + A·cos(ω₀k + φ); the reference
sin(ω₀k) is weighted by b₀ and its 90°-phase-shifted version cos(ω₀k) is weighted
by b₁; both weighted references are subtracted from r(k) to give e(k).]

68
The interference cancellation system consists of a 90° phase-shifter and a
two-weight adaptive filter. By properly adjusting the weights, the reference
waveform can be changed in magnitude and phase in any way to model
the interfering sinusoid. The filtered output is of the form

    e(k) = r(k) − b₀(k) sin(ω₀k) − b₁(k) cos(ω₀k)

The LMS algorithm is

    b₀(k+1) = b₀(k) − (μ/2) ∂e²(k)/∂b₀(k) = b₀(k) − (μ/2) [∂e²(k)/∂e(k)] [∂e(k)/∂b₀(k)]
            = b₀(k) + μ e(k) sin(ω₀k)
and
    b₁(k+1) = b₁(k) − (μ/2) ∂e²(k)/∂b₁(k)
            = b₁(k) + μ e(k) cos(ω₀k)
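A minimal MATLAB sketch of this canceller is given below; the source signal, interference amplitude, phase and step size are assumed values used only for illustration.

% Adaptive cancellation of a sinusoid of known frequency (sketch)
N = 2000; mu = 0.02; w0 = 0.2*pi;
s = 0.5*randn(1,N);                            % assumed wideband source signal
r = s + 1.5*cos(w0*(0:N-1) + 0.3*pi);          % received signal with interference
b0 = 0; b1 = 0; e = zeros(1,N);
for k = 1:N
    e(k) = r(k) - b0*sin(w0*(k-1)) - b1*cos(w0*(k-1));   % filtered output
    b0 = b0 + mu*e(k)*sin(w0*(k-1));                     % LMS updates
    b1 = b1 + mu*e(k)*cos(w0*(k-1));
end
% upon convergence e(k) ~ s(k); b0 -> -A*sin(phi), b1 -> A*cos(phi)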

69
Taking the expected value of b₀(k), we have

    E{b₀(k+1)}
      = E{b₀(k)} + μ E{e(k) sin(ω₀k)}
      = E{b₀(k)} + μ E{[s(k) + A cos(ω₀k + φ) − b₀(k) sin(ω₀k) − b₁(k) cos(ω₀k)] sin(ω₀k)}
      = E{b₀(k)} + μ E{[A cos(ω₀k) cos(φ) − A sin(ω₀k) sin(φ) − b₀(k) sin(ω₀k)
                        − b₁(k) cos(ω₀k)] sin(ω₀k)}
      = E{b₀(k)} + (μ/2) E{(A cos(φ) − b₁(k)) sin(2ω₀k)} − μ E{(A sin(φ) + b₀(k)) sin²(ω₀k)}
      = E{b₀(k)} − μ E{(A sin(φ) + b₀(k)) [1 − cos(2ω₀k)]/2}
        (the double-frequency terms average to zero)
      = E{b₀(k)} − (μ/2) A sin(φ) − (μ/2) E{b₀(k)}
      = (1 − μ/2) E{b₀(k)} − (μ/2) A sin(φ)

70
Following the derivation in Example 4.3, provided that 0 < μ < 4, the
learning curve of E{b₀(k)} can be obtained as

    E{b₀(k)} = −A sin(φ) + (E{b₀(0)} + A sin(φ)) (1 − μ/2)^k

Similarly, E{b₁(k)} is calculated as

    E{b₁(k)} = A cos(φ) + (E{b₁(0)} − A cos(φ)) (1 − μ/2)^k

When k → ∞, we have

    lim_{k→∞} E{b₀(k)} = −A sin(φ)
and
    lim_{k→∞} E{b₁(k)} = A cos(φ)

71
The filtered output is then approximated as

    e(k) ≈ r(k) + A sin(φ) sin(ω₀k) − A cos(φ) cos(ω₀k)
         = s(k)

which means that s(k) can be recovered accurately upon convergence.

Suppose E{b₀(0)} = E{b₁(0)} = 0, μ = 0.02, and we want to find the number
of iterations required for E{b₁(k)} to reach 90% of its steady-state value.
Let the required number of iterations be k₀; it can be calculated from

    E{b₁(k₀)} = 0.9 A cos(φ) = A cos(φ) − A cos(φ)(1 − μ/2)^{k₀}

    ⇒  (1 − 0.02/2)^{k₀} = 0.1

    ⇒  k₀ = log(0.1)/log(0.99) = 229.1

Hence about 230 iterations are required.

72
If we use a Wiener filter with filter weights b₀ and b₁, the mean square error
function can be computed as

    E{e²(k)} = (1/2)(b₀² + b₁²) + A sin(φ) b₀ − A cos(φ) b₁ + E{s²(k)} + A²/2

The Wiener coefficients are found by differentiating E{e²(k)} with respect
to b₀ and b₁ and setting the resultant expressions to zero. We have

    ∂E{e²(k)}/∂b₀ = b₀ + A sin(φ) = 0  ⇒  b̃₀ = −A sin(φ)
and
    ∂E{e²(k)}/∂b₁ = b₁ − A cos(φ) = 0  ⇒  b̃₁ = A cos(φ)

73
Example 4.5

4. Time Delay Estimation

Estimation of the time delay between two measured signals is a problem
which occurs in a diverse range of applications including radar, sonar,
geophysics and biomedical signal analysis. A simple model for the
received signals is

    r₁(k) = s(k) + n₁(k)
    r₂(k) = α s(k − D) + n₂(k)

where s(k) is the signal of interest while n₁(k) and n₂(k) are additive
noises. α is the attenuation and D is the time delay to be determined.
In general, D is not an integer multiple of the sampling period.

Suppose the sampling period is 1 second and s(k) is bandlimited between
−0.5 Hz and 0.5 Hz (−π rad/s and π rad/s). We can derive the system
which produces a delay of D as follows.
74
Taking the Fourier transform of s_D(k) = s(k − D) yields

    S_D(ω) = e^{−jωD} S(ω)

This means that a system with transfer function e^{−jωD} can generate a delay
of D for s(k). Using the inverse DTFT formula of (I.9), the impulse
response of e^{−jωD} is calculated as

    h(n) = (1/2π) ∫_{−π}^{π} e^{−jωD} e^{jωn} dω
         = (1/2π) ∫_{−π}^{π} e^{jω(n−D)} dω
         = sinc(n − D)
where
    sinc(v) = sin(πv)/(πv)

75
As a result, s(k − D) can be represented as

    s(k − D) = s(k) ⊗ h(k)
             = Σ_{i=−∞}^{∞} s(k−i) sinc(i − D)
             ≈ Σ_{i=−P}^{P} s(k−i) sinc(i − D)

for sufficiently large P.
This means that we can use a non-causal FIR filter to model the time
delay, and it has the form

    W(z) = Σ_{i=−P}^{P} wᵢ z⁻ⁱ

It can be shown that wᵢ ≈ α sinc(i − D) for i = −P, −P+1, …, P using the
minimum mean square error approach. The time delay can be estimated
from {wᵢ} using the following interpolation:

    D̂ = arg max_t Σ_{i=−P}^{P} wᵢ sinc(i − t)

76
[Block diagram: r₁(k) is filtered by W(z) = Σ_{i=−P}^{P} wᵢ z⁻ⁱ and the output is
subtracted from r₂(k) to give e(k).]

    e(k) = r₂(k) − Σ_{i=−P}^{P} r₁(k−i) wᵢ(k)

77
The LMS algorithm for the time delay estimation problem is thus

    w_j(k+1) = w_j(k) − μ ∂e²(k)/∂w_j(k)
             = w_j(k) − μ [∂e²(k)/∂e(k)] [∂e(k)/∂w_j(k)]
             = w_j(k) + 2μ e(k) r₁(k−j),    j = −P, −P+1, …, P

The time delay estimate at time k is:

    D̂(k) = arg max_t Σ_{i=−P}^{P} wᵢ(k) sinc(i − t)
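A rough MATLAB sketch of this estimator is given below; the bandlimited source, attenuation, delay, noise level and grid resolution are all assumed values, and the script assumes the Signal Processing Toolbox for fir1 and sinc.

% LMS time delay estimation (sketch; assumed signal, attenuation and delay)
N = 4000; P = 5; mu = 0.005; D = 2.3; alpha = 0.8;
s = filter(fir1(64,0.9),1,randn(1,N));    % assumed bandlimited source signal
r1 = s + 0.05*randn(1,N);
r2 = zeros(1,N);
for k = P+1:N-P
    r2(k) = alpha*sum(s(k-(-P:P)).*sinc((-P:P)-D)) + 0.05*randn;   % delayed copy
end
w = zeros(1,2*P+1);                        % taps w(-P)...w(P)
for k = P+1:N-P
    X = r1(k-(-P:P));                      % [r1(k+P) ... r1(k-P)]
    e = r2(k) - w*X.';
    w = w + 2*mu*e*X;                      % LMS update
end
t = -P:0.01:P;
[~,idx] = max(arrayfun(@(tt) sum(w.*sinc((-P:P)-tt)), t));
Dhat = t(idx)                              % should be close to D = 2.3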

78
Exponentially Weighted Recursive Least-Squares

A. Optimization Criterion

To minimize the weighted sum of squares  J(n) = Σ_{l=0}^{n} λ^{n−l} e²(l)  at each time
n, where λ is a weighting (forgetting) factor such that 0 < λ ≤ 1.

When λ = 1, the optimization criterion is identical to that of least squares
filtering, and this value of λ should not be used in a changing environment
because all squared errors (current and past values) have the same
weighting factor of 1.
To smooth out the effect of old samples, λ should be chosen less than
1 when operating in nonstationary conditions.

B. Derivation
Assume an FIR filter for simplicity. Following the derivation of the least
squares filter, we differentiate J(n) with respect to the filter weight vector
at time n, i.e., W(n), and then set the L resultant equations to zero.

79
By so doing, we have

    R(n) W(n) = G(n)                                                    (4.47)
where
    R(n) = Σ_{l=0}^{n} λ^{n−l} X(l) Xᵀ(l)
    G(n) = Σ_{l=0}^{n} λ^{n−l} d(l) X(l)
    X(l) = [x(l) x(l−1) … x(l−L+2) x(l−L+1)]ᵀ

Notice that R(n) and G(n) can be computed recursively from

    R(n) = λ Σ_{l=0}^{n−1} λ^{n−1−l} X(l) Xᵀ(l) + X(n) Xᵀ(n) = λ R(n−1) + X(n) Xᵀ(n)   (4.48)

    G(n) = λ Σ_{l=0}^{n−1} λ^{n−1−l} d(l) X(l) + d(n) X(n) = λ G(n−1) + d(n) X(n)      (4.49)

80
Using the well-known matrix inversion lemma:

If
    A = B + C Cᵀ                                                        (4.50)

where A and B are N × N matrices and C is a vector of length N, then

    A⁻¹ = B⁻¹ − B⁻¹ C (1 + Cᵀ B⁻¹ C)⁻¹ Cᵀ B⁻¹                           (4.51)

Thus R(n)⁻¹ can be written as

    R(n)⁻¹ = (1/λ) [ R(n−1)⁻¹ − R(n−1)⁻¹ X(n) Xᵀ(n) R(n−1)⁻¹ / (λ + Xᵀ(n) R(n−1)⁻¹ X(n)) ]
                                                                        (4.52)

81
The filter weight W(n) is calculated as

    W(n) = R(n)⁻¹ G(n)
         = (1/λ) [ R(n−1)⁻¹ − R(n−1)⁻¹ X(n) Xᵀ(n) R(n−1)⁻¹ / (λ + Xᵀ(n) R(n−1)⁻¹ X(n)) ]
           · [ λ G(n−1) + d(n) X(n) ]
         = R(n−1)⁻¹ G(n−1) + (1/λ) d(n) R(n−1)⁻¹ X(n)
           − [ R(n−1)⁻¹ X(n) Xᵀ(n) R(n−1)⁻¹ G(n−1)
               + (1/λ) d(n) R(n−1)⁻¹ X(n) Xᵀ(n) R(n−1)⁻¹ X(n) ] / (λ + Xᵀ(n) R(n−1)⁻¹ X(n))
         = W(n−1) + (d(n) − Xᵀ(n) W(n−1)) R(n−1)⁻¹ X(n) / (λ + Xᵀ(n) R(n−1)⁻¹ X(n))

82
As a result, the exponentially weighted recursive least squares (RLS)
algorithm is summarized as follows:

1. Initialize W(0) and R(0)⁻¹
2. For n = 1, 2, …, compute

    e(n) = d(n) − Xᵀ(n) W(n−1)                                          (4.53)

    κ(n) = 1 / (λ + Xᵀ(n) R(n−1)⁻¹ X(n))                                (4.54)

    W(n) = W(n−1) + κ(n) e(n) R(n−1)⁻¹ X(n)                             (4.55)

    R(n)⁻¹ = (1/λ) [ R(n−1)⁻¹ − κ(n) R(n−1)⁻¹ X(n) Xᵀ(n) R(n−1)⁻¹ ]     (4.56)

83
Remarks:
1. When λ = 1, the algorithm reduces to the standard RLS algorithm that
minimizes Σ_{l=0}^{n} e²(l).
2. For nonstationary data, 0.95 < λ < 0.9995 has been suggested.
3. Simple choices of W(0) and R(0)⁻¹ are 0 and δ²I, respectively, where
δ² is a small positive constant.
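A compact MATLAB sketch of (4.53)–(4.56) on synthetic data is given below; the unknown system, noise level, λ and the initialization value of R(0)⁻¹ are assumed for illustration only.

% Exponentially weighted RLS (sketch with assumed system and parameters)
N = 500; L = 2; lambda = 0.99;
x = randn(1,N);  d = filter([1 2],1,x) + 0.1*randn(1,N);
W = zeros(L,1);  Rinv = 100*eye(L);          % W(0) and R(0)^{-1} (illustrative choice)
for n = L:N
    X = x(n:-1:n-L+1).';
    e = d(n) - X.'*W;                        % (4.53)
    k = 1/(lambda + X.'*Rinv*X);             % (4.54)
    W = W + k*e*Rinv*X;                      % (4.55)
    Rinv = (Rinv - k*(Rinv*X)*(X.'*Rinv))/lambda;   % (4.56)
end
disp(W.')                                    % close to [1 2]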

C. Comparison with the LMS algorithm

1. Computational Complexity

RLS is more computationally expensive than the LMS. Assuming there are
L filter taps, the LMS requires (4L + 1) additions and (4L + 3) multiplications
per update, while the exponentially weighted RLS needs a total of
(3L² + L − 1) additions/subtractions and (4L² + 4L) multiplications/divisions.

84
2. Rate of Convergence
RLS provides a faster convergence speed than the LMS because
RLS is an approximation of the Newton method while LMS is an
approximation of the steepest descent method.
the pre-multiplication by R(n)⁻¹ in the RLS algorithm makes the resultant
eigenvalue spread become unity.
Improvement of LMS algorithm with the use of Orthogonal Transform

A. Motivation
When the input signal is white, the eigenvalue spread has a minimum
value of 1. In this case, the LMS algorithm can provide optimum rate of
convergence.

However, many practical signals are nonwhite, how can we improve the
rate of convergence using the LMS algorithm?

85
B. Idea
To transform the input x(n) into another signal v(n) so that the modified
eigenvalue spread is 1. Two steps are involved:

1. Transform x(n) to v(n) using an N × N orthogonal transform T so that

    R_vv = T R_xx Tᵀ = diag(σ₁², σ₂², …, σ_N²)

where
    V(n) = T X(n)
    V(n) = [v₁(n) v₂(n) … v_{N−1}(n) v_N(n)]ᵀ
    X(n) = [x(n) x(n−1) … x(n−N+2) x(n−N+1)]ᵀ
    R_xx = E{X(n) Xᵀ(n)}
    R_vv = E{V(n) Vᵀ(n)}

2. Modify the eigenvalues of R_vv so that the resultant matrix has identical
eigenvalues (power normalization):

    R_vv  →  R′_vv = diag(σ², σ², …, σ²)

87
Block diagram of the transform domain adaptive filter

88
C. Algorithm
The modified LMS algorithm is given by

    W(n+1) = W(n) + 2μ e(n) Σ⁻² V(n)

where

    Σ⁻² = diag(1/σ₁², 1/σ₂², …, 1/σ_N²)
    e(n) = d(n) − y(n)
    y(n) = Wᵀ(n) V(n) = Wᵀ(n) T X(n)
    W(n) = [w₁(n) w₂(n) … w_{N−1}(n) w_N(n)]ᵀ

Writing in scalar form, we have

    wᵢ(n+1) = wᵢ(n) + 2μ e(n) vᵢ(n) / σᵢ²,   i = 1, 2, …, N

Since σᵢ² is the power of vᵢ(n), which is not known a priori, it should be
estimated. A common estimation procedure for E{vᵢ²(n)} is

    σ̂ᵢ²(n) = α σ̂ᵢ²(n−1) + |vᵢ(n)|²
where
    0 < α < 1

In practice, α should be chosen close to 1, say α = 0.9.
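A minimal MATLAB sketch of a DCT-based transform-domain LMS update with running power normalization is given below; the coloured input, unknown system, filter length, μ and α are assumed, and dctmtx is taken from the Signal Processing/Image Processing Toolbox (any orthonormal transform could be substituted).

% DCT transform-domain LMS with power normalization (sketch)
N = 2000; L = 8; mu = 0.05; alpha = 0.9;
T = dctmtx(L);                                % orthogonal DCT matrix
u = randn(1,N+1); x = u(1:N) + u(2:N+1);      % assumed coloured input
d = filter([1 -0.5 0.25],1,x) + 0.05*randn(1,N);
W = zeros(L,1); p = ones(L,1);                % weights and power estimates
for n = L:N
    V = T*x(n:-1:n-L+1).';                    % V(n) = T X(n)
    p = alpha*p + abs(V).^2;                  % running power of each v_i(n)
    e = d(n) - W.'*V;
    W = W + 2*mu*e*(V./p);                    % power-normalized update
end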

90
Using a 2-coefficient adaptive filter as an example:

A 2-D error surface without transform

91
Error surface with discrete cosine transform (DCT)

92
Error surface with transform and power normalization

93
Remarks:
1. The lengths of the principal axes of the hyperellipses are proportional
to the eigenvalues of R.
2. Without power normalization, no convergence rate improvement from
using the transform can be achieved.
3. The best choice for T would be the Karhunen-Loeve (KL) transform, which
is signal dependent. This transform makes R_vv a diagonal matrix,
but the signal statistics are required for its computation.
4. Considerations in choosing a transform:
fast algorithm exists?
complex or real transform?
elements of the transform are all power of 2?
5. Examples of orthogonal transforms are discrete sine transform (DST),
discrete Fourier transform (DFT), discrete cosine transform (DCT),
Walsh-Hadamard transform (WHT), discrete Hartley transform (DHT)
and power-of-2 (PO2) transform.

94
Improvement of the LMS algorithm using Newton's method

Since the eigenvalue spread of a Newton-based approach is 1, we can
combine the LMS algorithm and Newton's method to form the
"LMS/Newton" algorithm as follows:

    W(n+1) = W(n) − (μ/2) R_xx⁻¹ ∂e²(n)/∂W(n)
           = W(n) + μ R_xx⁻¹ e(n) X(n)

Remarks:
1. The computational complexity of the LMS/Newton algorithm is smaller
than that of the RLS algorithm but greater than that of the LMS algorithm.
2. When R_xx is not available, it can be estimated as follows:

    R̂_xx(l, n) = α R̂_xx(l, n−1) + x(n+l) x(n),   l = 0, 1, …, L−1

where R̂_xx(l, n) represents the estimate of R_xx(l) at time n and 0 < α < 1.

95
Possible Research Directions for Adaptive Signal Processing

1. Adaptive modeling of non-linear systems
For example, the second-order Volterra system is a simple non-linear system.
The output y(n) is related to the input x(n) by

    y(n) = Σ_{j=0}^{L−1} w⁽¹⁾(j) x(n−j) + Σ_{j₁=0}^{L−1} Σ_{j₂=0}^{L−1} w⁽²⁾(j₁, j₂) x(n−j₁) x(n−j₂)

Another related research direction is the analysis of non-linear adaptive filters,
for example neural networks, whose performance is generally more difficult to
analyze.

2. New optimization criteria for non-Gaussian signals/noises
For example, the LMF algorithm minimizes E{e⁴(n)}.
In fact, a class of steepest descent algorithms can be generalized by the
least-mean-p (LMP) norm. The cost function to be minimized is given by

    J = E{|e(k)|^p}

96
Some remarks:
When p = 1 it becomes least-mean-deviation (LMD), when p = 2 it is
least-mean-square (LMS), and if p = 4 it becomes least-mean-fourth (LMF).
The LMS is optimum for Gaussian noise, and this may not be true for
noises with other probability density functions (PDFs). For example, if the
noise is impulsive, such as an α-stable process with 1 < α < 2, LMD
performs better than LMS; if the noise has a uniform distribution or if it is a
sinusoidal signal, then LMF outperforms LMS. Therefore, the optimum p
depends on the signal/noise models.
The parameter p can be any real number, but it will be difficult to
analyze, particularly for non-integer p.
Combinations of different norms can be used to achieve better performance.
Some suggest a mixed-norm criterion, e.g., a·E{e²(n)} + b·E{e⁴(n)}.

97
Median operation can be employed in the LMP algorithm for operating in
the presence of impulsive noise. For example, the median LMS belongs
to the family of order-statistics-least-mean-square (OSLMS) adaptive
filter algorithms.
3. Adaptive algorithms with fast convergence rate and small
computation
For example, design of optimal step size in LMS algorithms
4. Adaptive IIR filters
Adaptive IIR filters have two advantages over adaptive FIR filters:
They generalize FIR filters and can model IIR systems more accurately
Fewer filter coefficients are generally required
However, the development of adaptive IIR filters is generally more difficult
than that of FIR filters because
The performance surface is multimodal ⇒ the algorithm may lock onto an
undesired local minimum
It may lead to a biased solution
It can be unstable

98
5. Unsupervised adaptive signal processing (blind signal processing)
What we have discussed previously refers to supervised adaptive signal
processing where there is always a desired signal or reference signal or
training signal.
In some applications, such signals are not available. Two important
application areas of unsupervised adaptive signal processing are:
Blind source separation
e.g. speaker identification in the noisy environment of a cocktail party
e.g. separation of signals overlapped in time and frequency in wireless
communications
Blind deconvolution (= inverse of convolution)
e.g. restoration of a source signal after propagating through an
unknown wireless channel
6. New applications
For example, echo cancellation for hands-free telephone systems and
signal estimation in wireless channels using space-time processing.

99
Questions for Discussion
1. The LMS algorithm is given by (4.23):

    wᵢ(n+1) = wᵢ(n) + 2μ e(n) x(n−i),   i = 0, 1, …, L−1

where

    e(n) = d(n) − y(n)
    y(n) = Σ_{i=0}^{L−1} wᵢ(n) x(n−i) = Wᵀ(n) X(n)

Based on the idea of the LMS algorithm, derive the adaptive algorithm that
minimizes E{|e(n)|}.

(Hint: ∂|v|/∂v = sgn(v), where sgn(v) = 1 if v > 0 and sgn(v) = −1 otherwise.)

100
2. For adaptive IIR filtering, there are basically two approaches, namely,
output-error and equation-error. Let the unknown IIR system be

    H(z) = B(z)/A(z) = [ Σ_{j=0}^{N−1} b_j z⁻ʲ ] / [ 1 + Σ_{i=1}^{M−1} a_i z⁻ⁱ ]

Using the minimization of the mean square error as the performance criterion,
the output-error scheme is a direct approach which minimizes E{e²(n)}, where

    e(n) = d(n) − y(n)

with

    Y(z)/X(z) = B̂(z)/Â(z) = [ Σ_{j=0}^{N−1} b̂_j z⁻ʲ ] / [ 1 + Σ_{i=1}^{M−1} â_i z⁻ⁱ ]

    y(n) = Σ_{j=0}^{N−1} b̂_j x(n−j) − Σ_{i=1}^{M−1} â_i y(n−i)

However, as in Q.2 of Chapter 3, this approach has two problems,


namely, stability and multimodal performance surface.

On the other hand, the equation-error approach is always stable and has
a unimodal surface. Its system block diagram is shown on the next page.

Can you see its main problem and suggest a solution?

(Hint: Assume n(k) is white and examine E{e²(k)}.)

102
[Block diagram of the equation-error approach: s(k) drives the unknown system
H(z); noise n(k) is added to form d(k); the observed signals are filtered by B̂(z)
and Â(z), and the difference of the two filter outputs forms the equation error e(k).]

103
Chapter 5
Estimation Theory and Applications

References:

S.M.Kay, Fundamentals of Statistical Signal Processing: Estimation


Theory, Prentice Hall, 1993

1
Estimation Theory and Applications
Application Areas
1. Radar

A radar system transmits an electromagnetic pulse s(n). It is reflected by
an aircraft, causing an echo r(n) to be received after τ₀ seconds:

    r(n) = s(n − τ₀) + w(n)

where the range R of the aircraft is related to the time delay by

    τ₀ = 2R/c

2
3
2. Mobile Communications

The position of the mobile terminal can be estimated using the time-of-
arrival measurements received at the base stations.

4
3. Speech Processing
Recognition of human speech by a machine is a difficult task because
our voice changes from time to time.
Given a human voice, the estimation problem is to determine the
speech as close as possible.

4. Image Processing
Estimation of the position and orientation of an object from a camera
image is useful when using a robot to pick it up, e.g., bomb-disposal

5. Biomedical Engineering
Estimation of the heart rate of a fetus, where the difficulty is that the
measurements are corrupted by the mother's heartbeat as well.

6. Seismology
Estimation of the underground distance of an oil deposit based on
sound reflection due to the different densities of oil and rock layers.

5
Differences from Detection
1. Radar

Radar system transmits an electromagnetic pulse s (n) . After some time,


it receives a signal r (n) . The detection problem is to decide whether r (n)
is
echo from an object or it is not an echo

6
7
2. Communications

In wired or wireless communications, we need to know the information


sent from the transmitter to the receiver.
e.g., for binary phase shift keying (BPSK) signals, it consists of only two
symbols, 0 or 1. The detection problem is to decide whether it is 0 or
1.

8
9
3. Speech Processing
Given a human speech signal, the detection problem is decide what is the
spoken word from a set of predefined words, e.g., 0, 1,, 9

Waveform of 0

Another example is voice authentication: given a voice and it is indicated


that the voice is from George Bush, we need to decide its Bush or not.

10
4. Image Processing
Fingerprint authentication: given a fingerprint image and his owner says
he is A, we need to verify if it is true or not

Other biometric examples include face authentication, iris authentication,


etc.

11
5. Biomedical Engineering

17 Jan. 2003, Hong Kong Economics Times


e.g., given some X-ray slides, the detection problem is to determine if she
has breast cancer or not
6. Seismology
To detect if there is oil or there is no oil at a region

12
What is Estimation?
Extract or estimate some parameters from the observed signals, e.g.,

Use a voltmeter to measure a DC signal

    x[n] = A + w[n],   n = 0, 1, …, N−1

Given x[n], we need to find the DC value A
⇒ the parameter is directly observed in the signal

Estimate the amplitude, frequency and phase of a sinusoid in noise

    x[n] = α cos(ω n + φ) + w[n],   n = 0, 1, …, N−1

Given x[n], we need to find α, ω and φ
⇒ the parameters are not directly observed in the received signal

13
Estimate the value of resistance R from a set of voltage and current readings:

    V[n] = V_actual[n] + w₁[n],   I[n] = I_actual[n] + w₂[n],   n = 0, 1, …, N−1

Given N pairs of (V[n], I[n]), we need to estimate the resistance R;
ideally, R = V/I
⇒ the parameter is not directly observed in the received signals

Estimate the position of the mobile terminal using time-of-arrival measurements:

    r[n] = √((x_s − x_n)² + (y_s − y_n)²) / c + w[n],   n = 0, 1, …, N−1

Given r[n], we need to find the mobile position (x_s, y_s), where c is the
signal propagation speed and (x_n, y_n) represents the known position of
the n th base station
⇒ the parameters are not directly observed in the received signals

14
Types of Parameter Estimation

Linear or non-linear
Linear: DC value, amplitude of the sine wave
Non-linear: Frequency of the sine wave, mobile position

Single parameter or multiple parameters


Single: DC value; scalar
Multiple: Amplitude, frequency and phase of sinusoid; vector

Constrained or unconstrained
Constrained: Use other available information & knowledge, e.g., from
the N pairs of (V [n], I [n] ), we draw a line which best fits
the data points and the estimate of the resistance is
given by the slope of the line. We can add a constraint
that the line should cross the origin (0,0)
Unconstrained: No further information & knowledge is available

15
Parameter is unknown deterministic or random

Unknown deterministic: constant but unknown (classical)


DC value is an unknown constant
Random : random variable with prior knowledge of
PDF (Bayesian)
If we have prior knowledge that the DC value
is bounded by A0 and A0 with a particular
PDF better estimate

Parameter is stationary or changing

Stationary : Unknown deterministic for whole observation


period, time-of-arrivals of a static source

Changing : Unknown deterministic at different time


instants, time-of-arrivals of a moving source

16
Performance Measures for Classical Parameter Estimation
Accuracy:
Is the estimator biased or unbiased?

e.g., x[n] = A + w[n],   n = 0, 1, …, N−1

where w[n] is a zero-mean random noise with variance σ_w²

Proposed estimators:

    Â₁ = x[0]
    Â₂ = (1/N) Σ_{n=0}^{N−1} x[n]
    Â₃ = (1/(N−1)) Σ_{n=0}^{N−1} x[n]
    Â₄ = ( Π_{n=0}^{N−1} x[n] )^{1/N} = ( x[0]·x[1]⋯x[N−1] )^{1/N}

17
Biased                  : E{Â} ≠ A
Unbiased                : E{Â} = A
Asymptotically unbiased : E{Â} = A only as N → ∞

Taking the expected values of Â₁, Â₂ and Â₃, we have

    E{Â₁} = E{x[0]} = E{A} + E{w[0]} = A + 0 = A

    E{Â₂} = E{ (1/N) Σ_{n=0}^{N−1} x[n] } = (1/N) Σ_{n=0}^{N−1} E{A} + (1/N) Σ_{n=0}^{N−1} E{w[n]}
          = (1/N)·N·A + 0 = A

    E{Â₃} = N A/(N−1) = A/(1 − 1/N) ≠ A

Q. State the biasedness of Â₁, Â₂ and Â₃.

18
For Â₄, it is difficult to analyze the biasedness. However, for w[n] = 0:

    ( x[0]·x[1]⋯x[N−1] )^{1/N} = ( A·A⋯A )^{1/N} = (A^N)^{1/N} = A

What is the value of the mean square error or variance?

They correspond to the second-order fluctuation of the estimate:

    MSE = E{(Â − A)²}                                                   (5.1)

    var = E{(Â − E{Â})²}                                                (5.2)

Note: if the estimator is unbiased, then MSE = var

19
In general,

    MSE = E{(Â − A)²} = E{(Â − E{Â} + E{Â} − A)²}
        = E{(Â − E{Â})²} + E{(E{Â} − A)²} + 2E{(Â − E{Â})(E{Â} − A)}
        = var + (E{Â} − A)² + 2(E{Â} − E{Â})(E{Â} − A)                  (5.3)
        = var + (bias)²

    E{(Â₁ − A)²} = E{(x[0] − A)²} = E{(A + w[0] − A)²} = E{w²[0]} = σ_w²

    E{(Â₂ − A)²} = E{ ( (1/N) Σ_{n=0}^{N−1} x[n] − A )² } = E{ ( (1/N) Σ_{n=0}^{N−1} w[n] )² } = σ_w²/N

    E{(Â₃ − A)²} = E{ ( (1/(N−1)) Σ_{n=0}^{N−1} x[n] − A )² } = A²/(N−1)² + N σ_w²/(N−1)²

20
An optimum estimator should give estimates which are

Unbiased
Minimum variance (MSE as well)

Q. How do we know the estimator has the minimum variance?

Cramer-Rao Lower Bound (CRLB)

Performance bound in terms of minimum achievable variance provided by


any unbiased estimators

Use for classical parameter estimation

Require knowledge of the noise PDF and the PDF must have closed form

More easier to determine than other variance bounds

21
Let the parameters to be estimated be θ = [θ₁, θ₂, …, θ_P]ᵀ. The CRLB for
θᵢ in Gaussian noise is stated as follows:

    CRLB(θᵢ) = [J(θ)]_{i,i} = [I⁻¹(θ)]_{i,i}                            (5.4)

where the Fisher information matrix has elements

    [I(θ)]_{i,j} = −E{ ∂²ln p(x; θ) / ∂θᵢ ∂θⱼ },   i, j = 1, 2, …, P

i.e.,

    I(θ) = [ −E{∂²ln p(x;θ)/∂θ₁²}        −E{∂²ln p(x;θ)/∂θ₁∂θ₂}   …   −E{∂²ln p(x;θ)/∂θ₁∂θ_P} ]
           [ −E{∂²ln p(x;θ)/∂θ₂∂θ₁}      −E{∂²ln p(x;θ)/∂θ₂²}          ⋮                       ]
           [ ⋮                                                     ⋱                            ]
           [ −E{∂²ln p(x;θ)/∂θ_P∂θ₁}     …                            −E{∂²ln p(x;θ)/∂θ_P²}    ]
                                                                        (5.5)

22
p(x; θ) represents the PDF of x = [x[0], x[1], …, x[N−1]]ᵀ and it is
parameterized by the unknown parameter vector θ.

Note that

I(θ) is known as the Fisher information matrix

[J]_{i,i} is the (i, i) element of J

    e.g.,  J = [1 2; 2 3]  ⇒  [J]_{2,2} = 3

    E{ ∂²ln p(x; θ) / ∂θᵢ∂θⱼ } = E{ ∂²ln p(x; θ) / ∂θⱼ∂θᵢ }

23
Review of Gaussian (Normal) Distribution

The Gaussian PDF for a scalar random variable x is defined as

    p(x) = 1/√(2πσ²) · exp( −(x − μ)²/(2σ²) )                           (5.6)

We can write x ~ N(μ, σ²)

The Gaussian PDF for a random vector x of size N is defined as

    p(x) = 1/((2π)^{N/2} det^{1/2}(C)) · exp( −(1/2)(x − μ)ᵀ C⁻¹ (x − μ) )   (5.7)

We can write x ~ N(μ, C)

24
The covariance matrix C has the form

    C = E{(x − μ)(x − μ)ᵀ}
      = [ E{(x[0]−μ₀)²}                …   E{(x[0]−μ₀)(x[N−1]−μ_{N−1})} ]
        [ E{(x[0]−μ₀)(x[1]−μ₁)}        ⋱   ⋮                            ]
        [ ⋮                                                              ]
        [ E{(x[0]−μ₀)(x[N−1]−μ_{N−1})}  …   E{(x[N−1]−μ_{N−1})²}        ]
                                                                        (5.8)
where
    x = [x[0], x[1], …, x[N−1]]ᵀ
    μ = E{x} = [μ₀, μ₁, …, μ_{N−1}]ᵀ

25
If x is a zero-mean white vector and all vector elements have variance σ², then

    C = E{(x − μ)(x − μ)ᵀ} = diag(σ², σ², …, σ²) = σ² I_N

The Gaussian PDF for the random vector x can then be simplified to

    p(x) = 1/(2πσ²)^{N/2} · exp( −(1/(2σ²)) Σ_{n=0}^{N−1} x²[n] )       (5.9)

with the use of

    C⁻¹ = σ⁻² I_N
    det(C) = (σ²)^N = σ^{2N}

26
Example 5.1

Determine the PDF of

    x[0] = A + w[0]
and
    x[n] = A + w[n],   n = 0, 1, …, N−1

where {w(n)} is a white Gaussian process with known variance σ_w² and A
is a constant.

    p(x[0]; A) = 1/√(2πσ_w²) · exp( −(x[0] − A)²/(2σ_w²) )

    p(x; A) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)² )

27
Example 5.2

Find the CRLB for estimating A based on a single measurement:

    x[0] = A + w[0]

    p(x[0]; A) = 1/√(2πσ_w²) · exp( −(x[0] − A)²/(2σ_w²) )

    ln p(x[0]; A) = −ln(√(2πσ_w²)) − (x[0] − A)²/(2σ_w²)

    ∂ln p(x[0]; A)/∂A = (1/(2σ_w²))·2(x[0] − A)·1 = (x[0] − A)/σ_w²

    ∂²ln p(x[0]; A)/∂A² = −1/σ_w²

As a result,

    E{ ∂²ln p(x[0]; A)/∂A² } = −1/σ_w²

    ⇒  I(A) = 1/σ_w²
    ⇒  J(A) = σ_w²
    ⇒  CRLB(A) = σ_w²

This means the best we can do is to achieve an estimator variance of σ_w², or

    var(Â) ≥ σ_w²

where Â is any unbiased estimator of A.

29
We also observe that a simple unbiased estimator

    Â₁ = x[0]

achieves the CRLB:

    E{(Â₁ − A)²} = E{(x[0] − A)²} = E{(A + w[0] − A)²} = E{w²[0]} = σ_w²

Example 5.3

Find the CRLB for estimating A based on N measurements:

    x[n] = A + w[n],   n = 0, 1, …, N−1

    p(x; A) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)² )

30
    ln p(x; A) = −ln((2πσ_w²)^{N/2}) − (1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)²

    ∂ln p(x; A)/∂A = (1/(2σ_w²)) Σ_{n=0}^{N−1} 2(x[n] − A)·1 = Σ_{n=0}^{N−1} (x[n] − A) / σ_w²

    ∂²ln p(x; A)/∂A² = −N/σ_w²

    E{ ∂²ln p(x; A)/∂A² } = −N/σ_w²

As a result,

    I(A) = N/σ_w²
    J(A) = σ_w²/N
    CRLB(A) = σ_w²/N

This means the best we can do is to achieve an estimator variance of σ_w²/N, or

    var(Â) ≥ σ_w²/N

where Â is any unbiased estimator of A.

32
We also observe that the simple unbiased estimator

    Â₁ = x[0]

does not achieve the CRLB:

    E{(Â₁ − A)²} = E{(x[0] − A)²} = E{(A + w[0] − A)²} = E{w²[0]} = σ_w²

On the other hand, the sample mean estimator

    Â₂ = (1/N) Σ_{n=0}^{N−1} x[n]

achieves the CRLB:

    E{(Â₂ − A)²} = E{ ( (1/N) Σ_{n=0}^{N−1} x[n] − A )² } = E{ ( (1/N) Σ_{n=0}^{N−1} w[n] )² } = σ_w²/N

⇒ the sample mean is the optimum estimator for white Gaussian noise
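This claim is easy to check numerically; the following MATLAB sketch (with assumed A, noise power and trial count) compares the empirical variance of the sample mean with the CRLB σ_w²/N.

% Monte Carlo check that the sample mean attains the CRLB (sketch)
A = 1; N = 50; sw2 = 0.5; trials = 10000;
Ahat = zeros(1,trials);
for t = 1:trials
    x = A + sqrt(sw2)*randn(1,N);
    Ahat(t) = mean(x);              % sample-mean estimator
end
disp([var(Ahat) sw2/N])             % empirical variance vs CRLB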

33
Example 5.4

Find the CRLB for A and 2w given {x[n]}:

x[n] = A + w[n], n = 0,1, L , N 1


1 1 N 1 2
p (x; ) = exp ( x[ n ] A) , = [ A, 2
w]
2 N /2
(2 w ) 2
2 w n = 0
2 N /2 1 N 1
ln( p (x; )) = ln((2 w ) ) 2 ( x[n] A) 2
2 w n = 0
N N 1 N 1
2
= ln(2) ln( w ) 2 ( x[n] A) 2
2 2 2 w n = 0
N 1
N 1
( x[n] A)
ln( p (x; )) 1
= 2 2 ( x[n] A) 1 = n = 0 2
A 2 w n =0 w

34
2 ln( p (x; )) N
2
= 2
A w
2 ln( p(x; )) N
E 2 = 2
A w
N 1 N 1
2 ( x[n] A) ( w[n])
ln( p (x; ))
= n =0 = n =0 4
A 2w 4w w
N 1
2 ln( p(x; )) ( E{w[n]})
E = n =0 =0
2 4
A w w

35
N N 1 N 1
ln( p (x; )) = ln(2) ln( w ) 2 ( x[n] A) 2
2
2 2 2 w n = 0
ln( p (x; )) N 1 N 1
= 2 + 4 ( x[n] A) 2
2w 2 w 2 w n = 0
2 ln( p (x; )) N 1 N 1
= 4 6 ( w[n]) 2
( 2w ) 2 2 w w n = 0
2 ln( p (x; )) N 1 2 N
E 2 2 = 4
6
N w =
( w ) 2 w w 2 4w

    I(θ) = [ N/σ_w²    0          ]
           [ 0          N/(2σ_w⁴) ]

36
    J(θ) = I⁻¹(θ) = [ σ_w²/N    0        ]
                    [ 0          2σ_w⁴/N ]

    ⇒  CRLB(A) = σ_w²/N
    ⇒  CRLB(σ_w²) = 2σ_w⁴/N

⇒ the CRLB for A with unknown noise power is identical to that with known noise power

Q. The CRLB for A is not affected by knowledge of the noise power. Why?

Q. Can you suggest a method to estimate σ_w²?

37
Example 5.5

Find the CRLB for phase of a sinusoid in white Gaussian noise:

x[n] = A cos(0 n + ) + w[n], n = 0,1,L, N 1

where A and 0 are assumed known

The PDF is

1 1 N 1
p (x; ) = exp 2 ( x[n] A cos(0 n + ))
2
2 N /2
(2 w ) 2 n = 0
w
2 N /2 1 N 1
ln( p (x; )) = ln((2 w ) ) 2 ( x[n] A cos(0 n + )) 2
2 w n = 0

38
ln( p (x; )) 1 N 1
= 2 2( x[n] A cos(0 n + )) A sin(0 n + )
2 w n = 0
A N 1 A
= 2 x[n] sin(0 n + ) sin( 20 n + 2)
w n=0 2
2 ln( p (x; )) A N 1
= 2 [x[n] cos(0 n + ) A cos(20 n + 2)]
2 w n=0
2 ln( p (x; )) A N 1
E 2 = 2 [ A cos(0 n + ) cos(0 n + ) A cos(20 n + 2)]
w n=0
A2 N 1
[
= 2 cos 2 (0 n + ) cos(20 n + 2)
w n=0
]
A2 N 1 1 1
= 2 + cos(20 n + 2) cos(20 n + 2)
w n=0 2 2

39
    E{ ∂²ln p(x; φ)/∂φ² } = −NA²/(2σ_w²) + A²/(2σ_w²) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ)

As a result,

    CRLB(φ) = [ NA²/(2σ_w²) − A²/(2σ_w²) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ) ]⁻¹
            = (2σ_w²/(NA²)) [ 1 − (1/N) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ) ]⁻¹

If N >> 1,  (1/N) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ) ≈ 0, then

    CRLB(φ) ≈ 2σ_w²/(NA²)

40
Example 5.6

Find the CRLB for A , 0 and for

x[n] = A cos(0 n + ) + w[n], n = 0,1,L, N 1, N >> 1

1 1 N 1
2
p (x; ) =
exp 2 ( x[n] A cos(0 n + )) , = [ A,0 , ]
2 N /2
(2 w )
2 w n = 0
2 N /2 1 N 1
ln( p (x; )) = ln((2 w ) ) 2 ( x[n] A cos(0 n + )) 2
2 w n = 0
ln( p (x; )) 1 N 1
= 2 2 ( x[n] A cos(0 n + )) cos(0 n + )
A 2 w n =0
1 N 1
= 2 ( x[n] cos(0 n + ) A cos 2 (0 n + ))
w n =0

41
2 ln( p (x; )) 1 N 1 2 1 N 1 1 1
= 2 cos (0 n + ) = 2 + cos(20 n + 2)
A2 w n =0 w n =0 2 2
N

2 2w

2 ln( p (x; )) N
E
2 2
A 2 w

Similarly,

2 ln( p ( x; )) A N 1
E = 2 n sin( 20 n + 2) 0
A0 2 w n = 0
2 ln( p ( x; )) A N 1
E = 2 sin( 20 n + 2) 0
A 2 w n = 0

42
2 ln( p (x; )) A 2 N 1 2 1 1 A 2 N 1
E = n cos(20 n + 2) n2
0 2 2w n = 0 2 2 2 2w n = 0
2 ln( p ( x; )) A2 N 1 2 A2 N 1
E = 2 n sin (0 n + ) 2 n
0 w n=0 2 w n = 0
2 ln( p (x; )) A 2 N 1 2 NA 2
E
2 = 2 sin (0 n + ) 2
w n=0 2 w

N
0 0
2
1 A 2 N 1 2 A 2 N 1

I () 0 n n
2w
2 n=0 2 n=0
2
0 A 2 N 1 NA
n
2 n=0 2

43
After matrix inversion, we have

    CRLB(A) ≈ 2σ_w²/N

    CRLB(ω₀) ≈ 12 / (SNR · N(N² − 1)),    SNR = A²/(2σ_w²)

    CRLB(φ) ≈ 2(2N − 1) / (SNR · N(N + 1))

Note that

    CRLB(φ) ≈ 2(2N − 1)/(SNR · N(N + 1)) ≈ 4/(SNR · N) > 1/(SNR · N) = 2σ_w²/(NA²)

In general, the CRLB increases as the number of parameters to be
estimated increases.
The CRLB decreases as the number of samples increases.

44
Parameter Transformation in CRLB

Find the CRLB for α = g(θ), where g(θ) is a function of θ.

e.g., x[n] = A + w[n],   n = 0, 1, …, N−1.  What is the CRLB for A²?

The CRLB for the parameter transformation α = g(θ) is given by

    CRLB(α) = ( ∂g(θ)/∂θ )² / ( −E{ ∂²ln p(x; θ)/∂θ² } )                (5.10)

For a nonlinear function, "=" is replaced by "≈" and the result holds only for large N.

45
Example 5.7

Find the CRLB for the power of the DC value, i.e., A²:

    x[n] = A + w[n],   n = 0, 1, …, N−1

    α = g(A) = A²
    ∂g(A)/∂A = 2A  ⇒  (∂g(A)/∂A)² = 4A²

From Example 5.3, we have

    −E{ ∂²ln p(x; A)/∂A² } = N/σ_w²

As a result,

    CRLB(A²) ≈ 4A² · σ_w²/N = 4A²σ_w²/N,   N >> 1

46
Example 5.8

Find the CRLB for α = c₁ + c₂A from

    x[n] = A + w[n],   n = 0, 1, …, N−1

    α = g(A) = c₁ + c₂A
    ∂g(A)/∂A = c₂  ⇒  (∂g(A)/∂A)² = c₂²

As a result,

    CRLB(α) = c₂² · CRLB(A) = c₂² σ_w²/N
N

47
Maximum Likelihood Estimation

Parameter estimation is achieved via maximizing the likelihood function

Optimum realizable approach and can give performance close to CRLB

Use for classical parameter estimation

Require knowledge of the noise PDF and the PDF must have closed form

Generally computationally demanding

Let p(x; θ) be the PDF of the observed vector x parameterized by the
parameter vector θ. The maximum likelihood (ML) estimate is

    θ̂ = arg max_θ p(x; θ)                                               (5.11)


48
e.g., given p (x = x 0 ; ) where x 0 is the observed data, as below

Q. What is the most possible value of ?

49
Example 5.9

Given
x[n] = A + w[n], n = 0,1, L , N 1

where A is an unknown constant and w[n] is a white Gaussian noise with


known variance 2w . Find the ML estimate of A .

    p(x; A) = 1/(2πσ_w²)^{N/2} · exp( −(1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)² )

Since arg max_θ p(x; θ) = arg max_θ ln p(x; θ), taking the logarithm of p(x; A) gives

    ln p(x; A) = −ln((2πσ_w²)^{N/2}) − (1/(2σ_w²)) Σ_{n=0}^{N−1} (x[n] − A)²

50
Differentiating with respect to A yields

    ∂ln p(x; A)/∂A = (1/(2σ_w²)) Σ_{n=0}^{N−1} 2(x[n] − A)·1 = Σ_{n=0}^{N−1} (x[n] − A)/σ_w²

Â = arg max_A ln p(x; A) is determined from

    Σ_{n=0}^{N−1} (x[n] − Â)/σ_w² = 0  ⇒  Σ_{n=0}^{N−1} (x[n] − Â) = 0  ⇒  Â = (1/N) Σ_{n=0}^{N−1} x[n]

Note that
the ML estimate is identical to the sample mean
it attains the CRLB

Q. What if σ_w² is unknown?

51
Example 5.10

Find the ML estimate for phase of a sinusoid in white Gaussian noise:

x[n] = A cos(0 n + ) + w[n], n = 0,1,L, N 1

where A and 0 are assumed known

The PDF is

1 1 N 1
p (x; ) = exp 2 ( x[n] A cos(0 n + ))
2
2 N /2
(2 w ) 2 n = 0
w
2 N /2 1 N 1
ln( p (x; )) = ln((2 w ) ) 2 ( x[n] A cos(0 n + )) 2
2 w n = 0

52
It is obvious that the maximum of p (x; ) or ln( p (x; )) corresponds to the
minimum of
1 N 1 2
N 1
2
2
( x[ n ] A cos( 0 n + )) or ( x[ n ] A cos( 0 n + ))
2 w n = 0 n=0

Differentiating with respect to and then set the result to zero:


N 1
2( x[n] A cos(0 n + )) A sin(0 n + )
n =0
N 1 A
= A x[n] sin(0 n + ) sin( 20 n + 2) = 0
n=0 2
N 1 A N 1
x[n] sin(0 n + ) = sin( 20 n + 2 )
n=0 2 n =0
The ML estimate for is determined from the root of the above equation
Q. Any ideas to solve the nonlinear equation?

53
Approximate ML (AML) solution may exist and it depends on the structure
of the ML expression. For example, there exists an AML solution for
N 1 A N 1

x[n] sin(0 n + ) = sin( 20 n + 2 )
n=0 2 n =0
1 N 1 A 1 N 1
A
x[n] sin(0 n + ) = sin( 20 n + 2) 0 = 0, N >> 1
N n=0 2 N n=0 2

The AML solution is obtained from


N 1
x[ n] sin(0 n + ) = 0
n=0
N 1 N 1
x[n] sin(0 n) cos( ) + x[n] cos(0 n) sin( ) = 0
n=0 n=0
N 1 N 1
cos( ) x[n] sin(0 n) = sin( ) x[n] cos(0 n)
n =0 n=0

54
N 1 x[ n] sin( n)

= tan 1 Nn =01 0
x[ n ] cos( n )
n =0 0

In fact, the AML solution is reasonable:

N 1 ( A cos( n + ) + w[ n]) sin( n)


1 n=0 0 0
= tan
N 1 ( A cos( n + ) + w[n]) cos( n)
n=0 0 0
NA N 1 w[ n] sin( n)
sin( ) + n=0 0
1 2 ,
tan N >> 1
NA
cos() + nN=01 w[n] cos(0 n)
2
2 N 1
sin() n = 0 w[n] sin(0 n)
= tan 1 NA
cos() + 2 N 1 w[n] cos( n)
0
NA n = 0

55
For parameter transformation, if there is a one-to-one relationship
between = g () and , the ML estimate for is simply:

= g ( ) (5.12)

Example 5.11

Given N samples of a white Gaussian process w[n], n = 0,1, L , N 1, with


unknown variance 2 . Determine the power of w[n] in dB.

The power in dB is related to 2 by

P = 10 log10 ( 2 )

which is a one-to-one relationship. To find the ML estimate for P , we first


find the ML estimate for 2

56
2 1 1 N 1 2
p(w; ) = exp x [ n]
2 N /2 2
(2 ) 2 n = 0
2 N N 2 1 N 1 2
ln( p (w; )) = ln(2) ln( ) x [ n]
2 2 2
2 n = 0
Differentiating the log-likelihood function w.r.t. to 2 :

ln( p (w; 2 )) N 1 N 1 2
= 2 + 4 x [ n]
2 2 2 n = 0
Setting the resultant expression to zero:
N 1 N 1 2 2 1 N 1 2
2
= 4 x [n] = x [ n]
2 2 n = 0 N n=0
As a result,
1 N 1
2 2
P = 10 log10 ( ) = 10 log10 x [ n]
N n =0

57
Example 5.12

Given
x[n] = A + w[n], n = 0,1, L , N 1

where A is an unknown constant and w[n] is a white Gaussian noise with


unknown variance 2 . Find the ML estimates of A and 2 .

1 1 N 1 2 2 T
p (x;) = 2 N /2
exp 2 ( x[n] A) , = [A ]
(2 ) 2 n = 0

ln( p (x;)) 1 N 1
= 2 ( x[n] A)
A n =0

ln( p (x;)) N 1 N 1 2
= + ( x[ n ] A)
2 2 2 2 4 n = 0

58
Solving the first equation:

N 1
A = 1 x[n] = x
N n =0

Putting A = A = x in the second equation:

2 1 N 1 2
= ( x[n] x )
N n=0

Numerical Computation of ML Solution

When the ML solution is not of closed form, it can be computed by

Grid search

Numerical methods: Newton-Raphson, Golden section, bisection, etc

59
Example 5.13

From Example 5.10, the ML solution of is determined from

N 1 A N 1

x[n] sin(0 n + ) = sin( 20 n + 2 )
n=0 2 n =0

Suggest methods to find

Approach 1: Grid search


Let
N 1 A N 1
g () = x[n] sin(0 n + ) sin( 20 n + 2)
n =0 2 n=0
It is obvious that
= root of g ()

60
The idea of grid search is simple:
Search for all possible values of or a given range of to find root
Values are discrete tradeoff between resolution & computation
e.g., Range for : any values in [0,2)
Discrete points : 1000 resolution is 2 / 1000
MATLAB source code:
N=100;
n=[0:N-1];
w = 0.2*pi;
A = sqrt(2);
p = 0.3*pi;
np = 0.1;
q = sqrt(np).*randn(1,N);
x = A.*cos(w.*n+p)+q;
for j=1:1000
pe = j/1000*2*pi;
s1 =sin(w.*n+pe);
s2 =sin(2.*w.*n+2.*pe);
g(j) = x*s1'-A/2*sum(s2);
end

61
pe = [1:1000]/1000;
plot(pe,g)

Note: the x-axis is φ/(2π)

62
stem(pe,g)
axis([0.14 0.16 -2 2])

g (0.152 2) = -0.2324 , g (0.153 2) = 0.2168


= 0.153 2 = 0.306 ( 0.001 )

63
For a smaller resolution, say 200 discrete points:

clear pe;
clear s1;
clear s2;
clear g;
for j=1:200
pe = j/200*2*pi;
s1 =sin(w.*n+pe);
s2 =sin(2.*w.*n+2.*pe);
g(j) = x*s1'-A/2*sum(s2);
end
pe = [1:200]/200;
plot(pe,g)

64
stem(pe,g)
axis([0.14 0.16 -2 2])

g (0.150 2) = -1.1306 , g (0.155 2) = 1.1150


= 0.155 2 = 0.310 ( 0.005 )
Accuracy increases as number of grids increases

65
Approach 2: Newton/Raphson iterative procedure
With an initial guess φ̂₀, the root of g(φ) can be determined from

    φ̂_{k+1} = φ̂_k − g(φ̂_k) / [dg(φ)/dφ]_{φ=φ̂_k} = φ̂_k − g(φ̂_k)/g′(φ̂_k)      (5.13)

    g(φ)  = Σ_{n=0}^{N−1} x[n] sin(ω₀n + φ) − (A/2) Σ_{n=0}^{N−1} sin(2ω₀n + 2φ)

    g′(φ) = Σ_{n=0}^{N−1} x[n] cos(ω₀n + φ) − (A/2) Σ_{n=0}^{N−1} cos(2ω₀n + 2φ)·2
          = Σ_{n=0}^{N−1} x[n] cos(ω₀n + φ) − A Σ_{n=0}^{N−1} cos(2ω₀n + 2φ)

with
    φ̂₀ = 0

66
p1 = 0;
for k=1:10
s1 =sin(w.*n+p1);
s2 =sin(2.*w.*n+2.*p1);
c1 =cos(w.*n+p1);
c2 =cos(2.*w.*n+2.*p1);
g = x*s1'-A/2*sum(s2);
g1 = x*c1'-A*sum(c2);
p1 = p1 - g/g1;
p1_vector(k) = p1;
end
stem(p1_vector/(2*pi))

The Newton/Raphson method converges at around the 3rd iteration

    φ̂ = 0.1525 × 2π = 0.305π

Q. Can you comment on the grid search and Newton/Raphson methods?

67
ML Estimation for General Linear Model

The general linear data model is given by

    x = Hθ + w                                                          (5.14)

where
x is the observed vector of size N
w is a Gaussian noise vector with known covariance matrix C
H is a known matrix of size N × p
θ is the parameter vector of size p

Based on (5.7), the PDF of x parameterized by θ is

    p(x; θ) = 1/((2π)^{N/2} det^{1/2}(C)) · exp( −(1/2)(x − Hθ)ᵀ C⁻¹ (x − Hθ) )   (5.15)

68
Since C is not a function of θ, the ML solution is equivalent to

    θ̂ = arg min_θ J(θ),   where J(θ) = (x − Hθ)ᵀ C⁻¹ (x − Hθ)

Differentiating J(θ) with respect to θ and setting the result to zero:

    −2Hᵀ C⁻¹ x + 2Hᵀ C⁻¹ Hθ = 0
    ⇒  Hᵀ C⁻¹ x = Hᵀ C⁻¹ Hθ

As a result, the ML solution for the linear model is

    θ̂ = (Hᵀ C⁻¹ H)⁻¹ Hᵀ C⁻¹ x                                           (5.16)

For white noise:

    θ̂ = (Hᵀ(σ_w²I)⁻¹H)⁻¹ Hᵀ(σ_w²I)⁻¹ x = (HᵀH)⁻¹ Hᵀ x                   (5.17)
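Both (5.16) and (5.17) can be evaluated directly in MATLAB; the sketch below uses an assumed observation matrix H, true parameter vector and white Gaussian noise purely for illustration.

% ML estimate for the general linear model (sketch with assumed data)
N = 100; sw2 = 0.01;
H = [randn(N,1) ones(N,1)];           % assumed known observation matrix
theta = [0.5; -1];                    % assumed true parameter vector
x = H*theta + sqrt(sw2)*randn(N,1);   % observed data
theta_hat = (H.'*H)\(H.'*x);          % (5.17), white-noise case
% for a general known covariance C: theta_hat = (H.'*(C\H))\(H.'*(C\x));  % (5.16)
disp(theta_hat.')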

69
Example 5.14
Given N pairs of (x, y) where x is error-free but y is subject to error:

    y[n] = m·x[n] + c + w[n],   n = 0, 1, …, N−1

where w is a white Gaussian noise vector with known covariance matrix C.
Find the ML estimates of m and c.

    y[n] = m·x[n] + c + w[n]
         = [x[n]  1] [m  c]ᵀ + w[n] = [x[n]  1] θ + w[n],   θ = [m  c]ᵀ

    y[0]   = [x[0]    1] θ + w[0]
    y[1]   = [x[1]    1] θ + w[1]
    …
    y[N−1] = [x[N−1]  1] θ + w[N−1]

70
Writing in matrix form:

    y = Hθ + w
where
    y = [y[0], y[1], …, y[N−1]]ᵀ

    H = [ x[0]    1 ]
        [ x[1]    1 ]
        [ ⋮       ⋮ ]
        [ x[N−1]  1 ]

Applying (5.16) gives

    θ̂ = [m̂  ĉ]ᵀ = (Hᵀ C⁻¹ H)⁻¹ Hᵀ C⁻¹ y

71
Example 5.15

Find the ML estimates of A , 0 and for

x[n] = A cos(0 n + ) + w[n], n = 0,1,L, N 1, N >> 1

where w[n] is a white Gaussian noise with variance 2w

Recall from Example 5.6:

1 1 N 1 2
p (x; ) =
exp ( x[n] A cos(0 n + )) , = [ A,0 , ]
2 N /2
(2 w ) 2
2 w n = 0

The ML solution for can be found by minimizing

N 1
J ( A, 0 , ) = ( x[n] A cos(0 n + )) 2
n=0

72
This can be achieved by using a 3-D grid search or Netwon/Raphson
method but it is computationally complex
Another simpler solution is as follows
N 1
J ( A, 0 , ) = ( x[n] A cos(0 n + )) 2
n=0
N 1
= ( x[n] A cos() cos(0 n) + A sin() sin(0 n)) 2
n=0

Since A and are not quadratic in J ( A, 0 , ) , the first step is to use


parameter transformation:

A = 12 + 22
1 = A cos()

2 = A sin() 1 2
= tan
1

73
Let
c = [1 cos(0 ) L cos(0 ( N 1))]T
s = [0 sin(0 ) L sin(0 ( N 1))]T

We have

J (1, 2 , 0 ) = (x 1c 2s)T (x 1c 2s)

1
T
= (x - H ) (x - H ), = , H = [c s]
2
Applying (5.17) gives

= (HT H ) 1 HT x

74
Substituting back to J (1, 2 , 0 ) :
J (0 ) = (x - H )T (x - H )
= (x - H (HT H ) 1 HT x)T (x - H (HT H ) 1 HT x)

( T 1
= (I - H ( H H ) H ) x T
) ((I - H (HT H)1 HT ) x)
T

= xT (I - H (HT H ) 1 HT )T (I - H (HT H ) 1 HT ) x
= xT (I - H (HT H ) 1 HT ) x
= xT x - xT H (HT H ) 1 HT x
Minimizing J (0 ) is identical to maximizing
xT H (HT H ) 1 HT x
or
{
0 = arg max xT H (HT H ) 1 HT x

0
}
3-D search is reduced to a 1-D search

75
After determining ω̂₀, θ̂ can be obtained as well.

For sufficiently large N:

    xᵀH(HᵀH)⁻¹Hᵀx = [cᵀx  sᵀx] [ cᵀc  cᵀs ]⁻¹ [ cᵀx ]
                                [ sᵀc  sᵀs ]    [ sᵀx ]

                  ≈ [cᵀx  sᵀx] [ N/2  0   ]⁻¹ [ cᵀx ]
                                [ 0    N/2 ]    [ sᵀx ]

                  = (2/N) [ (cᵀx)² + (sᵀx)² ]

                  = (2/N) | Σ_{n=0}^{N−1} x[n] exp(−jω₀n) |²

    ⇒  ω̂₀ = arg max_{ω₀} (1/N) | Σ_{n=0}^{N−1} x[n] exp(−jω₀n) |²   ⇒  periodogram maximum
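A short MATLAB sketch of the resulting 1-D search over ω₀ is given below; the tone parameters, noise level and grid resolution are assumed values.

% Frequency estimation by periodogram maximization (sketch)
N = 128; A = 1; w0 = 0.2*pi; phi = 0.3*pi;
x = A*cos(w0*(0:N-1) + phi) + 0.1*randn(1,N);
wgrid = pi*(1:2047)/2048;                      % assumed search grid over (0, pi)
P = zeros(size(wgrid));
for m = 1:length(wgrid)
    P(m) = abs(x*exp(-1j*wgrid(m)*(0:N-1)).').^2/N;   % periodogram value
end
[~,idx] = max(P);
w0_hat = wgrid(idx)                            % close to 0.2*pi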

76
Least Squares Methods

Parameter estimation is achieved via minimizing a least squares (LS) cost


function

Generally not optimum but computationally simple

Use for classical parameter estimation

No knowledge of the noise PDF is required

Can be considered as a generalization of LS filtering

77
Variants of LS Methods

1. Standard LS
Consider the general linear data model:

    x = Hθ + w
where
x is the observed vector of size N
w is a zero-mean noise vector with unknown covariance matrix
H is a known matrix of size N × p
θ is the parameter vector of size p

The LS solution is given by

    θ̂ = arg min_θ (x − Hθ)ᵀ(x − Hθ) = (HᵀH)⁻¹Hᵀx                        (5.18)

which is equal to (5.17)

78
The LS solution is optimum if the covariance matrix of w is C = σ_w²I and w is
Gaussian distributed.

Define
    e = x − Hθ
where
    e = [e(0) e(1) … e(N−1)]ᵀ

(5.18) is equivalent to

    θ̂ = arg min_θ Σ_{k=0}^{N−1} e²(k)                                   (5.19)

which is similar to LS filtering.

Q. Any differences between (5.19) and LS filtering?

79
Example 5.16

Given
x[n] = A + w[n], n = 0,1, L , N 1

where A is an unknown constant and w[n] is a zero-mean noise

Find the LS solution of A


Using (5.19),
N 1
A = arg min ( x[n] A)2

A n=0
N 1
Differentiating ( x[n] A)2 with respect to A and set the result to 0:
n=0
N 1
A = 1 x[n]
N n =0

80
On the other hand, writing {x[n]} in matrix form:
x = HA + w
where
1
1
H=
M
1

Using (5.18),
1
1 x[0]
1 x[1]
N 1
1
A = [1 1 L 1] [1 1 L 1] = N x[n]
M M n =0
1 x[ N 1]

Both (5.18) and (5.19) give the same answer and the LS solution is

81
optimum if the noise is white Gaussian
Example 5.17
Consider the LS filtering problem again. Given

d[n] = X^T[n] W + q[n],   n = 0, 1, ..., N−1

where

d[n] is the desired response
X[n] = [x[n]   x[n−1]   ...   x[n−L+1]]^T is the input signal vector
W = [w₀   w₁   ...   w_{L−1}]^T is the unknown filter weight vector
q[n] is zero-mean noise

Writing in matrix form:

d = HW + q   (here θ = W)

Using (5.18):

Ŵ = (H^T H)⁻¹ H^T d

82
where

    [ X^T[0]   ]   [ x[0]      0         ...   0      ]
H = [ X^T[1]   ] = [ x[1]      x[0]      ...   0      ]
    [   ...    ]   [  ...       ...      ...   ...    ]
    [ X^T[N−1] ]   [ x[N−1]    x[N−2]    ...   x[N−L] ]

with x[−1] = x[−2] = ... = 0

Note that

R_xx = H^T H
R_dx = H^T d

where R_xx is not the original but the modified version of (3.6)

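A small numpy sketch of this construction is given below; the filter length, weights and noise level are illustrative assumptions. It builds the N × L matrix H with the zero initial conditions x[−1] = x[−2] = ... = 0 and then applies (5.18):

import numpy as np

def ls_filter_weights(x, d, L):
    # Build H row by row: row n is [x[n], x[n-1], ..., x[n-L+1]], with zeros when n-i < 0
    N = len(x)
    H = np.zeros((N, L))
    for n in range(N):
        for i in range(L):
            if n - i >= 0:
                H[n, i] = x[n - i]
    # (5.18): W_hat = (H^T H)^{-1} H^T d
    return np.linalg.solve(H.T @ H, H.T @ d)

# Illustrative use: identify a 3-tap filter from input/output records
rng = np.random.default_rng(0)
N, w_true = 200, np.array([0.5, -0.3, 0.2])
x = rng.standard_normal(N)
d = np.convolve(x, w_true)[:N] + 0.01 * rng.standard_normal(N)
print(ls_filter_weights(x, d, 3))               # close to w_true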
83
Example 5.18
Find the LS estimate of A for

x[n] = A cos(ω₀n + φ) + w[n],   n = 0, 1, ..., N−1,   N >> 1

where ω₀ and φ are known constants while w[n] is zero-mean noise

Using (5.19),

Â = arg min_A Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²

Differentiating Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))² with respect to A and setting the result to 0:

−2 Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ)) cos(ω₀n + φ) = 0

Σ_{n=0}^{N−1} x[n] cos(ω₀n + φ) = A Σ_{n=0}^{N−1} cos²(ω₀n + φ)

84
The LS solution is then

Â = ( Σ_{n=0}^{N−1} x[n] cos(ω₀n + φ) ) / ( Σ_{n=0}^{N−1} cos²(ω₀n + φ) )

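A brief numerical check of this estimator; the amplitude, frequency, phase and noise level below are assumed values used only for illustration:

import numpy as np

def ls_amplitude(x, w0, phi):
    # A_hat = sum(x[n]*cos(w0*n + phi)) / sum(cos^2(w0*n + phi))
    n = np.arange(len(x))
    c = np.cos(w0 * n + phi)
    return np.dot(x, c) / np.dot(c, c)

rng = np.random.default_rng(0)
n = np.arange(500)
x = 2.0 * np.cos(0.3 * n + 0.1) + 0.2 * rng.standard_normal(500)
print(ls_amplitude(x, 0.3, 0.1))                # close to the true amplitude 2.0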
2. Weighted LS

Use a general form of LS via a symmetric weighting matrix W

θ̂ = arg min_θ { (x − Hθ)^T W (x − Hθ) } = (H^T W H)⁻¹ H^T W x        (5.20)

such that

W = W^T

Due to the presence of W, it is generally difficult to write the cost function (x − Hθ)^T W(x − Hθ) in scalar form as in (5.19)

85
Rationale of using W: put larger weights on data with smaller errors,
                      put smaller weights on data with larger errors

When W = C⁻¹ where C is the covariance matrix of the noise vector:

θ̂ = (H^T C⁻¹H)⁻¹ H^T C⁻¹x        (5.21)

which is equal to the ML solution and is optimum for Gaussian noise

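Before the worked example, here is a minimal numpy sketch of (5.20)/(5.21); the regression model, noise levels and parameter values are illustrative assumptions:

import numpy as np

def weighted_ls(H, x, W):
    # (5.20): theta_hat = (H^T W H)^{-1} H^T W x, with W symmetric
    return np.linalg.solve(H.T @ W @ H, H.T @ W @ x)

# Choosing W = C^{-1} (inverse noise covariance) gives (5.21)
rng = np.random.default_rng(0)
N = 100
H = np.column_stack((np.ones(N), np.arange(N, dtype=float)))
sigma = 0.1 + 0.9 * rng.random(N)               # unequal, known noise standard deviations
x = H @ np.array([1.0, 0.05]) + sigma * rng.standard_normal(N)
W = np.diag(1.0 / sigma**2)
print(weighted_ls(H, x, W))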
Example 5.19
Given two noisy measurements of A:

x₁ = A + w₁   and   x₂ = A + w₂

where w₁ and w₂ are zero-mean uncorrelated noises with known variances σ₁² and σ₂². Determine the optimum weighted LS solution

86
Use

W = C⁻¹ = [ σ₁²    0  ]⁻¹ = [ 1/σ₁²     0    ]
          [  0    σ₂² ]     [   0     1/σ₂²  ]

Grouping x₁ and x₂ into matrix form:

[ x₁ ]   [ 1 ]       [ w₁ ]
[ x₂ ] = [ 1 ] A  +  [ w₂ ]

or

x = HA + w

Using (5.21)

Â = (H^T C⁻¹H)⁻¹ H^T C⁻¹x = ( [1  1] [ 1/σ₁²     0    ] [ 1 ] )⁻¹ [1  1] [ 1/σ₁²     0    ] [ x₁ ]
                            (        [   0     1/σ₂²  ] [ 1 ] )          [   0     1/σ₂²  ] [ x₂ ]

87
As a result,

Â = ( x₁/σ₁² + x₂/σ₂² ) / ( 1/σ₁² + 1/σ₂² ) = σ₂²/(σ₁² + σ₂²) · x₁ + σ₁²/(σ₁² + σ₂²) · x₂

Note that

If σ₂² > σ₁², a larger weight is placed on x₁ and vice versa
If σ₂² = σ₁², the solution is equal to the standard sample mean
The solution will be more complicated if w₁ and w₂ are correlated
Exact values of σ₁² and σ₂² are not necessary; only their ratio is needed

Defining β = σ₁²/σ₂², we have

Â = 1/(1+β) · x₁ + β/(1+β) · x₂

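The closed-form weights can be verified numerically against a direct evaluation of (5.21); the measurement values and variances below are arbitrary illustrative numbers:

import numpy as np

sigma1_sq, sigma2_sq = 1.0, 4.0                 # assumed noise variances
x1, x2 = 2.3, 1.7                               # assumed measurements

# Closed form derived above
A_closed = (sigma2_sq * x1 + sigma1_sq * x2) / (sigma1_sq + sigma2_sq)

# Direct evaluation of (5.21)
H = np.array([[1.0], [1.0]])
C_inv = np.diag([1.0 / sigma1_sq, 1.0 / sigma2_sq])
x = np.array([x1, x2])
A_wls = np.linalg.solve(H.T @ C_inv @ H, H.T @ C_inv @ x)[0]
print(A_closed, A_wls)                          # the two values coincide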
88
3. Nonlinear LS
The signal model cannot be represented as a linear model of the form

x = Hθ + w

In general, it is more complex to solve, e.g.,

The LS estimates for A, ω₀ and φ can be found by minimizing

Σ_{n=0}^{N−1} (x[n] − A cos(ω₀n + φ))²

whose solution is not straightforward, as seen in Example 5.15

Grid search and numerical methods are used to find the minimum

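As an illustration of the numerical route, the three sinusoidal parameters can be passed to a general-purpose nonlinear LS solver. The sketch below uses scipy.optimize.least_squares; the signal values and starting point are assumptions made only for the demonstration:

import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
N = 200
n = np.arange(N)
x = 1.5 * np.cos(0.4 * n + 0.3) + 0.1 * rng.standard_normal(N)   # assumed data

# Residual vector e[n] = x[n] - A*cos(w0*n + phi); the solver minimizes sum(e^2)
def residual(theta):
    A, w0, phi = theta
    return x - A * np.cos(w0 * n + phi)

theta0 = np.array([1.0, 0.5, 0.0])              # a reasonable starting point is required
result = least_squares(residual, theta0)
print("estimates [A, w0, phi]:", result.x)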
89
4. Constrained LS
The linear LS cost function is minimized subject to constraints:


θ̂ = arg min_θ { (x − Hθ)^T (x − Hθ) }   subject to   θ ∈ S        (5.22)

where S is a set of equalities/inequalities in terms of θ

Generally it can be solved by linear/nonlinear programming, but simpler solutions exist for linear and quadratic constraint equations, e.g.,

Linear constraint equation: θ₁ + θ₂ + θ₃ = 10

Quadratic constraint equation: θ₁² + θ₂² + θ₃² = 100

Other types of constraints: θ₁ > θ₂ > θ₃ > 10
                            θ₁ + 2θ₂ + 3θ₃ ≤ 100

90
Consider the case where the constraint set S is

Aθ = b

which contains r linear equations. The constrained LS problem for the linear model is

θ̂_c = arg min_θ { (x − Hθ)^T (x − Hθ) }   subject to   Aθ = b        (5.23)

The technique of Lagrangian multipliers can solve (5.23) as follows

Define the Lagrangian

J_c = (x − Hθ)^T (x − Hθ) + λ^T (Aθ − b)        (5.24)

where λ is an r-length vector of Lagrangian multipliers

The procedure is to first solve for λ and then for θ̂_c

91
Expanding (5.24):

J_c = x^T x − 2θ^T H^T x + θ^T H^T Hθ + λ^T Aθ − λ^T b

Differentiate J_c with respect to θ:

∂J_c/∂θ = −2H^T x + 2H^T Hθ + A^T λ

Set the result to zero:

−2H^T x + 2H^T Hθ̂_c + A^T λ = 0

θ̂_c = (H^T H)⁻¹H^T x − (1/2)(H^T H)⁻¹A^T λ = θ̂ − (1/2)(H^T H)⁻¹A^T λ

where θ̂ is the LS solution. Put θ̂_c into Aθ = b:

Aθ̂_c = Aθ̂ − (1/2) A(H^T H)⁻¹A^T λ = b   ⟹   λ = 2 ( A(H^T H)⁻¹A^T )⁻¹ ( Aθ̂ − b )

92
Put λ back into θ̂_c:

θ̂_c = θ̂ − (H^T H)⁻¹A^T ( A(H^T H)⁻¹A^T )⁻¹ ( Aθ̂ − b )

The idea of constrained LS can be illustrated by finding the minimum value of y = x² − 3x + 2 subject to x − y = 1 (substituting y = x − 1 gives x² − 4x + 3 = 0, so the feasible points are (1, 0) and (3, 2), and the constrained minimum is y = 0 at x = 1)
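A minimal numpy sketch of the closed-form solution above; the data, constraint matrix A and vector b are illustrative assumptions:

import numpy as np

def constrained_ls(H, x, A, b):
    # theta_c = theta_hat - (H^T H)^{-1} A^T (A (H^T H)^{-1} A^T)^{-1} (A theta_hat - b)
    HtH_inv = np.linalg.inv(H.T @ H)
    theta_hat = HtH_inv @ H.T @ x               # unconstrained LS solution (5.18)
    G = A @ HtH_inv @ A.T
    return theta_hat - HtH_inv @ A.T @ np.linalg.solve(G, A @ theta_hat - b)

# Illustrative use: three parameters constrained to sum to 10
rng = np.random.default_rng(0)
H = rng.standard_normal((30, 3))
x = H @ np.array([2.0, 3.0, 5.0]) + 0.1 * rng.standard_normal(30)
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([10.0])
theta_c = constrained_ls(H, x, A, b)
print(theta_c, theta_c.sum())                   # the sum equals 10 exactly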

93
5. Total LS
Motivation: noise is present in both x and H:

x + w₁ = Hθ + w₂        (5.25)

where w₁ and w₂ are zero-mean noise vectors

A typical example is LS filtering in the presence of both input noise and output noise. The noisy input is

x(k) = s(k) + n_i(k),   k = 0, 1, ..., N−1

and the noisy output is

r(k) = s(k) ∗ h(k) + n_o(k),   k = 0, 1, ..., N−1

The parameters to be estimated are {h(k)} given x(k) and r(k)

94
Another example is in frequency estimation using linear prediction:
For a single sinusoid s(k) = A cos(ωk + φ), it is true that

s(k) = 2 cos(ω) s(k−1) − s(k−2)

i.e., s(k) is perfectly predicted by s(k−1) and s(k−2):

s(k) = a₀ s(k−1) + a₁ s(k−2)

It is desirable to obtain a₀ = 2 cos(ω) and a₁ = −1 in the estimation process

In the presence of noise, the observed signal is

x(k) = s(k) + w(k),   k = 0, 1, ..., N−1

The linear prediction model is now

95
x(k) = a₀ x(k−1) + a₁ x(k−2),   k = 2, 3, ..., N−1

x(2)   = a₀ x(1)   + a₁ x(0)              [ x(2)   ]   [ x(1)      x(0)   ]
x(3)   = a₀ x(2)   + a₁ x(1)       ⟹      [ x(3)   ] = [ x(2)      x(1)   ] [ a₀ ]
  ...                                     [  ...   ]   [  ...       ...   ] [ a₁ ]
x(N−1) = a₀ x(N−2) + a₁ x(N−3)            [ x(N−1) ]   [ x(N−2)   x(N−3)  ]

[ s(2)   ]   [ w(2)   ]   [ s(1)      s(0)   ]          [ w(1)      w(0)   ]
[ s(3)   ] + [ w(3)   ] = [ s(2)      s(1)   ] [ a₀ ]  + [ w(2)      w(1)   ] [ a₀ ]
[  ...   ]   [  ...   ]   [  ...       ...   ] [ a₁ ]    [  ...       ...   ] [ a₁ ]
[ s(N−1) ]   [ w(N−1) ]   [ s(N−2)   s(N−3)  ]          [ w(N−2)   w(N−3)  ]

i.e., noise appears in both the observation vector and the data matrix of the linear model

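One common way to solve the total LS problem is via the SVD of the augmented matrix [H  x]: the estimate is read off from the right singular vector associated with the smallest singular value. The sketch below applies this to the linear prediction example; the frequency, noise level and data length are illustrative assumptions, and the SVD route itself is a standard technique rather than something derived in these notes:

import numpy as np

def total_ls(H, x):
    # Stack [H  x], take the SVD, and use the last right singular vector v:
    # theta_TLS = -v[:p] / v[p]
    C = np.column_stack((H, x))
    _, _, Vt = np.linalg.svd(C)
    v = Vt[-1]
    return -v[:-1] / v[-1]

# Linear prediction of a noisy sinusoid: x(k) ~ a0*x(k-1) + a1*x(k-2)
rng = np.random.default_rng(1)
N, w_true = 400, 0.7
k = np.arange(N)
x = np.cos(w_true * k + 0.2) + 0.01 * rng.standard_normal(N)

H = np.column_stack((x[1:N-1], x[0:N-2]))       # noisy "data matrix"
b = x[2:N]                                      # noisy "observation vector"
a0, a1 = total_ls(H, b)
print(a0, 2 * np.cos(w_true), a1)               # a0 ~ 2cos(w), a1 ~ -1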
6. Mixed LS
A combination of LS, weighted LS, nonlinear LS, constrained LS and/or
total LS

Examples: weighted LS with constraints, total LS with constraints, etc.

96
Questions for Discussion
1. Suppose you have N pairs (x_i, y_i), i = 1, 2, ..., N, and you need to fit them to the model y = ax. Assuming that only {y_i} contain zero-mean noise, determine the least squares estimate of a.

(Hint: the relationship between x_i and y_i is

y_i = a x_i + n_i,   i = 1, 2, ..., N

where {n_i} are the noise in {y_i}.)

97
2. Use least squares to estimate the line y = ax in Q.1, but now only {x_i} contain zero-mean noise.

3. In a radar system, the received signal is

r(n) = s(n − τ₀) + w(n)

where the range R of an object is related to the time delay τ₀ by

τ₀ = 2R / c

Suppose we get an unbiased estimate of τ₀, say τ̂₀, and its variance is var(τ̂₀). Determine the corresponding range variance var(R̂), where R̂ is the estimate of R.

If var(τ̂₀) = (0.1 s)² and c = 3 × 10⁸ m s⁻¹, what is the value of var(R̂)?

98
99
