1
Brief Review of Discrete-Time Signal Processing
There are three types of signals that are functions of time:
continuous-time (analog) : defined on a continuous range of time
discrete-time : defined only at discrete instants of time (…, (n-1)T, nT, (n+1)T, …)
digital (quantized) : both time and amplitude are discrete
2
Digital Signal Processing Applications
Speech
Coding (compression)
Synthesis (production of speech signals, e.g., speech development kit
by Microsoft )
Recognition (e.g., PCCW's 1083 telephone number enquiry system
and many applications for disabled persons as well as security)
Animal sound analysis
Music
Generation of music by different musical instruments such as piano,
cello, guitar and flute using computer
Song with low-cost electronic piano keyboard quality
3
Image
Compression
Recognition such as face, palm and fingerprint
Construction of 3D objects from 2D images
Animation, e.g., Toy Story ()
Special effects such as adding Forrest Gump to a film of President
Nixon, and removing some objects in a photograph or movie
Digital Communications
Encryption
Transmission and Reception (coding / decoding, modulation /
demodulation, equalization)
Digital Control
4
Transform from Time to Frequency
transform: x(t) → X(Ω)
inverse transform: X(Ω) → x(t)
Fourier Series
express periodic signals using harmonically related sinusoids
different definitions for continuous-time & discrete-time signals
frequency takes discrete values: Ω₀, 2Ω₀, 3Ω₀, ...
Fourier Transform
frequency analysis tool for aperiodic signals
defined on a continuous range of Ω
different definitions for continuous-time & discrete-time signals
Fast Fourier transform (FFT): a computationally efficient method for
computing the Fourier transform of discrete signals
5
6
Transform Time Domain Frequency Domain
7
Fourier Series
Fourier series are used to represent the frequency contents of a periodic
and continuous-time signal. A continuous-time function x(t ) is said to be
periodic if there exists T_P > 0 such that
x(t) = x(t + T_P) for all t (I.1)
The smallest T_P for which (I.1) holds is called the fundamental period.
Every periodic function can be expanded into a Fourier series as
x(t) = Σ_{k=-∞}^{∞} c_k e^{jkΩ₀t}, t ∈ (-∞, ∞) (I.2)
where Ω₀ = 2π/T_P and
c_k = (1/T_P) ∫_{-T_P/2}^{T_P/2} x(t) e^{-jkΩ₀t} dt (I.3)
8
Example 1.1
The signal x(t) = cos(100t) + cos(200t) is a periodic and continuous-
time signal with Ω₀ = 100.
Since x(t) = cos(100t) + cos(200t) = (e^{jΩ₀t} + e^{-jΩ₀t})/2 + (e^{j2Ω₀t} + e^{-j2Ω₀t})/2
By inspection and using (I.2), we have c₁ = 1/2, c₋₁ = 1/2, c₂ = 1/2,
c₋₂ = 1/2, while all other Fourier series coefficients are equal to zero.
9
Fourier Transform
Forward transform: X(Ω) = ∫_{-∞}^{∞} x(t) e^{-jΩt} dt (I.4)
and
Inverse transform: x(t) = (1/2π) ∫_{-∞}^{∞} X(Ω) e^{jΩt} dΩ (I.5)
10
Example 1.2
Find the Fourier transform of the following rectangular pulse:
x(t) = 1 for |t| < T₁ and x(t) = 0 for |t| > T₁
11
Example 1.3
Find the inverse Fourier transform of
X(Ω) = 1 for |Ω| < W and X(Ω) = 0 for |Ω| > W
Using (I.5),
x(t) = (1/2π) ∫_{-W}^{W} e^{jΩt} dΩ = sin(Wt)/(πt)
12
Discrete-Time Fourier Transform (DTFT)
13
The DTFT can be obtained by substituting x_s(t) into the Fourier transform
equation of (I.4):
X(Ω) = ∫_{-∞}^{∞} x_s(t) e^{-jΩt} dt
= ∫_{-∞}^{∞} Σ_{n=-∞}^{∞} x(t) δ(t-nT) e^{-jΩt} dt
= Σ_{n=-∞}^{∞} ∫_{-∞}^{∞} x(t) δ(t-nT) e^{-jΩt} dt
= Σ_{n=-∞}^{∞} x(nT) e^{-jΩnT} (I.7)
where the sifting property ∫_{-∞}^{∞} f(t) δ(t-t₀) dt = f(t₀) has been used.
14
Some points to note:
Forward Transform: X(ω) = Σ_{n=-∞}^{∞} x(n) e^{-jωn} (I.8)
and
Inverse Transform: x(n) = (1/2π) ∫_{-π}^{π} X(ω) e^{jωn} dω (I.9)
15
Example 1.4
Find the DTFT of the following discrete-time signal:
x[n] = 1 for |n| ≤ N₁ and x[n] = 0 for |n| > N₁
Using (I.8), with N₁ = 2,
X(ω) = Σ_{n=-N₁}^{N₁} e^{-jωn}
16
z-Transform
X(z) = Z{x[n]} = Σ_{n=-∞}^{∞} x[n] z^{-n} (I.10)
where z is a complex variable. Substituting z = e^{jω} yields the DTFT.
Writing z = re^{jω},
X(z) = Σ_{n=-∞}^{∞} x[n] r^{-n} e^{-jωn} = F{x[n] r^{-n}} (I.11)
17
Advantages of using the z-transform over the DTFT:
it can encompass a broader class of signals, since the Fourier transform does
not converge for all sequences:
A sufficient condition for convergence of the DTFT is
|X(ω)| ≤ Σ_{n=-∞}^{∞} |x(n)| |e^{-jωn}| = Σ_{n=-∞}^{∞} |x(n)| < ∞ (I.12)
Therefore, if x(n) is absolutely summable, then X(ω) exists.
On the other hand, by representing z = re^{jω}, the z-transform exists if
|X(z)| = |X(re^{jω})| ≤ Σ_{n=-∞}^{∞} |x(n) r^{-n}| |e^{-jωn}| = Σ_{n=-∞}^{∞} |x(n) r^{-n}| < ∞ (I.13)
we can choose a region of convergence (ROC) for z such that the z-transform converges
notational convenience: z instead of e^{jω}
it can solve problems in discrete-time signals and systems, e.g. difference equations
18
Example 1.5
Determine the z-transform of x[n] = aⁿ u[n].
X(z) = Σ_{n=-∞}^{∞} aⁿ u[n] z^{-n} = Σ_{n=0}^{∞} (az^{-1})ⁿ
X(z) converges if Σ_{n=0}^{∞} |az^{-1}|ⁿ < ∞. This requires |az^{-1}| < 1 or |z| > |a|, and
X(z) = 1/(1 - az^{-1})
Now consider x[n] = -aⁿ u[-n-1]. Then
X(z) = -Σ_{n=-∞}^{-1} aⁿ z^{-n} = -Σ_{m=1}^{∞} a^{-m} z^{m} = 1 - Σ_{m=0}^{∞} (a^{-1}z)^{m}
19
In this case, X(z) converges if |a^{-1}z| < 1 or |z| < |a|, and
X(z) = 1/(1 - az^{-1})
(Figure: the ROC of x[n] = aⁿu[n] is the region |z| > |a|, while the ROC of
x[n] = -aⁿu[-n-1] is the region |z| < |a|.)
20
21
Transfer Function and Difference Equation
A linear time-invariant (LTI) system with input sequence x(n) and output
sequence y(n) can be described by an Nth-order linear constant-coefficient
difference equation of the form:
Σ_{k=0}^{N} a_k y(n-k) = Σ_{k=0}^{M} b_k x(n-k), a₀ ≠ 0, b₀ ≠ 0 (I.14)
Applying z -transform to both sides with the use of the linearity property
and time-shifting property, we have
Σ_{k=0}^{N} a_k z^{-k} Y(z) = Σ_{k=0}^{M} b_k z^{-k} X(z) (I.15)
The system (or filter) transfer function is expressed as
H(z) = Y(z)/X(z) = (Σ_{k=0}^{M} b_k z^{-k}) / (Σ_{k=0}^{N} a_k z^{-k})
= (b₀/a₀) · Π_{k=1}^{M} (1 - c_k z^{-1}) / Π_{k=1}^{N} (1 - d_k z^{-1}) (I.16)
where each (1 - c_k z^{-1}) contributes a zero at z = c_k and a pole at z = 0,
while each (1 - d_k z^{-1}) contributes a pole at z = d_k and a zero at z = 0.
22
The frequency response of the system or filter can be computed as
H(ω) = H(z)|_{z = e^{jω}} (I.17)
The difference equation can be rearranged to compute the output as
y(n) = (1/a₀) [Σ_{k=0}^{M} b_k x(n-k) - Σ_{k=1}^{N} a_k y(n-k)] (I.18)
23
Example 1.6
Consider an LTI system whose input x[n] and output y[n] satisfy the
following linear constant-coefficient difference equation,
y[n] - (1/2) y[n-1] = x[n] + (1/3) x[n-1]
24
25
Example 1.7
Suppose you need to high-pass filter the signal x[n] with the high-pass filter having
the following transfer function
H(z) = 1/(1 + 0.99 z^{-1})
From
H(z) = Y(z)/X(z) = 1/(1 + 0.99 z^{-1}) ⟹ Y(z) + 0.99 z^{-1} Y(z) = X(z)
the filter can be implemented in the time domain as
y[n] = x[n] - 0.99 y[n-1]
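As a quick illustration, the following MATLAB sketch (the white-noise test input and its length are assumptions, not part of the notes) applies this high-pass filter and inspects its frequency response:
b = 1; % numerator of H(z)
a = [1 0.99]; % denominator 1 + 0.99z^(-1)
x = randn(1,1000); % assumed test input
y = filter(b,a,x); % implements y[n] = x[n] - 0.99*y[n-1]
[H,w] = freqz(b,a,512); % frequency response on [0,pi)
plot(w/pi,abs(H)); % small gain near w = 0, large gain near w = pi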
26
27
Causality, Stability and ROC:
A causal system has h[n] = 0 for n < 0, i.e., h[n] is right-sided, and the ROC of H(z)
extends outward from the outermost pole.
A stable system requires Σ_n |h[n]| < ∞, which is equivalent to the ROC of H(z)
including the unit circle |z| = 1.
28
Example 1.8
Verify if the system impulse response h[n] = 0.5 n u[n] is causal and stable.
It is obvious that h[n] is causal because h[n] = 0 for all n < 0 . On the other
hand,
H(z) = Σ_{n=-∞}^{∞} 0.5ⁿ u[n] z^{-n} = Σ_{n=0}^{∞} (0.5 z^{-1})ⁿ = 1/(1 - 0.5 z^{-1})
H(z) converges if Σ_{n=0}^{∞} |0.5 z^{-1}|ⁿ < ∞. This requires |0.5 z^{-1}| < 1 or |z| > 0.5.
(Notice that another impulse response, h[n] = -0.5ⁿ u[-n-1], has the same H(z) but
corresponds to an unstable system because the ROC for H(z) is |z| < 0.5.)
29
The z-transform for h[n] is
H(z) = 1/(1 - 0.5 z^{-1}), |z| > 0.5
Hence it is stable because the ROC for H(z) includes the unit circle |z| = 1:
Σ_{n=-∞}^{∞} |h[n]| = Σ_{n=0}^{∞} 0.5ⁿ = 1 + 0.5 + 0.5² + ⋯ = 1/(1 - 0.5) = 2 < ∞
30
Brief Review of Random Processes
Basically there are two types of signals:
Deterministic Signals
x(t) = a(t) Σ_{m=1}^{∞} c_m cos(2πm f₀ t + φ_m)
where:
31
a (t ) is the envelope
32
33
Random Signals
x[n] = Σ_{i=1}^{P} a_i x[n-i] + w[n]
where
34
Definitions and Notations
1. Mean Value
The mean value of a real random variable x(n) at time n is defined as
μ(n) = E{x(n)} = ∫_{-∞}^{∞} x(n) f(x(n)) d(x(n)) (I.19)
where f(x(n)) is the PDF of x(n) such that
∫_{-∞}^{∞} f(x(n)) d(x(n)) = 1 and f(x(n)) ≥ 0
35
2. Moment
E{x^m(n)} = ∫_{-∞}^{∞} (x(n))^m f(x(n)) d(x(n)) (I.22)
3. Variance
The variance of a real random variable x(n) at time n is defined as
σ²(n) = E{(x(n) - μ(n))²} = ∫_{-∞}^{∞} (x(n) - μ(n))² f(x(n)) d(x(n)) (I.23)
It is also called second central moment.
36
Example 1.9
Determine the mean, second-order moment and variance of a quantization
error, x, with the following PDF: f(x) = 1/(2a) for -a ≤ x ≤ a and f(x) = 0 otherwise.
μ = ∫_{-a}^{a} x f(x) dx = (1/2a) ∫_{-a}^{a} x dx = (1/2a) [x²/2]_{-a}^{a} = 0
E{x²} = ∫_{-a}^{a} x² f(x) dx = (1/2a) ∫_{-a}^{a} x² dx = (1/2a) [x³/3]_{-a}^{a} = a²/3
σ² = E{(x - μ)²} = E{x²} = a²/3
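A simple Monte Carlo check of these results can be run in MATLAB (the value a = 0.1 and the sample size are arbitrary assumptions):
a = 0.1;
x = a*(2*rand(1,1e6)-1); % uniform on (-a,a)
mean(x) % should be close to 0
mean(x.^2) % should be close to a^2/3
var(x) % should be close to a^2/3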
37
4. Autocorrelation
The autocorrelation of a real random signal x(n) is defined as
R_xx(m,n) = E{x(m)x(n)} = ∫∫ x(m) x(n) f(x(m), x(n)) d(x(m)) d(x(n)) (I.24)
where f ( x(m), x( n) ) is the joint PDF of x(m) and x(n) . It measures the
degree of association or dependence between x at time index n and at
index m .
In particular,
R_xx(n,n) = E{x²(n)} (I.25)
is the mean square value or average power of x(n). Moreover, when x(n)
has zero mean, then
σ²(n) = R_xx(n,n) = E{x²(n)} (I.26)
That is, the power of x(n) is equal to the variance of x(n).
38
5. Covariance
The covariance of a real random signal x(n) is defined as
C_xx(m,n) = E{(x(m) - μ(m))(x(n) - μ(n))} = R_xx(m,n) - μ(m)μ(n)
39
6. Crosscorrelation
The crosscorrelation of two real random signals x(n) and y (n) is defined
as
R_xy(m,n) = E{x(m)y(n)} = ∫∫ x(m) y(n) f(x(m), y(n)) d(x(m)) d(y(n)) (I.28)
where f(x(m), y(n)) is the joint PDF of x(m) and y(n). It measures the
correlation of x(n) and y(n). The signals x(m) and y(n) are uncorrelated if
R_xy(m,n) = E{x(m)}·E{y(n)}.
7. Independence
Two real random variables x(n) and y(n) are said to be independent if
f(x(n), y(n)) = f(x(n)) f(y(n)) ⟹ E{x(n)y(n)} = E{x(n)}·E{y(n)} (I.29)
Q.: Does uncorrelatedness imply independence, or vice versa?
40
8. Stationarity
A discrete random signal is said to be strictly stationary if its k-th order
PDF f(x(n₁), x(n₂), …, x(n_k)) is shift-invariant for any set of n₁, n₂, …, n_k
and for any k. That is,
f(x(n₁), x(n₂), …, x(n_k)) = f(x(n₁+n₀), x(n₂+n₀), …, x(n_k+n₀)) (I.30)
41
Three important properties of R_xx(i):
(i) R_xx(i) is an even sequence, i.e.,
R_xx(i) = R_xx(-i) (I.33)
and hence is symmetric about the origin.
Q.: Why is it an even sequence?
(ii) The mean square value or power is greater than or equal to the magnitude
of the correlation at any other lag, i.e.,
E{x²(n)} = R_xx(0) ≥ |R_xx(i)|, i ≠ 0 (I.34)
which can be proved by the Cauchy-Schwarz inequality:
|E{ab}| ≤ √(E{a²} E{b²})
(iii) When x(n) has zero mean, then
σ² = E{x²(n)} = R_xx(0) (I.35)
42
9. Ergodicity
A stationary process is said to be ergodic if its time average using infinite
samples equals its ensemble average. That is, the statistical properties of
the process can be determined by time averaging over a single sample
function of the process. For example, the mean can be obtained as
μ = E{x(n)} = lim_{N→∞} (1/N) Σ_{n=-N/2}^{N/2} x(n)
Unless stated otherwise, we assume that random signals are ergodic (and
thus stationary) in this course.
43
Example 1.10
Consider an ergodic stationary process {x[n]}, n = …, -1, 0, 1, …, which is
uniformly distributed between 0 and 1.
The ensemble average is
μ[m] = ∫₀¹ x[m] f(x[m]) dx[m] = ∫₀¹ x[m] dx[m] = [x²[m]/2]₀¹ = 1/2
while the time average gives the same value:
lim_{N→∞} (1/N) Σ_{n=-N/2}^{N/2} x[n] = μ = 1/2
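A short MATLAB check of this time average (the sample size is an arbitrary assumption):
N = 100000;
x = rand(1,N); % uniform on (0,1)
time_avg = sum(x)/N % close to the ensemble mean 1/2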
44
10. Power Spectrum
For random signals, power spectrum or power spectral density (PSD) is
used to describe the frequency spectrum.
Q.: Can we use DTFT to analyze the spectrum of random signal? Why?
R_xx(i) = (1/2π) ∫_{-π}^{π} Φ_xx(ω) e^{jωi} dω (I.37)
Q.: Why?
45
Under a mild assumption:
lim_{N→∞} (1/N) Σ_{k=-N}^{N} |k| R_xx(k) = 0
the PSD can be expressed as
Φ_xx(ω) = lim_{N→∞} (1/N) E{|Σ_{n=0}^{N-1} x(n) e^{-jωn}|²} (1.38)
Since Σ_{n=0}^{N-1} x(n) e^{-jωn} corresponds to the DTFT of x(n), we can consider the
PSD as the time average of |X(ω)|² based on infinite samples.
(1.38) also implies that the PSD is a measure of the mean value of the
DTFT of x(n) .
46
Common Random Signal Models
1. White Process
A discrete-time zero-mean signal w(n) is said to be white if
R_ww(m-n) = E{w(n)w(m)} = σ_w² for m = n and 0 otherwise (I.39)
Moreover, the PSD of w(n) is flat for all frequencies:
Φ_ww(ω) = Σ_{i=-∞}^{∞} R_ww(i) e^{-jωi} = R_ww(0) e^{-jω·0} = σ_w²
Notice that whiteness does not specify the PDF of the process; a white process can be
Gaussian distributed, uniformly distributed, etc.
47
2. Autoregressive Process
An autoregressive (AR) process of order M is defined as
x(n) = a₁ x(n-1) + a₂ x(n-2) + ⋯ + a_M x(n-M) + w(n)
where w(n) is a white process.
48
The input-output relationship of random signals (with x(n) = h(n) ⊗ w(n)) is:
R_xx(m) = E{x(n)x(n+m)}
= E{Σ_{k₁} h(k₁) w(n-k₁) · Σ_{k₂} h(k₂) w(n+m-k₂)}
= Σ_{k₁} Σ_{k₂} h(k₁) h(k₂) E{w(n-k₁) w(n+m-k₂)}
= Σ_{k₁} Σ_{k₂} h(k₁) h(k₂) R_ww(m+k₁-k₂)
= Σ_k R_ww(m-k) Σ_{k₁} h(k₁) h(k+k₁), with k = k₂ - k₁
⟹ R_xx(m) = R_ww(m) ⊗ g(m), g(k) = Σ_{k₁} h(k₁) h(k+k₁) = h(k) ⊗ h(-k)
⟹ Φ_xx(ω) = Φ_ww(ω) G(ω), G(ω) = |H(ω)|²
⟹ Φ_xx(ω) = Φ_ww(ω) |H(ω)|² (I.41)
49
Note that (1.41) applies for all stationary input processes and impulse
responses.
In particular, for the AR process, we have
Φ_xx(ω) = σ_w² / |1 - a₁e^{-jω} - a₂e^{-j2ω} - ⋯ - a_M e^{-jMω}|² (1.42)
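As a sketch, the PSD of (1.42) can be evaluated numerically with MATLAB's freqz; the first-order case with a₁ = 0.9 and unit noise power used below is an assumed example, not from the notes:
a1 = 0.9; sw2 = 1;
[H,w] = freqz(1,[1 -a1],512); % H(w) = 1/(1 - a1*e^(-jw))
Pxx = sw2*abs(H).^2; % (1.42) with M = 1
plot(w/pi,Pxx);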
50
4. Autoregressive Moving Average Process
Φ_xx(ω) = σ_w² |b₀ + b₁e^{-jω} + ⋯ + b_N e^{-jNω}|² / |1 - a₁e^{-jω} - a₂e^{-j2ω} - ⋯ - a_M e^{-jMω}|² (1.46)
51
Questions for Discussion
1. Consider a signal x(n) and a stable system with transfer function
H(z) = B(z)/A(z). Let the system output with input x(n) be y(n).
Can we always recover x(n) from y(n)? Why? You may consider the
simple cases of B(z) = 1 + 2z^{-1} with A(z) = 1, as well as
B(z) = 1 + 0.5z^{-1} with A(z) = 1.
2. Given a random variable x with mean μ_x and variance σ_x², determine
the mean, variance and mean square value of
y = ax + b
where a and b are finite constants.
3. Is an AR process really stationary? You can answer this question by
examining the autocorrelation function of a first-order AR process, say,
x(n) = a x(n-1) + w(n)
52
Chapter 2
Simulation Techniques
References:
1
Simulation Techniques
Signal Generation
1. Deterministic Signals
It is trivial to generate deterministic signals given the synthesis formula,
e.g., for a single real tone, it is generated by
x(n) = A cos(ωn + φ), n = 0, 1, …, N-1
MATLAB code:
for n=1:N
x(n)=A*cos(w*(n-1)+p); % note that index should be > 0
end
2
An alternative approach is to generate all the samples at once without an explicit
loop, as sketched below; the printed values of x are omitted here.
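A minimal sketch of such a vectorized version, assuming the same A, w, p and N as in the loop above:
n = 0:N-1; % sample indices
x = A*cos(w*n+p); % all N samples computed in one statement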
3
Example 2.1
Recall the simple mathematical model of a musical signal:
x(t) = a(t) Σ_{m=1}^{∞} c_m cos(2πm f₀ t + φ_m)
The melody consists of the notes:
AA EE F#F# EE
DD C#C# BB AA
EE DD C#C# BB (repeat once)
4
The American Standard pitch for each of these notes is:
A: 440.00 Hz
B: 493.88 Hz
C#: 554.37 Hz
D: 587.33 Hz
E: 659.26 Hz
F#: 739.99 Hz
Assuming that each note lasts for 0.5 second and a sampling frequency of
8000 Hz, the MATLAB code for producing this piece of music is:
5
line1=[a,a,e,e,fs,fs,e,e]; % first line of song
line2=[d,d,cs,cs,b,b,a,a]; % second line of song
line3=[e,e,d,d,cs,cs,b,b]; % third line of song
song=[line1,line2,line3,line3,line1,line2]; % composite song
sound(song,8000); % play sound with 8kHz sampling frequency
wavwrite(song,'song.wav'); % save song as a wav file
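The note vectors a, e, fs, d, cs and b used above are not listed in the notes; a plausible way to generate them (each note lasting 0.5 s at 8000 Hz, i.e., 4000 samples) is sketched below. The name fs is kept for the note F#, so the sampling frequency is stored in fsamp here.
fsamp = 8000; % sampling frequency
t = 0:1/fsamp:0.5-1/fsamp; % 0.5-second time grid (4000 samples)
a = cos(2*pi*440.00*t);
b = cos(2*pi*493.88*t);
cs = cos(2*pi*554.37*t);
d = cos(2*pi*587.33*t);
e = cos(2*pi*659.26*t);
fs = cos(2*pi*739.99*t);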
Note that in order to attain better music quality (e.g., flute, violin), we
should use the more general model:
x(t) = a(t) Σ_{m=1}^{∞} c_m cos(2πm f₀ t + φ_m)
Q.: How many discrete-time samples in the 0.5 second note with 8000
Hz sampling frequency?
6
2. Random Signals
Uniform Variable
A uniform random sequence can be generated by
x(n) = seed_n = (a·seed_{n-1}) mod m, n = 1, 2, …
where seed 0 , a and m are positive integers. The numbers generated
should be (approximately) uniformly distributed between 0 and (m 1) .
A set of choice for a and m which generates good uniform variables is
a = 16807 and m = 2147483647
This uniform PDF can be changed easily by scaling and shifting the
generation formula. For example, a random number which is uniformly
distributed between -0.5 and 0.5 is given by
seed_n = (a·seed_{n-1}) mod m
x(n) = seed_n/m - 0.5
7
The power of x(n) is
var(x) = ∫_{-0.5}^{0.5} x² p(x) dx = ∫_{-0.5}^{0.5} x² dx = 1/12
Note that x(n) is independent (white).
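A direct MATLAB sketch of this generator (the initial seed is an arbitrary assumption):
a = 16807; m = 2147483647;
N = 5000; seed = 12345;
x = zeros(1,N);
for n=1:N
seed = mod(a*seed,m); % linear congruential recursion
x(n) = seed/m - 0.5; % approximately uniform on (-0.5,0.5)
end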
8
Evaluation of MATLAB uniform random numbers:
m = 0.0172
p = 2.0225
v = 2.0222
9
plot(u); % plot the signal
(Figure: time plot of the 5000 samples of u, with values roughly between -2.5 and 2.5.)
10
hist(u,20) % plot the histogram for u with 20 bars
(Figure: histogram of u, roughly flat across its range with about 250 counts per bar.)
11
a = xcorr(u); % compute the autocorrelation
plot(a) % plot the autocorrelation
(Figure: autocorrelation sequence of u, with a single dominant peak of about 10000 at the center lag.)
12
axis([4990, 5010, -500, 12000]) % change the axis
(Figure: zoom around the center lag, showing the single sharp peak, consistent with a white sequence.)
13
Gaussian Variable
Given a pair of independent random numbers which are uniformly
distributed on [0,1], say (u₁, u₂), a pair of independent Gaussian
numbers with zero mean and unit variance can be generated from:
w₁ = √(-2 ln(u₁)) cos(2πu₂)
w₂ = √(-2 ln(u₁)) sin(2πu₂)
This is known as the Box-Muller transformation. Note that the Gaussian
numbers are white.
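A minimal MATLAB sketch of the Box-Muller transformation (the sample size is arbitrary):
N = 5000;
u1 = rand(1,N); u2 = rand(1,N); % independent uniform numbers on [0,1]
w1 = sqrt(-2*log(u1)).*cos(2*pi*u2); % zero-mean, unit-variance Gaussian
w2 = sqrt(-2*log(u1)).*sin(2*pi*u2); % independent of w1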
14
w = randn(1,5000); % generate Gaussian random numbers with mean 0 and variance 1
Evaluation of MATLAB Gaussian random numbers:
m = 0.0123
p = 2.0158
v = 2.0157
15
plot(w); % plot the signal
(Figure: time plot of the 5000 Gaussian samples.)
16
hist(w,20) % plot the histogram for w with 20 bars
(Figure: bell-shaped histogram of w between about -6 and 6, peaking near 0.)
17
a = xcorr(w); % compute the autocorrelation
plot(a) % plot the autocorrelation
(Figure: autocorrelation sequence of w, again with a single dominant peak at the center lag.)
18
axis([4990, 5010, -500, 12000]) % change the axis
(Figure: zoom around the center lag, showing the single sharp peak.)
19
Impulsive Variable
The main feature of impulsive noise, or an impulse process, is that its value can be
very large. A mathematical model for impulsive noise is the α-stable
process, where 0 < α ≤ 2.
20
MATLAB code for 0 < α < 2 and α ≠ 1:
21
plot(i);
(Figure: time plot of 5000 samples of the α-stable sequence, mostly small values with occasional large spikes down to about -140.)
22
MATLAB code for α = 1:
N=5000;
phi = (rand(1,N)-0.5)*pi;
a = tan((0.5.*phi));
i = 2.*a./(1-a.^2);
plot(i)
(Figure: time plot of the 5000 generated samples, mostly small values with occasional spikes up to about 2000.)
23
PDF for different values of α
The impulsiveness is due to the heavier tails, i.e., the PDF goes to zero slowly.
25
AR, MA and ARMA Processes
An MA process is generated from
x(n) = b₀ w(n) + b₁ w(n-1) + ⋯ + b_N w(n-N)
where w(n) is a white process.
26
MATLAB code for generating 50 samples of MA process with b0 = 1, b1 = 2 :
b0=1;
b1=2;
N=50;
w=randn(1,N+1); % generate N+1 white noise samples
for n=1:N
x(n) = b0*w(n+1)+b1*w(n); % shift w by one sample
end
27
From (1.44), the PSD for the MA process with b₀ = 1, b₁ = 2 is
Φ_xx(ω) = |1 + 2e^{-jω}|² σ_w² = |1 + 2e^{-jω}|²
b0=1;
b1=2;
b= [b0 b1];
a=1;
[H,W] = freqz(b,a); % H is complex frequency response
PSD = abs(H.*H);
plot(W/pi,PSD);
28
(Figure: theoretical PSD |1 + 2e^{-jω}|² plotted against ω/π, decreasing from 9 at ω = 0 to 1 at ω = π.)
29
To evaluate the MA process generated by MATLAB, we use (1.38):
Φ_xx(ω) = lim_{N→∞} (1/N) E{|Σ_{n=0}^{N-1} x(n) e^{-jωn}|²}
In practice, N → ∞ is replaced by N = 100 and E{·} is approximated by the
average of 100 independent simulations.
MATLAB code:
N=100;
b= [1 2]; % b is a vector
for m=1:100 % perform 100 independent runs
w=randn(1,N+1); % generate N+1 white noise samples
y=conv(b,w); % signal length is N+1+2-1
x=y(2:N+1); % remove the transient signals
p(m,:) = abs(fft(x).*fft(x));
end
psd = mean(p)./100;
index = 1/50:1/50:2;
plot(index,psd);
axis([0, 1, 0 10]);
30
31
For a more accurate estimate, N = 10000 can be used with E{·} approximated by the
average of 10000 independent simulations.
(Figure: estimated PSD for this case.)
32
Transient samples also need to be removed for AR and ARMA processes
because of the non-stationary start-up transient due to the poles:
x(n) = a₁ x(n-1) + a₂ x(n-2) + ⋯ + a_M x(n-M) + w(n)
x(n) = a₁ x(n-1) + a₂ x(n-2) + ⋯ + a_M x(n-M)
+ b₀ w(n) + b₁ w(n-1) + ⋯ + b_N w(n-N)
33
Suppose a^{2(n+1)} ≤ 0.0001 is required and the AR parameter is a = 0.9.
The required n follows from 2(n+1) log(0.9) ≤ log(0.0001), i.e.,
n ≥ log(0.0001)/(2 log(0.9)) - 1 ≈ 42.7, so n = 43 initial samples are discarded (M = 43 below).
M = 43;
N = 50;
a = -0.9;
y(1) = 0;
for n=2:M+N
y(n) = a*y(n-1)+randn;
end
x=y(M+1:M+N);
plot(x);
34
(Figure: the 50 retained samples of the generated AR(1) process, with values roughly between -6 and 8.)
35
Digital Filtering
For FIR system, we can follow the MA process, while for IIR system, we
can follow the ARMA process. The transient signals can be removed if
necessary as in the MA, AR and/or ARMA processes.
h(n) = (1/2π) ∫_{-π}^{π} H(ω) e^{jωn} dω
36
Example 2.2
Compute the impulse response for H_d(z) with the following DTFT
spectrum, where ω_o = 0.2π and ω_c = 0.4π:
H_d(ω) = 1 for |ω| ≤ ω_o, 0.5 for ω_o < |ω| ≤ ω_c, and 0 for ω_c < |ω| ≤ π
Using the inverse DTFT,
h_d(n) = (1/2π) ∫_{-π}^{π} H_d(ω) e^{jωn} dω
= (1/2π) ∫_{-ω_c}^{ω_c} 0.5 e^{jωn} dω + (1/2π) ∫_{-ω_o}^{ω_o} 0.5 e^{jωn} dω
= sin(ω_c n)/(2πn) + sin(ω_o n)/(2πn)
37
h_d(n) = [sin(0.4πn) + sin(0.2πn)]/(2πn), n = …, -1, 0, 1, …
Note that h_d(0) can be obtained by using L'Hôpital's rule or:
h_d(0) = (1/2π) ∫_{-π}^{π} H_d(ω) e^{jω·0} dω = (1/2π) ∫_{-π}^{π} H_d(ω) dω
= (1/2π) ∫_{-ω_c}^{ω_c} 0.5 dω + (1/2π) ∫_{-ω_o}^{ω_o} 0.5 dω = (ω_c + ω_o)/(2π) = 0.3
The filter output is then
y(n) = h_d(n) ⊗ x(n) = Σ_{k=-∞}^{∞} x(n-k) h_d(k) ≈ Σ_{k=-M}^{M} x(n-k) h_d(k)
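A sketch of this truncation in MATLAB (the value of M and the test input are assumptions):
M = 50;
n = -M:M;
hd = (sin(0.4*pi*n)+sin(0.2*pi*n))./(2*pi*n);
hd(n==0) = 0.3; % h_d(0) from above
x = randn(1,1000); % assumed test input
y = conv(x,hd); % filtered output, delayed by M samples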
38
Example 2.3
Compute the impulse response of a time-shift function which time-shifts a
signal by a non-integer delay D.
h(n) = (1/2π) ∫_{-π}^{π} H(ω) e^{jωn} dω = (1/2π) ∫_{-π}^{π} e^{-jωD} e^{jωn} dω = (1/2π) ∫_{-π}^{π} e^{jω(n-D)} dω
= sinc(n - D)
where
sinc(x) = sin(πx)/(πx)
The delayed output is
y(n) = x(n) ⊗ sinc(n - D) = Σ_{k=-∞}^{∞} x(n-k) sinc(k - D)
≈ Σ_{k=-M}^{M} x(n-k) sinc(k - D)
39
Questions for Discussion
1. Observe that the following signal:
y(n) = Σ_{k=-10}^{10} x(n-k) sinc(k - D) ≈ x(n - D)
which depends on future data {x(n + 1), x (n + 2), L , x( n + 10)}.
This is referred to as a non-causal system. How to generate the output
of the non-causal system in practice?
2. A 90° phase shifter has frequency response
H(ω) = -j for 0 ≤ ω < π and H(ω) = j for -π ≤ ω < 0
40
Chapter 3
Optimal Filter Theory and Applications
References:
1
Optimal Signal Processing is concerned with the design, analysis, and
implementation of processing systems that extract information from
sampled data in a manner that is best, or optimal, in some sense. Such
processing systems can be referred to as optimal filters.
2
Speech Modeling using Linear Predictive Coding (LPC)
Since speech signals are highly correlated, a speech signal s (n) can be
accurately modeled by a linear combination of its past samples:
s(n) ≈ ŝ(n) = Σ_{i=1}^{P} w_i s(n-i)
where {wi } are known as the LPC coefficients. Techniques of optimal
signal processing can be used to determine {wi } in an optimal way.
2. Identification
3
System Identification
(Figure: the unknown system H(z) is driven by s(k); its output d(k) plus observation noise n_o(k) gives r(k). The optimal filter W(z) is driven by the noisy input x(k) = s(k) + n_i(k), and its output is subtracted from r(k) to form the error e(k).)
Given noisy input x(k ) and/or noisy output r (k ) , our aim is to determine
the impulse response of the unknown system H (z ) using W (z )
4
3. Inverse Filtering: find the inverse of the system
Signal Recovery
5
4. Interference Canceling: Remove noise using an external reference
6
Problem Statement for Optimal Filters
(Figure: generic optimal filtering configuration — the input x(n) drives W(z), whose output y(n) is subtracted from the desired response d(n) to produce the estimation error e(n).)
Given the input x(n) and the desired response d (n) , we want to find the
transfer function W ( z ) or its impulse response such that a statistical
criterion or a cost function is optimized.
7
Some common optimization criteria in the literature are:
1. Least Squares: find W(z) that minimizes Σ_{n=0}^{N-1} e²(n), where N is the
number of samples available. This corresponds to least-squares filter design.
2.Minimum Mean Square Error : find W ( z ) that minimizes E{e 2 (n)}. This
corresponds to Wiener filtering problem.
3. Least Absolute Sum: find W(z) that minimizes Σ_{n=0}^{N-1} |e(n)|.
4. Minimum Mean Absolute Error: find W(z) that minimizes E{|e(n)|}.
5. Least Mean Fourth: find W(z) that minimizes E{e⁴(n)}.
The first and second are the two most commonly used criteria because of their
relatively small computation, ease of analysis and robust performance. In
later sections it is shown that both viewpoints give rise to similar
mathematical expressions for W(z).
Q.: Why does an absolute (universally best) optimization criterion not exist?
8
Least Squares Filtering
W(z) = Σ_{i=0}^{L-1} w_i z^{-i} (3.1)
e(n) = d(n) - y(n) (3.2)
where
y(n) = Σ_{i=0}^{L-1} w_i x(n-i) = W^T X(n),
W = [w₀ w₁ ⋯ w_{L-2} w_{L-1}]^T
X(n) = [x(n) x(n-1) ⋯ x(n-L+2) x(n-L+1)]^T
9
The cost function is
J_LS(W) = Σ_{n=0}^{N-1} e²(n) = Σ_{n=0}^{N-1} (d(n) - Σ_{i=0}^{L-1} w_i x(n-i))² (3.3)
which is a function of the filter coefficients {w_i}; N is the number of samples of
x(n) (and d(n)).
The minimum of the least squares function can be found by differentiating
J_LS(W) with respect to w₀, w₁, …, w_{L-1} and then setting the resultant
expressions to zero as follows,
∂J_LS(W)/∂w_j = ∂/∂w_j Σ_{n=0}^{N-1} (d(n) - Σ_{i=0}^{L-1} w_i x(n-i))² = 0, j = 0, 1, …, L-1
⟹ Σ_{n=0}^{N-1} 2 (d(n) - Σ_{i=0}^{L-1} w_i x(n-i)) (-x(n-j)) = 0 (3.4)
⟹ Σ_{n=0}^{N-1} d(n) x(n-j) = Σ_{n=0}^{N-1} Σ_{i=0}^{L-1} w_i x(n-i) x(n-j) = Σ_{i=0}^{L-1} w_i Σ_{n=0}^{N-1} x(n-i) x(n-j)
10
Denote
R_dx = [R_dx(0) R_dx(1) ⋯ R_dx(L-2) R_dx(L-1)]^T (3.5)
where
R_dx(j) = Σ_{n=0}^{N-1} d(n) x(n-j), j = 0, 1, …, L-1
and
R_xx =
[ R_xx(0,0)    R_xx(1,0)    ⋯  R_xx(L-2,0)  R_xx(L-1,0)
  R_xx(0,1)    ⋱                            ⋮
  ⋮                         ⋱
  R_xx(0,L-2)               ⋱
  R_xx(0,L-1)  R_xx(1,L-1)  ⋯               R_xx(L-1,L-1) ]
where
R_xx(i,j) = Σ_{n=0}^{N-1} x(n-i) x(n-j)
11
In practice, for stationary signals, we use
R_xx =
[ R_xx(0)    R_xx(1)    ⋯  R_xx(L-2)  R_xx(L-1)
  R_xx(1)    ⋱                        ⋮
  ⋮                     ⋱
  R_xx(L-2)             ⋱
  R_xx(L-1)  R_xx(L-2)  ⋯             R_xx(0) ]   (3.6)
where
R_xx(i) = Σ_{n=0}^{N-1-i} x(n) x(n+i) = R_xx(-i)
As a result, we have
R_dx = R_xx W_LS ⟹ W_LS = (R_xx)^{-1} R_dx (3.7)
12
Example 3.1
(Figure: the unknown system Σ_{i=0}^{4} h_i z^{-i} is driven by x(n); its output plus noise q(n) forms d(n). A five-tap filter Σ_{i=0}^{4} w_i z^{-i} driven by x(n) produces y(n), and e(n) = d(n) - y(n).)
13
We can use MATLAB to simulate the least squares filter for impulse
response estimation. The MATLAB source code is as follows,
%define the number of samples
N=50;
%define the noise and signal powers
noise_power = 0.0;
signal_power = 5.0;
%define the unknown system impulse response
h=[1 2 3 2 1];
%generate the input signal which is a Gaussian white noise with power 5
x=sqrt(signal_power).*randn(1,N);
14
%generate R_xx
corr_xx=xcorr(x);
for i=0:4
for j=0:4
R_xx(i+1,j+1)= corr_xx(N+i-j);
end
end
%generate the desired output plus noise
d=conv(x,h);
d=d(1:N)+sqrt(noise_power).*randn(1,N);
%generate R_dx
corr_xd = xcorr(d,x);
for i=0:4
R_dx(i+1) = corr_xd(N-i);
end
%compute the estimate channel response
W_ls = inv(R_xx)*(R_dx)'
15
R_xx =
[ 251.6413  24.4044  36.4998  11.8182   4.5115
   24.4044 251.6413  24.4044  36.4998  11.8182
   36.4998  24.4044 251.6413  24.4044  36.4998
   11.8182  36.4998  24.4044 251.6413  24.4044
    4.5115  11.8182  36.4998  24.4044 251.6413 ]
R_dx = [390.8245 658.8827 873.3125 616.0732 334.9508]^T
16
When N = 500 and the noise power is 0.5 (SNR=10 dB), we have
When N = 500 and the noise power is 5.0 (SNR=0 dB), we have
It is observed that
17
Example 3.2
Find the least squares filter of the following one-step predictor system:
(Figure: one-step predictor — s(n) is delayed by z^{-1} to give x(n) = s(n-1), which is filtered by b₀ + b₁z^{-1}; the filter output is subtracted from d(n) = s(n) to give e(n).)
where
s(n) = √2 sin(2πn/12)
x(n) = s(n-1) = √2 sin(2π(n-1)/12)
d(n) = s(n)
18
Given d (n) , the aim is to find b0 and b1 in least squares sense.
19
corr_xd = xcorr(d,x,'unbiased');
for i=0:1
R_dx(i+1) = corr_xd(N-1-i);
end
W_ls = inv(R_xx)*(R_dx)'
The result is: W_ls = [1.7705 -1.0440]
which is close to the ideal one-step predictor ŝ(n) = √3·s(n-1) + (-1)·s(n-2),
since 2cos(2π/12) = √3 ≈ 1.732.
20
Wiener Filtering
∂J_MMSE(W)/∂w_j = ∂/∂w_j E{(d(n) - Σ_{i=0}^{L-1} w_i x(n-i))²} = 0, j = 0, 1, …, L-1
⟹ E{2 (d(n) - Σ_{i=0}^{L-1} w_i x(n-i)) (-x(n-j))} = 0 (3.9)
⟹ E{d(n) x(n-j)} = E{Σ_{i=0}^{L-1} w_i x(n-i) x(n-j)} = Σ_{i=0}^{L-1} w_i E{x(n-i) x(n-j)}
21
Assuming d(n) and x(n) are jointly stationary, we have
R_dx(j) = Σ_{i=0}^{L-1} w_i R_xx(i-j), j = 0, 1, …, L-1 (3.10)
Define
R_dx = [R_dx(0) R_dx(1) ⋯ R_dx(L-2) R_dx(L-1)]^T (3.11)
and
R_xx =
[ R_xx(0)    R_xx(1)    ⋯  R_xx(L-2)  R_xx(L-1)
  R_xx(1)    ⋱              ⋮          R_xx(L-2)
  ⋮                     ⋱              ⋮
  R_xx(L-2)             ⋱              R_xx(1)
  R_xx(L-1)  R_xx(L-2)  ⋯  R_xx(1)    R_xx(0) ]   (3.12)
As a result,
R_dx = R_xx W_MMSE ⟹ W_MMSE = (R_xx)^{-1} R_dx (3.13)
22
Relationship between Least Squares Filter & Wiener Filter
When the number of samples N → ∞ and if ergodicity holds, i.e.,
lim_{N→∞} (1/N) Σ_{n=0}^{N-1} d(n) x(n-j) = R_dx(j) (3.14)
and
lim_{N→∞} (1/N) Σ_{n=0}^{N-1} x(n-i) x(n-j) = R_xx(i-j) = R_xx(j-i) (3.15)
then
W_MMSE = W_LS (3.16)
23
Properties of the Mean Square Error (MSE) Function
The MSE function E{e²(n)} is also known as the performance surface and it
can be written in matrix form:
E{e²(n)} = E{(d(n) - Σ_{i=0}^{L-1} w_i x(n-i))²} = E{(d(n) - W^T X(n))²}
= E{d²(n)} - 2 E{W^T X(n) d(n)} + E{W^T X(n) W^T X(n)}
= E{d²(n)} - 2 W^T E{X(n) d(n)} + W^T E{X(n) X(n)^T} W
= E{d²(n)} - 2 W^T R_dx + W^T R_xx W (3.17)
24
An example for L = 2 is shown below:
25
2. The minimum of E{e²(n)} is obtained by substituting W = W_MMSE:
ξ_min = E{d²(n)} - 2 W_MMSE^T R_dx + W_MMSE^T R_xx W_MMSE = E{d²(n)} - R_dx^T W_MMSE
26
Example 3.3
(Figure: two-tap filter — x(n) and x(n-1) are weighted by w₀ and w₁, summed, and subtracted from d(n) to form e(n).)
where
E{d²(n)} = 42, R_xx = [2 1; 1 2], R_dx = [7 8]^T
27
The performance surface is calculated as
E{e²(n)} = E{d²(n)} - 2 W^T R_dx + W^T R_xx W
= 42 - 2 [7 8] [w₀; w₁] + [w₀ w₁] [2 1; 1 2] [w₀; w₁]
= 2w₀² + 2w₁² + 2w₀w₁ - 14w₀ - 16w₁ + 42
While the Wiener filter weight is
W_MMSE = [2 1; 1 2]^{-1} [7; 8] = [2; 3]
Notice that the inverse of any nonsingular two-by-two matrix is
[a b; c d]^{-1} = (1/(ad - bc)) [d -b; -c a]
In practice, when E{d²(n)}, R_xx and R_dx are not available, we can
estimate them from x(n) and d(n) using the least squares filtering method.
The resultant filter coefficients are then the least squares filter coefficients.
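A quick numerical check of Example 3.3 in MATLAB:
Rxx = [2 1; 1 2];
Rdx = [7; 8];
W_mmse = Rxx\Rdx % gives [2; 3]
mse_min = 42 - 2*W_mmse'*Rdx + W_mmse'*Rxx*W_mmse % minimum MSE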
28
Example 3.4
Find the Wiener filter of the following system:
(Figure: same one-step predictor structure as Example 3.2.)
where s(n) = √2 sin(2πn/12).
It can be seen that x(n) = √2 sin(2π(n-1)/12) and d(n) = s(n) = √2 sin(2πn/12).
The required statistics R_xx(0), R_xx(1), R_dx(0) and R_dx(1) are computed as
follows.
29
Using
sin²(A) = (1 - cos(2A))/2
and
sin(A) sin(B) = (cos(A-B) - cos(A+B))/2
then
R_xx(0) = E{√2 sin(2π(n-1)/12) · √2 sin(2π(n-1)/12)}
= 2 E{(1 - cos(4π(n-1)/12))/2}
= 1 - E{cos(4π(n-1)/12)} = 1
30
R_xx(1) = E{√2 sin(2π(n-1)/12) · √2 sin(2πn/12)}
= E{cos(2π(n-1)/12 - 2πn/12) - cos(2π(n-1)/12 + 2πn/12)}
= cos(2π/12) = √3/2
R_dx(0) = E{√2 sin(2πn/12) · √2 sin(2π(n-1)/12)} = √3/2
R_dx(1) = E{√2 sin(2πn/12) · √2 sin(2π(n-2)/12)} = cos(4π/12) = 1/2
As a result,
[b̃₀; b̃₁] = [1 √3/2; √3/2 1]^{-1} [√3/2; 1/2] = [√3; -1]
31
The performance surface is given by
E{e²(n)} = E{d²(n)} - 2 W^T R_dx + W^T R_xx W
= 1 - 2 [√3/2 1/2] [b₀; b₁] + [b₀ b₁] [1 √3/2; √3/2 1] [b₀; b₁]
= b₀² + b₁² + √3 b₀b₁ - √3 b₀ - b₁ + 1
Notice that E{d²(n)} = E{x²(n)} = 1. Moreover, the minimum MSE is
computed as
ξ_min = E{d²(n)} - R_dx^T W_MMSE
= 1 - [√3/2 1/2] [√3; -1] = 1 - 3/2 + 1/2 = 0
This means that the optimal predictor is able to shift the phase of the
delayed sine wave and achieve exact cancellation, resulting in min = 0
32
Questions for Discussion
1. A real sinusoid s(n) = A cos(ωn + φ) obeys
s(n) = a₁ s(n-1) + a₂ s(n-2)
2. Can we extend the least squares filter or Wiener filter to the general IIR
system model? Try to answer this question by investigating the Wiener
filter using a simple IIR model:
W(z) = b₀/(1 - a₁ z^{-1})
That is, given d(k) and x(k), what are the optimal b₀ and a₁ in the mean
square error sense? Assume that x(k) is white for simplicity.
33
(Figure: the input signal x(k) drives W(z) to produce the output y(k), which is subtracted from the desired response d(k) to give the estimation error e(k).)
Steps:
(i) develop e(k )
(ii) compute E{e 2 (k )} in terms of R xx and Rdx only.
(iii) differentiate E{e 2 (k )} w.r.t. b0 and a1
34
3. Suppose you have an ECG signal corrupted by 50 Hz interference:
35
Chapter 4
Adaptive Filter Theory and Applications
References:
B.Widrow and M.E.Hoff, Adaptive switching circuits, Proc. Of
WESCON Conv. Rec., part 4, pp.96-140, 1960
B.Widrow and S.D.Stearns, Adaptive Signal Processing, Prentice-Hall,
1985
O.Macchi, Adaptive Processing: The Least Mean Squares Approach
with Applications in Transmission, Wiley, 1995
P.M.Clarkson, Optimal and Adaptive Signal Processing, CRC Press,
1993
S.Haykin, Adaptive Filter Theory, Prentice-Hall, 2002
D.F.Marshall, W.K.Jenkins and J.J.Murphy, "The use of orthogonal
transforms for improving performance of adaptive filters", IEEE Trans.
Circuits & Systems, vol.36, April 1989, pp.474-483
1
Adaptive Signal Processing is concerned with the design, analysis, and
implementation of systems whose structure changes in response to the
incoming data.
Application areas are similar to those of optimal signal processing but now
the environment is changing, the signals are nonstationary and/or the
parameters to be estimated are time-varying. For example,
2
Adaptive Filter Development
3
Adaptive Filter Definition
An adaptive filter is a time-variant filter whose coefficients are adjusted in
a way to optimize a cost function or to satisfy some predetermined
optimization criterion.
Characteristics of adaptive filters:
They can automatically adapt (self-optimize) in the face of changing
environments and changing system requirements
They can be trained to perform specific filtering and decision-making
tasks according to some updating equations (training rules)
Why adaptive?
It can automatically operate in
changing environments (e.g. signal detection in wireless channel)
nonstationary signal/noise conditions (e.g. LPC of a speech signal)
time-varying parameter estimation (e.g. position tracking of a moving
source)
4
Block diagram of a typical adaptive filter is shown below:
(Figure: the adaptive filter {h(k)} processes x(k) to give y(k); the error e(k) = d(k) - y(k) is fed back to the adaptive algorithm, which adjusts the filter coefficients.)
x(k) : input signal y(k) : filtered output
d(k) : desired response
h(k) : impulse response of adaptive filter
The cost function may be E{e²(k)} or Σ_{k=0}^{N-1} e²(k)
5
Basic Classes of Adaptive Filtering Applications
6
2.Identification : adaptive control, layered earth modeling, vibration
studies of mechanical system
7
3.Inverse Filtering : adaptive equalization for communication channel,
deconvolution
8
4.Interference Canceling : adaptive noise canceling, echo cancellation
9
Design Considerations
1. Cost Function
choice of cost functions depends on the approach used and the
application of interest
some commonly used cost functions are
exponentially weighted least squares criterion: minimize Σ_{k=0}^{N-1} λ^{N-1-k} e²(k),
where N is the total number of samples and λ denotes the exponential
weighting factor, whose value is positive and close to 1.
10
2. Algorithm
depends on the cost function used
(This is a performance measure for algorithms that use the minimum MSE
criterion)
11
tracking capability : This refers to the ability of the algorithm to track
statistical variations in a nonstationary environment.
computational requirement : number of operations, memory size,
investment required to program the algorithm on a computer.
robustness : This refers to the ability of the algorithm to operate
satisfactorily with ill-conditioned data, e.g. very noisy environment,
change in signal and/or noise models
3. Structure
structure and algorithm are inter-related, choice of structures is based on
quantization errors, ease of implementation, computational complexity,
etc.
four commonly used structures are direct form, cascade form, parallel
form, and lattice structure. Advantages of lattice structures include
simple test for filter stability, modular structure and low sensitivity to
quantization effects.
12
e.g.,
H(z) = B₂z²/(z² + A₁z + A₀) = (C₁z/(z - p₁))·(C₂z/(z - p₂)) = (D₁z + E₁)/(z - p₁) + (D₂z + E₂)/(z - p₂)
13
Commonly Used Methods for Minimizing MSE
For simplicity, it is assumed that the adaptive filter is of causal FIR type
and is implemented in direct form. Therefore, its system block diagram is
(Figure: direct-form FIR adaptive filter — x(n) passes through a tapped delay line of z^{-1} elements, the tap outputs are weighted and summed to give y(n), and e(n) = d(n) - y(n).)
14
The error signal at time n is given by
e(n) = d(n) - y(n) (4.2)
where
y(n) = Σ_{i=0}^{L-1} w_i(n) x(n-i) = W(n)^T X(n),
W(n) = [w₀(n) w₁(n) ⋯ w_{L-2}(n) w_{L-1}(n)]^T
X(n) = [x(n) x(n-1) ⋯ x(n-L+2) x(n-L+1)]^T
Recalling that minimizing E{e²(n)} gives the Wiener solution in optimal
filtering, it is desired that
lim_{n→∞} W(n) = W_MMSE = (R_xx)^{-1} R_dx (4.3)
15
Two common gradient searching approaches for obtaining the Wiener
filter are
1. Newton Method
ΔW(n) = -μ R_xx^{-1} ∂E{e²(n)}/∂W(n) (4.5)
where μ is called the step size. It is a positive number that controls the
convergence rate and stability of the algorithm. The adaptive algorithm
becomes
W(n+1) = W(n) - μ R_xx^{-1} ∂E{e²(n)}/∂W(n)
= W(n) - μ R_xx^{-1} · 2(R_xx W(n) - R_dx) (4.6)
= (1 - 2μ) W(n) + 2μ R_xx^{-1} R_dx
= (1 - 2μ) W(n) + 2μ W_MMSE
16
Solving the recursion, we have
W(n) = W_MMSE + (1 - 2μ)ⁿ (W(0) - W_MMSE)
In particular, with μ = 0.5 the weights jump from any initial W(0) to the optimum
setting W_MMSE in a single step.
17
An example of the Newton method with μ = 0.5 and 2 weights is illustrated
below.
18
2. Steepest Descent Method
ΔW(n) = -μ ∂E{e²(n)}/∂W(n) (4.11)
Thus
W(n+1) = W(n) - μ ∂E{e²(n)}/∂W(n)
= W(n) - 2μ (R_xx W(n) - R_dx) (4.12)
= (I - 2μ R_xx) W(n) + 2μ R_xx W_MMSE
= (I - 2μ R_xx)(W(n) - W_MMSE) + W_MMSE
19
Using the fact that R_xx is symmetric and real, it can be shown that
R_xx = Q Λ Q^{-1} = Q Λ Q^T (4.15)
where Q is the matrix of eigenvectors and
Λ = diag(λ₁, λ₂, …, λ_L) (4.16)
20
It can be proved that the eigenvalues of R_xx are all real and greater than or
equal to zero. Using these results and letting V(n) = W(n) - W_MMSE with
V(n) = Q U(n) (4.17)
we have
Q U(n+1) = (I - 2μ R_xx) Q U(n)
⟹ U(n+1) = Q^{-1} (I - 2μ R_xx) Q U(n)
= (Q^{-1}Q - 2μ Q^{-1} R_xx Q) U(n) (4.18)
= (I - 2μΛ) U(n)
The solution is
U(n) = (I - 2μΛ)ⁿ U(0) (4.19)
where U(0) is the initial value of U(n). Thus the steepest descent
algorithm is stable and convergent if
lim_{n→∞} (I - 2μΛ)ⁿ = 0
21
or
lim_{n→∞} diag((1 - 2μλ₁)ⁿ, (1 - 2μλ₂)ⁿ, …, (1 - 2μλ_L)ⁿ) = 0 (4.20)
which implies
|1 - 2μλ_max| < 1 ⟺ 0 < μ < 1/λ_max (4.21)
where λ_max is the largest eigenvalue of R_xx.
If this condition is satisfied, it follows that
lim_{n→∞} U(n) = 0 ⟹ lim_{n→∞} Q^{-1} V(n) = lim_{n→∞} Q^{-1} (W(n) - W_MMSE) → 0
⟹ lim_{n→∞} W(n) = W_MMSE (4.22)
22
An illustration of the steepest descent method with two weights and
μ = 0.3 is given below.
23
Remarks:
24
Widrow's Least Mean Square (LMS) Algorithm
A. Optimization Criterion
B. Adaptation Procedure
25
The LMS algorithm is therefore:
W(n+1) = W(n) - μ ∂e²(n)/∂W(n)
= W(n) - μ (∂e²(n)/∂e(n)) (∂e(n)/∂W(n))
= W(n) - 2μ e(n) ∂[d(n) - W^T(n)X(n)]/∂W(n)
= W(n) + 2μ e(n) X(n)
where the identity ∂(A^T B)/∂A = B has been used.
or
26
C. Advantages
D. Performance Surface
27
E. Performance Analysis
Two important performance measures in LMS algorithms are rate of
convergence & misadjustment (relates to steady state filter weight
variance).
1. Convergence Analysis
For ease of analysis, it is assumed that W (n) is independent of X (n) .
Taking expectation on both sides of the LMS algorithm, we have
28
Following the previous derivation, W(n) will converge to the Wiener filter
weights in the mean sense if
lim_{n→∞} diag((1 - 2μλ₁)ⁿ, (1 - 2μλ₂)ⁿ, …, (1 - 2μλ_L)ⁿ) = 0
that is,
|1 - 2μλ_i| < 1, i = 1, 2, …, L
⟺ 0 < μ < 1/λ_max (4.26)
Define the geometric ratio of the p-th term as
r_p = 1 - 2μλ_p, p = 1, 2, …, L (4.27)
It is observed that each term on the main diagonal forms a geometric
series {1, r_p, r_p², …, r_p^{n-1}, r_p^n, r_p^{n+1}, …}.
29
An exponential function can be fitted to approximate each geometric series:
r_pⁿ ≈ exp(-n/τ_p) ⟹ r_p ≈ exp(-1/τ_p) (4.28)
where τ_p is called the p-th time constant.
For slow adaptation, i.e., 2μλ_p << 1, τ_p is approximated as
1/τ_p = -ln(1 - 2μλ_p) = 2μλ_p + (2μλ_p)²/2 + (2μλ_p)³/3 + ⋯ ≈ 2μλ_p
⟹ τ_p ≈ 1/(2μλ_p) (4.29)
Notice that the smaller the time constant the faster the convergence rate.
Moreover, the overall convergence is limited by the slowest mode of
convergence which in turns stems from the smallest eigenvalue of R xx ,
min .
30
That is,
τ_max ≈ 1/(2μλ_min) (4.30)
Two factors govern the convergence rate:
the step size μ: the larger the μ, the faster the convergence rate
the eigenvalue spread of R_xx, χ(R_xx): the smaller χ(R_xx), the faster the
convergence rate. χ(R_xx) is defined as
χ(R_xx) = λ_max/λ_min (4.31)
31
Example 4.1
An Illustration of eigenvalue spread for LMS algorithm is shown as follows.
(Figure: the unknown system Σ_{i=0}^{1} h_i z^{-i} is driven by x(n); its output plus noise q(n) forms d(n). A two-tap adaptive filter Σ_{i=0}^{1} w_i z^{-i} produces y(n), and e(n) = d(n) - y(n).)
32
; file name is es.m
clear all
N=1000; % number of sample is 1000
np = 0.01; % noise power is 0.01
sp = 1; % signal power is 1 which implies SNR = 20dB
h=[1 2]; % unknown impulse response
x = sqrt(sp).*randn(1,N);
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);
w0(1) = 0; % initialization (as in es1.m below)
w1(1) = 0;
mu = 0.005;
y(1) = w0(1)*x(1);
e(1) = d(1) - y(1);
w0(2) = w0(1) + 2*mu*e(1)*x(1);
w1(2) = w1(1);
for n=2:N % the LMS algorithm
y(n) = w0(n)*x(n) + w1(n)*x(n-1);
e(n) = d(n) - y(n);
w0(n+1) = w0(n) + 2*mu*e(n)*x(n);
w1(n+1) = w1(n) + 2*mu*e(n)*x(n-1);
end
n = 1:N+1;
subplot(2,1,1)
plot(n,w0) % plot filter weight estimate versus time
axis([1 1000 0 1.2])
subplot(2,1,2)
plot(n,w1)
axis([1 1000 0 2.2])
figure(2)
subplot(1,1,1)
n = 1:N;
semilogy(n,e.*e); % plot square error versus time
34
35
36
Note that both filter weights converge at a similar speed because the
eigenvalues of R_xx are identical.
Recall
R_xx = [R_xx(0) R_xx(1); R_xx(1) R_xx(0)]
R_xx(0) = E{x(n)·x(n)} = 1
R_xx(1) = E{x(n)·x(n-1)} = 0
As a result,
R_xx = [1 0; 0 1] ⟹ χ(R_xx) = 1
37
; file name is es1.m
clear all
N=1000;
np = 0.01;
sp = 1;
h=[1 2];
u = sqrt(sp/2).*randn(1,N+1);
x = u(1:N) + u(2:N+1); % x(n) is now a MA process with power 1
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);
w0(1) = 0;
w1(1) = 0;
mu = 0.005;
y(1) = w0(1)*x(1);
e(1) = d(1) - y(1);
w0(2) = w0(1) + 2*mu*e(1)*x(1);
w1(2) = w1(1);
38
for n=2:N
y(n) = w0(n)*x(n) + w1(n)*x(n-1);
e(n) = d(n) - y(n);
w0(n+1) = w0(n) + 2*mu*e(n)*x(n);
w1(n+1) = w1(n) + 2*mu*e(n)*x(n-1);
end
n = 1:N+1;
subplot(2,1,1)
plot(n,w0)
axis([1 1000 0 1.2])
subplot(2,1,2)
plot(n,w1)
axis([1 1000 0 2.2])
figure(2)
subplot(1,1,1)
n = 1:N;
semilogy(n,e.*e);
39
40
41
Note that the convergence speed of w₀(n) is slower than that of w₁(n).
Investigating R_xx:
R_xx(0) = E{x(n)·x(n)}
= E{(u(n) + u(n-1))(u(n) + u(n-1))}
= E{u²(n) + u²(n-1)}
= 0.5 + 0.5 = 1
R_xx(1) = E{x(n)·x(n-1)}
= E{(u(n) + u(n-1))(u(n-1) + u(n-2))}
= E{u²(n-1)} = 0.5
As a result,
R_xx = [1 0.5; 0.5 1]
42
; file name is es2.m
clear all
N=1000;
np = 0.01;
sp = 1;
h=[1 2];
u = sqrt(sp/5).*randn(1,N+4);
x = u(1:N) + u(2:N+1) + u(3:N+2) + u(4:N+3) +u(5:N+4); % x(n) is 5th order MA process
d = conv(x,h);
d = d(1:N) + sqrt(np).*randn(1,N);
w0(1) = 0;
w1(1) = 0;
mu = 0.005;
y(1) = w0(1)*x(1);
e(1) = d(1) - y(1);
w0(2) = w0(1) + 2*mu*e(1)*x(1);
w1(2) = w1(1);
43
for n=2:N
y(n) = w0(n)*x(n) + w1(n)*x(n-1);
e(n) = d(n) - y(n);
w0(n+1) = w0(n) + 2*mu*e(n)*x(n);
w1(n+1) = w1(n) + 2*mu*e(n)*x(n-1);
end
n = 1:N+1;
subplot(2,1,1)
plot(n,w0)
axis([1 1000 0 1.5])
subplot(2,1,2)
plot(n,w1)
axis([1 1000 0 2.5])
figure(2)
subplot(1,1,1)
n = 1:N;
semilogy(n,e.*e);
44
45
46
We see that the convergence speeds of both weights are very slow,
although that of w1 (n) is faster.
Investigating the R xx :
47
2. Misadjustment
Upon convergence, if lim_{n→∞} W(n) = W_MMSE, then the minimum MSE will be
equal to
ξ_min = E{d²(n)} - R_dx^T W_MMSE (4.32)
However, this will not occur in practice due to random noise in the weight
vector W(n). Notice that we have lim_{n→∞} E{W(n)} = W_MMSE but not
lim_{n→∞} W(n) = W_MMSE. The MSE of the LMS algorithm is computed as
E{e²(n)} = E{(d(n) - W(n)^T X(n))²}
= ξ_min + E{(W(n) - W_MMSE)^T X(n) X(n)^T (W(n) - W_MMSE)} (4.33)
= ξ_min + E{(W(n) - W_MMSE)^T R_xx (W(n) - W_MMSE)}
48
The second term on the right hand side as n → ∞ is known as the excess
MSE and it is given by
excess MSE = lim_{n→∞} E{(W(n) - W_MMSE)^T R_xx (W(n) - W_MMSE)}
= lim_{n→∞} E{V(n)^T R_xx V(n)}
= lim_{n→∞} E{U(n)^T Λ U(n)} (4.34)
= μ ξ_min Σ_{i=0}^{L-1} λ_i = μ ξ_min tr[R_xx]
49
As a result, the misadjustment M is given by
M = (excess MSE)/ξ_min = μ tr[R_xx]
2. The bound for μ is
0 < μ < 1/λ_max (4.37)
In practice, the signal power of x(n) can generally be estimated more
easily than the eigenvalues of R_xx. We also note that
λ_max ≤ Σ_{i=1}^{L} λ_i = tr[R_xx] = L·E{x²(n)} (4.38)
51
LMS Variants
1. Normalized LMS (NLMS) algorithm
the product vector e(n)X(n) is modified with respect to the squared
Euclidean norm of the tap-input vector X(n):
W(n+1) = W(n) + (2μ/(c + X(n)^T X(n))) e(n) X(n) (4.40)
where c is a small positive constant to avoid division by zero.
It can also be considered as an LMS algorithm with a time-varying step size:
μ(n) = μ/(c + X(n)^T X(n)) (4.41)
Substituting c = 0, it can be shown that the NLMS algorithm converges if
0 < μ < 0.5 ⟹ selection of the step size in the NLMS is much easier than
in the LMS algorithm
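A minimal MATLAB sketch of the NLMS update of (4.40); the filter length, step size, constant c and the placeholder signals are assumptions:
L = 8; mu = 0.25; c = 1e-6;
W = zeros(L,1); xbuf = zeros(L,1);
N = 1000;
x = randn(1,N); d = randn(1,N); % placeholder input and desired response
for n=1:N
xbuf = [x(n); xbuf(1:L-1)]; % X(n) = [x(n) ... x(n-L+1)]^T
e = d(n) - W'*xbuf;
W = W + 2*mu*e*xbuf/(c + xbuf'*xbuf); % normalized update
end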
52
2. Sign algorithms
53
3. Leaky LMS algorithm
the LMS update is modified by the presence of a constant leakage factor
:
54
Application Examples
Example 4.2
1. Linear Prediction
Suppose a signal x(n) is a second-order autoregressive (AR) process that
satisfies the following difference equation:
x(n) = 1.558 x(n-1) - 0.81 x(n-2) + v(n)
where v(n) is a white noise process such that
R_vv(m) = E{v(n)v(n+m)} = σ_v² for m = 0 and 0 otherwise
The two-weight one-step predictor output is
x̂(n) = Σ_{i=1}^{2} w_i(n) x(n-i) = w₁(n) x(n-1) + w₂(n) x(n-2)
55
Upon convergence, we desire
E{w₁(n)} → 1.558 and E{w₂(n)} → -0.81
(Figure: two-tap LMS predictor — x(n) is delayed to give x(n-1) and x(n-2), which are weighted by w₁(n) and w₂(n) and summed to give x̂(n); the prediction error is e(n) = x(n) - x̂(n), with d(n) = x(n).)
56
The error function or prediction error e(n) is given by
e(n) = d(n) - Σ_{i=1}^{2} w_i(n) x(n-i)
= x(n) - w₁(n) x(n-1) - w₂(n) x(n-2)
Thus the LMS algorithm for this problem is
w₁(n+1) = w₁(n) - (μ/2) ∂e²(n)/∂w₁(n) = w₁(n) - (μ/2) (∂e²(n)/∂e(n)) (∂e(n)/∂w₁(n))
= w₁(n) + μ e(n) x(n-1)
and
w₂(n+1) = w₂(n) - (μ/2) ∂e²(n)/∂w₂(n)
= w₂(n) + μ e(n) x(n-2)
The computational requirement for each sampling interval is
multiplications : 5
addition/subtraction : 4
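A MATLAB sketch of this predictor (the noise power and number of samples are assumptions):
N = 5000; mu = 0.02;
v = randn(1,N);
x = zeros(1,N);
for n=3:N
x(n) = 1.558*x(n-1) - 0.81*x(n-2) + v(n); % AR(2) signal
end
w1 = 0; w2 = 0;
for n=3:N
e = x(n) - w1*x(n-1) - w2*x(n-2); % prediction error
w1 = w1 + mu*e*x(n-1); % updates derived above
w2 = w2 + mu*e*x(n-2);
end
% w1 and w2 should approach 1.558 and -0.81 upon convergence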
57
Two values of μ, 0.02 and 0.004, are investigated:
(Figure: convergence characteristics for the LMS predictor with μ = 0.004.)
59
Observations:
60
Example 4.3
2. System Identification
Given the input signal x(n) and output signal d (n) , we can estimate the
impulse response of the system or plant using the LMS algorithm.
Suppose the transfer function of the plant is Σ_{i=0}^{2} h_i z^{-i}, which is a causal FIR
unknown system; then d(n) can be represented as
d(n) = Σ_{i=0}^{2} h_i x(n-i)
61
Assuming that the order of the transfer function is unknown, we use a 2-
coefficient LMS filter to model the system function as follows,
(Figure: the plant Σ_{i=0}^{2} h_i z^{-i} is driven by x(n) and produces d(n); the two-tap adaptive filter Σ_{i=0}^{1} w_i z^{-i} produces y(n), and e(n) = d(n) - y(n).)
62
Thus the LMS algorithm for this problem is
e(n) = d(n) - w₀(n) x(n) - w₁(n) x(n-1)
w₀(n+1) = w₀(n) + μ e(n) x(n)
w₁(n+1) = w₁(n) + μ e(n) x(n-1)
The learning behaviours of the filter weights w₀(n) and w₁(n) can be
obtained by taking expectation on the LMS algorithm. To simplify the
analysis, we assume that x(n) is a stationary white noise process such
that
R_xx(m) = E{x(n)x(n+m)} = σ_x² for m = 0 and 0 otherwise
63
Assuming the filter weights are independent of x(n), applying expectation to
the first updating rule gives
64
E{w₀(n+1)} = E{w₀(n)}(1 - μσ_x²) + μh₀σ_x²
E{w₀(n)} = E{w₀(n-1)}(1 - μσ_x²) + μh₀σ_x²
E{w₀(n-1)} = E{w₀(n-2)}(1 - μσ_x²) + μh₀σ_x²
⋯
E{w₀(1)} = E{w₀(0)}(1 - μσ_x²) + μh₀σ_x²
Multiplying the second equation by (1 - μσ_x²) on both sides, the third
equation by (1 - μσ_x²)², etc., and summing all the resultant equations, we
have
E{w₀(n+1)} = E{w₀(0)}(1 - μσ_x²)^{n+1} + μh₀σ_x² (1 + (1 - μσ_x²) + ⋯ + (1 - μσ_x²)ⁿ)
= E{w₀(0)}(1 - μσ_x²)^{n+1} + μh₀σ_x² (1 - (1 - μσ_x²)^{n+1})/(1 - (1 - μσ_x²))
= E{w₀(0)}(1 - μσ_x²)^{n+1} + h₀(1 - (1 - μσ_x²)^{n+1})
= (E{w₀(0)} - h₀)(1 - μσ_x²)^{n+1} + h₀
65
Hence
lim_{n→∞} E{w₀(n)} = h₀
provided that
|1 - μσ_x²| < 1 ⟺ -1 < 1 - μσ_x² < 1 ⟺ 0 < μ < 2/σ_x²
Similarly, we can show that E{w₁(n)} converges in the mean to h₁.
It is worth noting that the choice of the initial filter weights E{w₀(0)} and
E{w₁(0)} does not affect the convergence of the LMS algorithm because the
performance surface is unimodal.
66
Discussion:
Since the LMS filter consists of two weights while the actual transfer function
comprises three coefficients, the plant cannot be exactly modeled in this
case. This is referred to as under-modeling. If we use a 3-weight LMS filter with
transfer function Σ_{i=0}^{2} w_i z^{-i}, then the plant can be modeled exactly. If we use
more than 3 coefficients in the LMS filter, we can still estimate the transfer
function accurately. However, in this case, the misadjustment will
increase with the filter length used.
Notice that we can also use the Wiener filter to find the impulse response
of the plant if the signal statistics R_xx(0), R_xx(1), R_dx(0) and R_dx(1) are
available. However, we do not have R_dx(0) and R_dx(1), although
R_xx(0) = σ_x² and R_xx(1) = 0 are known. Therefore, the LMS adaptive filter can
be considered as an adaptive realization of the Wiener filter, and it is used
when the signal statistics are not (completely) known.
67
Example 4.4
3. Interference Cancellation
Given a received signal r(k) which consists of a source signal s(k) and a
sinusoidal interference with known frequency, the task is to extract s(k)
from r(k). Notice that the amplitude and phase of the sinusoid are unknown.
A well-known application is to remove 50/60 Hz power line interference in
the recording of the electrocardiogram (ECG).
(Figure: the received signal is r(k) = s(k) + A cos(ω₀k + φ); the reference sin(ω₀k) is weighted by b₀ and its 90°-phase-shifted version cos(ω₀k) is weighted by b₁; both weighted references are subtracted from r(k) to produce e(k).)
68
The interference cancellation system consists of a 90° phase-shifter and a
two-weight adaptive filter. By properly adjusting the weights, the reference
waveform can be changed in magnitude and phase in any way to model
the interfering sinusoid. The filtered output is of the form
e(k) = r(k) - b₀(k) sin(ω₀k) - b₁(k) cos(ω₀k)
The LMS updates are
b₀(k+1) = b₀(k) - (μ/2) ∂e²(k)/∂b₀(k) = b₀(k) - (μ/2) (∂e²(k)/∂e(k)) (∂e(k)/∂b₀(k))
= b₀(k) + μ e(k) sin(ω₀k)
and
b₁(k+1) = b₁(k) - (μ/2) ∂e²(k)/∂b₁(k)
= b₁(k) + μ e(k) cos(ω₀k)
69
Taking the expected value of b₀(k), we have
E{b₀(k+1)}
= E{b₀(k)} + μ E{e(k) sin(ω₀k)}
= E{b₀(k)} + μ E{[s(k) + A cos(ω₀k + φ) - b₀(k) sin(ω₀k) - b₁(k) cos(ω₀k)] sin(ω₀k)}
= E{b₀(k)} + μ E{[A cos(ω₀k) cos(φ) - A sin(ω₀k) sin(φ) - b₀(k) sin(ω₀k)
  - b₁(k) cos(ω₀k)] sin(ω₀k)}
= E{b₀(k)} + (μ/2) E{(A cos(φ) - b₁(k)) sin(2ω₀k)} - μ E{(A sin(φ) + b₀(k)) sin²(ω₀k)}
= E{b₀(k)} - μ E{(A sin(φ) + b₀(k)) (1 - cos(2ω₀k))/2}
= E{b₀(k)} - (μ/2) A sin(φ) - (μ/2) E{b₀(k)}
= (1 - μ/2) E{b₀(k)} - (μ/2) A sin(φ)
70
Following the derivation in Example 4.3, provided that 0 < μ < 4, the
learning curves of E{b₀(k)} and E{b₁(k)} can be obtained as
E{b₀(k)} = -A sin(φ) + (E{b₀(0)} + A sin(φ)) (1 - μ/2)^k
E{b₁(k)} = A cos(φ) + (E{b₁(0)} - A cos(φ)) (1 - μ/2)^k
When k → ∞, we have
E{b₀(k)} → -A sin(φ) and E{b₁(k)} → A cos(φ)
71
The filtered output is then approximated as
e(k) ≈ r(k) + A sin(φ) sin(ω₀k) - A cos(φ) cos(ω₀k)
= s(k)
which means that s(k) can be recovered accurately upon convergence.
Suppose E{b₀(0)} = E{b₁(0)} = 0, μ = 0.02, and we want to find the number
of iterations required for E{b₁(k)} to reach 90% of its steady state value.
Let the required number of iterations be k₀; it can be calculated from
E{b₁(k₀)} = 0.9 A cos(φ) = A cos(φ) - A cos(φ) (1 - μ/2)^{k₀}
⟹ (1 - 0.02/2)^{k₀} = 0.1
⟹ k₀ = log(0.1)/log(0.99) = 229.1
Hence about 230 iterations are required.
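A sketch of the canceller in MATLAB; the sampling rate, interference amplitude/phase and the placeholder source below are assumptions used only for illustration:
fsamp = 500; w0 = 2*pi*50/fsamp; % 50 Hz interference, assumed 500 Hz sampling
N = 2000; mu = 0.02;
k = 0:N-1;
s = randn(1,N); % placeholder source signal
r = s + 2*cos(w0*k + 0.7); % received signal with interference
b0 = 0; b1 = 0; e = zeros(1,N);
for n=1:N
y = b0*sin(w0*k(n)) + b1*cos(w0*k(n));
e(n) = r(n) - y; % filtered output (estimate of s)
b0 = b0 + mu*e(n)*sin(w0*k(n)); % updates derived above
b1 = b1 + mu*e(n)*cos(w0*k(n));
end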
72
If we use the Wiener filter with filter weights b₀ and b₁, the mean square error
function can be computed as
E{e²(k)} = (1/2) b₀² + (1/2) b₁² + A sin(φ) b₀ - A cos(φ) b₁ + E{s²(k)} + A²/2
∂E{e²(k)}/∂b₀ = b₀ + A sin(φ) = 0 ⟹ b̃₀ = -A sin(φ)
and
∂E{e²(k)}/∂b₁ = b₁ - A cos(φ) = 0 ⟹ b̃₁ = A cos(φ)
73
Example 4.5
74
Taking the Fourier transform of s_D(k) = s(k - D) yields
S_D(ω) = e^{-jωD} S(ω)
so the ideal delay system has frequency response e^{-jωD} and impulse response
h(n) = (1/2π) ∫_{-π}^{π} e^{-jωD} e^{jωn} dω
= (1/2π) ∫_{-π}^{π} e^{jω(n-D)} dω
= sinc(n - D)
where
sinc(v) = sin(πv)/(πv)
75
As a result, s(k - D) can be represented as
s(k - D) = s(k) ⊗ h(k)
= Σ_{i=-∞}^{∞} s(k-i) sinc(i - D)
≈ Σ_{i=-P}^{P} s(k-i) sinc(i - D)
for sufficiently large P.
This means that we can use a non-causal FIR filter to model the time
delay, and it has the form:
W(z) = Σ_{i=-P}^{P} w_i z^{-i}
It can be shown that w_i ≈ sinc(i - D) for i = -P, -P+1, …, P using the
minimum mean square error approach. The time delay can be estimated
from {w_i} using the following interpolation:
D̂ = arg max_t Σ_{i=-P}^{P} w_i sinc(i - t)
76
(Figure: r₁(k) drives W(z) = Σ_{i=-P}^{P} w_i z^{-i}; the filter output is subtracted from r₂(k) to give e(k).)
e(k) = r₂(k) - Σ_{i=-P}^{P} r₁(k-i) w_i(k)
77
The LMS algorithm for the time delay estimation problem is thus
w_j(k+1) = w_j(k) - μ ∂e²(k)/∂w_j(k)
= w_j(k) - μ (∂e²(k)/∂e(k)) (∂e(k)/∂w_j(k))
= w_j(k) + 2μ e(k) r₁(k-j), j = -P, -P+1, …, P
D̂(k) = arg max_t Σ_{i=-P}^{P} w_i(k) sinc(i - t)
78
Exponentially Weighted Recursive Least-Squares
A. Optimization Criterion
n
To minimize the weighted sum of squares J (n) = n l e 2 (l ) for each time
l =0
n where is a weighting factor such that 0 < 1.
When = 1, the optimization criterion is identical to that of least squaring
filtering and this value of should not be used in a changing environment
because all squared errors (current value and past values) have the same
weighting factor of 1.
To smooth out the effect of the old samples, should be chosen less than
1 for operating in nonstationary conditions.
B. Derivation
Assume an FIR filter for simplicity. Following the derivation of the least
squares filter, we differentiate J(n) with respect to the filter weight vector
at time n, i.e., W(n), and then set the L resultant equations to zero.
79
By so doing, we have
R(n) W(n) = G(n) (4.47)
where
R(n) = Σ_{l=0}^{n} λ^{n-l} X(l) X(l)^T
G(n) = Σ_{l=0}^{n} λ^{n-l} d(l) X(l)
These quantities can be computed recursively:
R(n) = λ Σ_{l=0}^{n-1} λ^{n-1-l} X(l) X(l)^T + X(n) X(n)^T = λ R(n-1) + X(n) X(n)^T (4.48)
G(n) = λ G(n-1) + d(n) X(n) (4.49)
80
Using the well-known matrix inversion lemma:
If
A = B + C C^T (4.50)
then
A^{-1} = B^{-1} - B^{-1} C (1 + C^T B^{-1} C)^{-1} C^T B^{-1} (4.51)
we obtain
R(n)^{-1} = (1/λ) [R(n-1)^{-1} - (R(n-1)^{-1} X(n) X(n)^T R(n-1)^{-1}) / (λ + X(n)^T R(n-1)^{-1} X(n))] (4.52)
81
The filter weight W(n) is then calculated as W(n) = R(n)^{-1} G(n), which can be computed recursively.
82
As a result, the exponentially weighted recursive least squares (RLS)
algorithm is summarized as follows,
1. Initialize W(0) and R(0)^{-1}
2. For n = 1, 2, …, compute
α(n) = 1/(λ + X(n)^T R(n-1)^{-1} X(n)) (4.54)
R(n)^{-1} = (1/λ) [R(n-1)^{-1} - α(n) R(n-1)^{-1} X(n) X(n)^T R(n-1)^{-1}] (4.56)
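Since the weight-update equation itself is not reproduced above, the following MATLAB sketch uses the common textbook form of the exponentially weighted RLS recursion; the filter length, λ, the initialization constant and the placeholder signals are assumptions:
L = 8; lambda = 0.99; delta = 0.01;
W = zeros(L,1);
P = (1/delta)*eye(L); % P = R^(-1), scaled identity initialization
N = 1000;
x = randn(1,N); d = randn(1,N); % placeholder signals
xbuf = zeros(L,1);
for n=1:N
xbuf = [x(n); xbuf(1:L-1)];
k = P*xbuf/(lambda + xbuf'*P*xbuf); % gain vector
e = d(n) - W'*xbuf; % a priori error
W = W + k*e; % weight update
P = (P - k*xbuf'*P)/lambda; % recursion for R^(-1), cf. (4.56)
end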
83
Remarks:
1. When λ = 1, the algorithm reduces to the standard RLS algorithm that
minimizes Σ_{l=0}^{n} e²(l).
2. For nonstationary data, 0.95 < λ < 0.9995 has been suggested.
3. Simple choices of W(0) and R(0)^{-1} are 0 and δ²I, respectively, where
δ² is a small positive constant.
1. Computational Complexity
The RLS is more computationally expensive than the LMS. Assuming there are
L filter taps, the LMS requires (4L+1) additions and (4L+3) multiplications
per update, while the exponentially weighted RLS needs a total of
(3L² + L - 1) additions/subtractions and (4L² + 4L) multiplications/divisions.
84
2. Rate of Convergence
RLS provides a faster convergence speed than the LMS because
RLS is an approximation of the Newton method while LMS is an
approximation of the steepest descent method.
the pre-multiplication of R (n) 1 in the RLS algorithm makes the resultant
eigenvalue spread becomes unity.
Improvement of LMS algorithm with the use of Orthogonal Transform
A. Motivation
When the input signal is white, the eigenvalue spread has a minimum
value of 1. In this case, the LMS algorithm can provide optimum rate of
convergence.
However, many practical signals are nonwhite, how can we improve the
rate of convergence using the LMS algorithm?
85
B. Idea
To transform the input x(n) to another signal v(n) so that the modified
eigenvalue spread is 1. Two steps are involved:
1. Transform x(n) to v(n) using an N×N orthogonal transform T so that
R_vv = T R_xx T^T = diag(σ₁², σ₂², …, σ_N²)
where
V(n) = T X(n)
86
R_xx = E{X(n) X(n)^T}
R_vv = E{V(n) V(n)^T}
2. Modify the eigenvalues of R_vv so that the resultant matrix has identical
eigenvalues:
R_vv → (power normalization) → R'_vv = diag(σ², σ², …, σ²)
87
Block diagram of the transform domain adaptive filter
88
C. Algorithm
The modified LMS algorithm is given by
W(n+1) = W(n) + 2μ e(n) Σ^{-2} V(n)
where
Σ^{-2} = diag(1/σ₁², 1/σ₂², …, 1/σ_N²)
e(n) = d(n) - y(n)
y(n) = W(n)^T V(n) = W(n)^T T X(n)
89
Writing in scalar form, we have
w_i(n+1) = w_i(n) + 2μ e(n) v_i(n)/σ_i², i = 1, 2, …, N
Since σ_i² is the power of v_i(n), which is not known a priori, it should be
estimated. A common estimation procedure for E{v_i²(n)} is
σ̂_i²(n) = α σ̂_i²(n-1) + |v_i(n)|²
where
0 < α < 1
90
Using a 2-coefficient adaptive filter as an example:
91
Error surface with discrete cosine transform (DCT)
92
Error surface with transform and power normalization
93
Remarks:
1. The lengths of the principal axes of the hyperellipses are proportional
to the eigenvalues of R.
2. Without power normalization, no convergence rate improvement of
using transform can be achieved.
3. The best choice for T is the Karhunen-Loève (KL) transform, which
is signal dependent. This transform can make R_vv a diagonal matrix,
but the signal statistics are required for its computation.
4. Considerations in choosing a transform:
fast algorithm exists?
complex or real transform?
elements of the transform are all power of 2?
5. Examples of orthogonal transforms are discrete sine transform (DST),
discrete Fourier transform (DFT), discrete cosine transform (DCT),
Walsh-Hadamard transform (WHT), discrete Hartley transform (DHT)
and power-of-2 (PO2) transform.
94
Improvement of LMS algorithm using Newton's method
R_xx(l,n) = R_xx(l,n-1) + x(n+l) x(n), l = 0, 1, …, L-1
95
Possible Research Directions for Adaptive Signal Processing
1. Adaptive modeling of non-linear systems
For example, second-order Volterra system is a simple non-linear system.
The output y (n) is related to the input x(n) by
y(n) = Σ_{j=0}^{L-1} w^{(1)}(j) x(n-j) + Σ_{j₁=0}^{L-1} Σ_{j₂=0}^{L-1} w^{(2)}(j₁, j₂) x(n-j₁) x(n-j₂)
96
Some remarks:
When p = 1, it becomes the least-mean-deviation (LMD); when p = 2, it is the
least-mean-square (LMS); and when p = 4, it becomes the least-mean-fourth
(LMF).
The LMS is optimum for Gaussian noise, which may not be true for
noises of other probability density functions (PDFs). For example, if the
noise is impulsive, such as an α-stable process with 1 < α < 2, the LMD
performs better than the LMS; if the noise is of uniform distribution or if it is a
sinusoidal signal, then the LMF outperforms the LMS. Therefore, the optimum p
depends on the signal/noise models.
The parameter p can be any real number but it will be difficult to
analyze, particularly for non-integer p .
Combination of different norms can be used to achieve better
performance.
Some suggests mixed norm criterion, e.g. a E{e 2 (n)} + b E{e 4 (n)}
97
Median operation can be employed in the LMP algorithm for operating in
the presence of impulsive noise. For example, the median LMS belongs
to the family of order-statistics-least-mean-square (OSLMS) adaptive
filter algorithms.
3. Adaptive algorithms with fast convergence rate and small
computation
For example, design of optimal step size in LMS algorithms
4. Adaptive IIR filters
Adaptive IIR filters have 2 advantages over adaptive FIR filters:
It generalizes FIR filter and it can model IIR system more accurately
Less filter coefficients are generally required
However, development of adaptive IIR filters are generally more difficult
than the FIR filters because
The performance surface is multimodal the algorithm may lock at an
undesired local minimum
It may lead to biased solution
It can be unstable
98
5. Unsupervised adaptive signal processing (blind signal processing)
What we have discussed previously refers to supervised adaptive signal
processing where there is always a desired signal or reference signal or
training signal.
In some applications, such signals are not available. Two important
application areas of unsupervised adaptive signal processing are:
Blind source separation
e.g. speaker identification in the noisy environment of a cocktail party
e.g. separation of signals overlapped in time and frequency in wireless
communications
Blind deconvolution (= inverse of convolution)
e.g. restoration of a source signal after propagating through an
unknown wireless channel
6. New applications
For example, echo cancellation for hands-free telephone systems and
signal estimation in wireless channels using space-time processing.
99
Questions for Discussion
1. The LMS algorithm is given by (4.23):
where
$$e(n) = d(n) - y(n)$$
$$y(n) = \sum_{i=0}^{L-1} w_i(n)\, x(n-i) = \mathbf{W}(n)^T \mathbf{X}(n)$$
Based on the idea of the LMS algorithm, derive the adaptive algorithm that
minimizes $E\{|e(n)|\}$.
(Hint: $\frac{\partial |v|}{\partial v} = \mathrm{sgn}(v)$ where $\mathrm{sgn}(v) = 1$ if $v > 0$ and $\mathrm{sgn}(v) = -1$ otherwise)
100
2. For adaptive IIR filtering, there are basically two approaches, namely,
output-error and equation-error. Let the unknown IIR system be
$$H(z) = \frac{B(z)}{A(z)} = \frac{\sum_{j=0}^{N-1} b_j z^{-j}}{1 + \sum_{i=1}^{M-1} a_i z^{-i}}$$
The output-error approach uses
$$e(n) = d(n) - y(n)$$
with
101
$$\frac{Y(z)}{X(z)} = \frac{\hat{B}(z)}{\hat{A}(z)} = \frac{\sum_{j=0}^{N-1} \hat{b}_j z^{-j}}{1 + \sum_{i=1}^{M-1} \hat{a}_i z^{-i}}$$
$$y(n) = \sum_{j=0}^{N-1} \hat{b}_j\, x(n-j) - \sum_{i=1}^{M-1} \hat{a}_i\, y(n-i)$$
On the other hand, the equation-error approach is always stable and has
a unimodal surface. Its system block diagram is shown on the next page.
102
[Block diagram of the equation-error approach: the input s(k) drives the unknown system H(z) to give d(k); noise n(k) is added to form the observed output r(k). The input s(k) is filtered by B(z) and r(k) by A(z), and the two filter outputs are combined to produce the equation error e(k).]
103
Chapter 5
Estimation Theory and Applications
1
Estimation Theory and Applications
Application Areas
1. Radar
2
3
2. Mobile Communications
The position of the mobile terminal can be estimated using the time-of-
arrival measurements received at the base stations.
4
3. Speech Processing
Recognition of human speech by a machine is a difficult task because
our voice changes from time to time.
Given a human voice, the estimation problem is to determine the
spoken speech as closely as possible.
4. Image Processing
Estimation of the position and orientation of an object from a camera
image is useful when using a robot to pick it up, e.g., in bomb disposal
5. Biomedical Engineering
Estimation of the heart rate of a fetus; the difficulty is that the
measurements are corrupted by the mother's heartbeat as well.
6. Seismology
Estimation of the underground distance of an oil deposit based on
sound reflection due to the different densities of oil and rock layers.
5
Differences from Detection
1. Radar
6
7
2. Communications
8
9
3. Speech Processing
Given a human speech signal, the detection problem is to decide which word
was spoken from a set of predefined words, e.g., "0", "1", ..., "9"
Waveform of 0
10
4. Image Processing
Fingerprint authentication: given a fingerprint image whose owner claims
to be A, we need to verify whether this is true
11
5. Biomedical Engineering
12
What is Estimation?
Extract or estimate some parameters from the observed signals, e.g.,
13
Estimate the value of resistance R from a set of voltage and current
readings:
$$V[n] = V_{\mathrm{actual}}[n] + w_1[n], \quad I[n] = I_{\mathrm{actual}}[n] + w_2[n], \quad n = 0, 1, \ldots, N-1$$
Given N pairs of ( V [n], I [n] ), we need to estimate the resistance R ,
ideally, R = V / I
the parameter is not directly observed in the received signals
$$r[n] = \frac{\sqrt{(x_s - x_n)^2 + (y_s - y_n)^2}}{c} + w[n], \quad n = 0, 1, \ldots, N-1$$
Given r[n] , we need to find the mobile position ( x s , y s ) where c is the
signal propagation speed and ( x n , y n ) represent the known position of
the n th base station
the parameters are not directly observed in the received signals
14
Types of Parameter Estimation
Linear or non-linear
Linear: DC value, amplitude of the sine wave
Non-linear: Frequency of the sine wave, mobile position
Constrained or unconstrained
Constrained: Use other available information & knowledge, e.g., from
the N pairs of (V [n], I [n] ), we draw a line which best fits
the data points and the estimate of the resistance is
given by the slope of the line. We can add a constraint
that the line should cross the origin (0,0)
Unconstrained: No further information & knowledge is available
15
The parameter is either an unknown deterministic constant or a random variable
16
Performance Measures for Classical Parameter Estimation
Accuracy:
Is the estimator biased or unbiased?
Proposed estimators:
$$\hat{A}_1 = x[0]$$
$$\hat{A}_2 = \frac{1}{N} \sum_{n=0}^{N-1} x[n]$$
$$\hat{A}_3 = \frac{1}{N-1} \sum_{n=0}^{N-1} x[n]$$
$$\hat{A}_4 = \sqrt[N]{\prod_{n=0}^{N-1} x[n]} = \sqrt[N]{x[0]\, x[1] \cdots x[N-1]}$$
17
Biased : $E\{\hat{A}\} \neq A$
Unbiased : $E\{\hat{A}\} = A$
Asymptotically unbiased : $E\{\hat{A}\} \rightarrow A$ only as $N \rightarrow \infty$
$$E\{\hat{A}_2\} = E\left\{\frac{1}{N}\sum_{n=0}^{N-1} x[n]\right\} = E\left\{\frac{1}{N}\sum_{n=0}^{N-1} A\right\} + E\left\{\frac{1}{N}\sum_{n=0}^{N-1} w[n]\right\} = \frac{1}{N}\sum_{n=0}^{N-1} A + \frac{1}{N}\sum_{n=0}^{N-1} E\{w[n]\} = \frac{1}{N}\cdot N\cdot A + 0 = A$$
$$E\{\hat{A}_3\} = \frac{N}{N-1}\, A = \frac{A}{1 - 1/N} \neq A$$
18
For $\hat{A}_4$, it is difficult to analyze the biasedness. However, for $w[n] = 0$:
$$\sqrt[N]{x[0]\, x[1] \cdots x[N-1]} = \sqrt[N]{A \cdot A \cdots A} = (A^N)^{1/N} = A$$
19
In general,
$$E\{(\hat{A}_2 - A)^2\} = E\left\{\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n] - A\right)^2\right\} = E\left\{\left(\frac{1}{N}\sum_{n=0}^{N-1} w[n]\right)^2\right\} = \frac{\sigma_w^2}{N}$$
$$E\{(\hat{A}_3 - A)^2\} = E\left\{\left(\frac{1}{N-1}\sum_{n=0}^{N-1} x[n] - A\right)^2\right\} = \left(\frac{A}{N-1}\right)^2 + \frac{N\sigma_w^2}{(N-1)^2}$$
20
An optimum estimator should give estimates which are
Unbiased
Minimum variance (MSE as well)
Cramer-Rao Lower Bound (CRLB): a lower bound on the variance of any unbiased estimator.
It requires knowledge of the noise PDF, and the PDF must have a closed form
21
Let the parameters to be estimated be $\boldsymbol{\theta} = [\theta_1, \theta_2, \ldots, \theta_P]^T$. The CRLB for
$\theta_i$ in Gaussian noise is stated as follows:
$$\mathrm{CRLB}(\theta_i) = [\mathbf{J}(\boldsymbol{\theta})]_{i,i} = \left[\mathbf{I}^{-1}(\boldsymbol{\theta})\right]_{i,i} \quad (5.4)$$
where
$$\mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix}
-E\left[\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_1^2}\right] & -E\left[\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_1 \partial\theta_2}\right] & \cdots & -E\left[\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_1 \partial\theta_P}\right] \\
-E\left[\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_2 \partial\theta_1}\right] & -E\left[\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_2^2}\right] & & \vdots \\
\vdots & & \ddots & \\
-E\left[\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_P \partial\theta_1}\right] & \cdots & & -E\left[\dfrac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_P^2}\right]
\end{bmatrix} \quad (5.5)$$
22
$p(\mathbf{x};\boldsymbol{\theta})$ represents the PDF of $\mathbf{x} = [x[0], x[1], \ldots, x[N-1]]^T$ and it is
parameterized by the unknown parameter vector $\boldsymbol{\theta}$
Note that
e.g., $\mathbf{J} = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix} \;\Rightarrow\; [\mathbf{J}]_{2,2} = 3$
$$E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_i\, \partial\theta_j}\right] = E\left[\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_j\, \partial\theta_i}\right]$$
23
Review of Gaussian (Normal) Distribution
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad (5.6)$$
We can write $x \sim \mathcal{N}(\mu, \sigma^2)$
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{N/2}\det^{1/2}(\mathbf{C})} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right) \quad (5.7)$$
We can write $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{C})$
24
The covariance matrix $\mathbf{C}$ has the form of
$$\mathbf{C} = E\{(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\}$$
where
$$\mathbf{x} = [x[0], x[1], \ldots, x[N-1]]^T$$
$$\boldsymbol{\mu} = E\{\mathbf{x}\} = [\mu_0, \mu_1, \ldots, \mu_{N-1}]^T$$
25
If $\mathbf{x}$ is a zero-mean white vector and all vector elements have variance $\sigma^2$:
$$\mathbf{C} = E\{(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\} = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2 \end{bmatrix} = \sigma^2 \mathbf{I}_N$$
$$p(\mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right) \quad (5.9)$$
26
Example 5.1
$$x[0] = A + w[0]$$
and
$$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$$
$$p(\mathbf{x}; A) = \frac{1}{(2\pi\sigma_w^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1} (x[n]-A)^2\right)$$
27
Example 5.2
$$x[0] = A + w[0]$$
$$p(x[0]; A) = \frac{1}{\sqrt{2\pi\sigma_w^2}} \exp\left(-\frac{1}{2\sigma_w^2}(x[0]-A)^2\right)$$
$$\ln p(x[0]; A) = -\ln\left(\sqrt{2\pi\sigma_w^2}\right) - \frac{1}{2\sigma_w^2}(x[0]-A)^2$$
$$\frac{\partial \ln p(x[0]; A)}{\partial A} = -\frac{1}{2\sigma_w^2}\cdot 2(x[0]-A)\cdot(-1) = \frac{x[0]-A}{\sigma_w^2}$$
$$\frac{\partial^2 \ln p(x[0]; A)}{\partial A^2} = -\frac{1}{\sigma_w^2}$$
28
As a result,
$$E\left\{\frac{\partial^2 \ln p(x[0]; A)}{\partial A^2}\right\} = -\frac{1}{\sigma_w^2}$$
$$\mathbf{I}(A) = I(A) = \frac{1}{\sigma_w^2}$$
$$\mathbf{J}(A) = \sigma_w^2$$
$$\mathrm{CRLB}(A) = \sigma_w^2$$
$$\mathrm{var}(\hat{A}) \geq \sigma_w^2$$
29
We also observe that the simple unbiased estimator
$$\hat{A}_1 = x[0]$$
attains this CRLB, since $\mathrm{var}(\hat{A}_1) = \sigma_w^2$.
Example 5.3
$$p(\mathbf{x}; A) = \frac{1}{(2\pi\sigma_w^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1} (x[n]-A)^2\right)$$
30
$$p(\mathbf{x}; A) = \frac{1}{(2\pi\sigma_w^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1} (x[n]-A)^2\right)$$
$$\ln p(\mathbf{x}; A) = -\ln\left((2\pi\sigma_w^2)^{N/2}\right) - \frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1} (x[n]-A)^2$$
$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = -\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1} 2(x[n]-A)\cdot(-1) = \frac{\sum_{n=0}^{N-1}(x[n]-A)}{\sigma_w^2}$$
$$\frac{\partial^2 \ln p(\mathbf{x}; A)}{\partial A^2} = -\frac{N}{\sigma_w^2}$$
31
As a result,
$$\mathbf{I}(A) = I(A) = \frac{N}{\sigma_w^2}$$
$$\mathbf{J}(A) = \frac{\sigma_w^2}{N}$$
$$\mathrm{CRLB}(A) = \frac{\sigma_w^2}{N}$$
$$\mathrm{var}(\hat{A}) \geq \frac{\sigma_w^2}{N}$$
32
We also observe that the simple unbiased estimator
$$\hat{A}_1 = x[0]$$
does not achieve the CRLB, whereas
$$\hat{A}_2 = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$$
achieves the CRLB:
$$E\{(\hat{A}_2 - A)^2\} = E\left\{\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n] - A\right)^2\right\} = E\left\{\left(\frac{1}{N}\sum_{n=0}^{N-1} w[n]\right)^2\right\} = \frac{\sigma_w^2}{N}$$
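A quick Monte Carlo check of this result in MATLAB: the empirical mean square error of the sample mean should be close to sigma_w^2/N. All numerical values below are illustrative assumptions.

A = 1; N = 50; sigma2 = 0.5; trials = 10000;   % illustrative values
Ahat = zeros(trials,1);
for t = 1:trials
    x = A + sqrt(sigma2)*randn(N,1);           % x[n] = A + w[n]
    Ahat(t) = mean(x);                         % sample-mean estimator A_2
end
mse  = mean((Ahat - A).^2)                     % empirical mean square error
crlb = sigma2/N                                % CRLB = sigma_w^2 / N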
33
Example 5.4
Consider again $x[n] = A + w[n]$, $n = 0, 1, \ldots, N-1$, but now with both $A$ and the
noise power unknown, i.e., $\boldsymbol{\theta} = [A, \sigma_w^2]^T$.
34
$$\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^2} = -\frac{N}{\sigma_w^2}$$
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^2}\right\} = -\frac{N}{\sigma_w^2}$$
$$\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\, \partial\sigma_w^2} = -\frac{\sum_{n=0}^{N-1}(x[n]-A)}{\sigma_w^4} = -\frac{\sum_{n=0}^{N-1} w[n]}{\sigma_w^4}$$
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\, \partial\sigma_w^2}\right\} = -\frac{\sum_{n=0}^{N-1} E\{w[n]\}}{\sigma_w^4} = 0$$
35
$$\ln p(\mathbf{x};\boldsymbol{\theta}) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma_w^2) - \frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2$$
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \sigma_w^2} = -\frac{N}{2\sigma_w^2} + \frac{1}{2\sigma_w^4}\sum_{n=0}^{N-1}(x[n]-A)^2$$
$$\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial (\sigma_w^2)^2} = \frac{N}{2\sigma_w^4} - \frac{1}{\sigma_w^6}\sum_{n=0}^{N-1}(w[n])^2$$
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial (\sigma_w^2)^2}\right\} = \frac{N}{2\sigma_w^4} - \frac{1}{\sigma_w^6}\cdot N\sigma_w^2 = -\frac{N}{2\sigma_w^4}$$
$$\mathbf{I}(\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{N}{\sigma_w^2} & 0 \\ 0 & \dfrac{N}{2\sigma_w^4} \end{bmatrix}$$
36
$$\mathbf{J}(\boldsymbol{\theta}) = \mathbf{I}^{-1}(\boldsymbol{\theta}) = \begin{bmatrix} \dfrac{\sigma_w^2}{N} & 0 \\ 0 & \dfrac{2\sigma_w^4}{N} \end{bmatrix}$$
$$\mathrm{CRLB}(A) = \frac{\sigma_w^2}{N}$$
$$\mathrm{CRLB}(\sigma_w^2) = \frac{2\sigma_w^4}{N}$$
Note that the CRLBs for $A$ with unknown and known noise power are identical
37
Example 5.5
Find the CRLB for the phase $\phi$ of a sinusoid in white Gaussian noise,
$x[n] = A\cos(\omega_0 n + \phi) + w[n]$, $n = 0, 1, \ldots, N-1$, where $A$ and $\omega_0$ are known. The PDF is
$$p(\mathbf{x};\phi) = \frac{1}{(2\pi\sigma_w^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2\right)$$
$$\ln p(\mathbf{x};\phi) = -\ln\left((2\pi\sigma_w^2)^{N/2}\right) - \frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n + \phi)\right)^2$$
38
$$\frac{\partial \ln p(\mathbf{x};\phi)}{\partial \phi} = -\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1} 2\left(x[n]-A\cos(\omega_0 n+\phi)\right) A\sin(\omega_0 n+\phi)$$
$$= -\frac{A}{\sigma_w^2}\sum_{n=0}^{N-1}\left[x[n]\sin(\omega_0 n+\phi) - \frac{A}{2}\sin(2\omega_0 n+2\phi)\right]$$
$$\frac{\partial^2 \ln p(\mathbf{x};\phi)}{\partial \phi^2} = -\frac{A}{\sigma_w^2}\sum_{n=0}^{N-1}\left[x[n]\cos(\omega_0 n+\phi) - A\cos(2\omega_0 n+2\phi)\right]$$
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\phi)}{\partial \phi^2}\right\} = -\frac{A}{\sigma_w^2}\sum_{n=0}^{N-1}\left[A\cos(\omega_0 n+\phi)\cos(\omega_0 n+\phi) - A\cos(2\omega_0 n+2\phi)\right]$$
$$= -\frac{A^2}{\sigma_w^2}\sum_{n=0}^{N-1}\left[\cos^2(\omega_0 n+\phi) - \cos(2\omega_0 n+2\phi)\right]$$
$$= -\frac{A^2}{\sigma_w^2}\sum_{n=0}^{N-1}\left[\frac{1}{2} + \frac{1}{2}\cos(2\omega_0 n+2\phi) - \cos(2\omega_0 n+2\phi)\right]$$
39
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\phi)}{\partial \phi^2}\right\} = -\frac{NA^2}{2\sigma_w^2} + \frac{A^2}{2\sigma_w^2}\sum_{n=0}^{N-1}\cos(2\omega_0 n+2\phi)$$
As a result,
$$\mathrm{CRLB}(\phi) = \left[\frac{NA^2}{2\sigma_w^2} - \frac{A^2}{2\sigma_w^2}\sum_{n=0}^{N-1}\cos(2\omega_0 n+2\phi)\right]^{-1} = \frac{2\sigma_w^2}{NA^2}\left[1 - \frac{1}{N}\sum_{n=0}^{N-1}\cos(2\omega_0 n+2\phi)\right]^{-1}$$
If $N \gg 1$, $\frac{1}{N}\sum_{n=0}^{N-1}\cos(2\omega_0 n+2\phi) \approx 0$,
then
$$\mathrm{CRLB}(\phi) \approx \frac{2\sigma_w^2}{NA^2}$$
40
Example 5.6
$$p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma_w^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2\right), \quad \boldsymbol{\theta} = [A, \omega_0, \phi]^T$$
$$\ln p(\mathbf{x};\boldsymbol{\theta}) = -\ln\left((2\pi\sigma_w^2)^{N/2}\right) - \frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2$$
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A} = -\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1} 2\left(x[n]-A\cos(\omega_0 n+\phi)\right)\left(-\cos(\omega_0 n+\phi)\right)$$
$$= \frac{1}{\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]\cos(\omega_0 n+\phi) - A\cos^2(\omega_0 n+\phi)\right)$$
41
$$\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^2} = -\frac{1}{\sigma_w^2}\sum_{n=0}^{N-1}\cos^2(\omega_0 n+\phi) = -\frac{1}{\sigma_w^2}\sum_{n=0}^{N-1}\left[\frac{1}{2} + \frac{1}{2}\cos(2\omega_0 n+2\phi)\right] \approx -\frac{N}{2\sigma_w^2}$$
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A^2}\right\} \approx -\frac{N}{2\sigma_w^2}$$
Similarly,
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\, \partial\omega_0}\right\} = \frac{A}{2\sigma_w^2}\sum_{n=0}^{N-1} n\sin(2\omega_0 n+2\phi) \approx 0$$
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A\, \partial\phi}\right\} = \frac{A}{2\sigma_w^2}\sum_{n=0}^{N-1}\sin(2\omega_0 n+2\phi) \approx 0$$
42
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \omega_0^2}\right\} = -\frac{A^2}{\sigma_w^2}\sum_{n=0}^{N-1} n^2\left[\frac{1}{2} - \frac{1}{2}\cos(2\omega_0 n+2\phi)\right] \approx -\frac{A^2}{2\sigma_w^2}\sum_{n=0}^{N-1} n^2$$
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \omega_0\, \partial\phi}\right\} = -\frac{A^2}{\sigma_w^2}\sum_{n=0}^{N-1} n\sin^2(\omega_0 n+\phi) \approx -\frac{A^2}{2\sigma_w^2}\sum_{n=0}^{N-1} n$$
$$E\left\{\frac{\partial^2 \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \phi^2}\right\} = -\frac{A^2}{\sigma_w^2}\sum_{n=0}^{N-1}\sin^2(\omega_0 n+\phi) \approx -\frac{NA^2}{2\sigma_w^2}$$
$$\mathbf{I}(\boldsymbol{\theta}) \approx \frac{1}{\sigma_w^2}\begin{bmatrix} \dfrac{N}{2} & 0 & 0 \\ 0 & \dfrac{A^2}{2}\displaystyle\sum_{n=0}^{N-1} n^2 & \dfrac{A^2}{2}\displaystyle\sum_{n=0}^{N-1} n \\ 0 & \dfrac{A^2}{2}\displaystyle\sum_{n=0}^{N-1} n & \dfrac{NA^2}{2} \end{bmatrix}$$
43
After matrix inversion, we have
$$\mathrm{CRLB}(A) \approx \frac{2\sigma_w^2}{N}$$
$$\mathrm{CRLB}(\omega_0) \approx \frac{12}{\mathrm{SNR}\cdot N(N^2-1)}, \quad \mathrm{SNR} = \frac{A^2}{2\sigma_w^2}$$
$$\mathrm{CRLB}(\phi) \approx \frac{2(2N-1)}{\mathrm{SNR}\cdot N(N+1)}$$
Note that
$$\mathrm{CRLB}(\phi) \approx \frac{2(2N-1)}{\mathrm{SNR}\cdot N(N+1)} \approx \frac{4}{\mathrm{SNR}\cdot N} > \frac{1}{\mathrm{SNR}\cdot N} = \frac{2\sigma_w^2}{NA^2}$$
i.e., the CRLB for $\phi$ is larger than in Example 5.5, where $A$ and $\omega_0$ are known.
44
Parameter Transformation in CRLB
If $\alpha = g(\theta)$, then
$$\mathrm{CRLB}(\alpha) = \frac{\left(\dfrac{\partial g(\theta)}{\partial \theta}\right)^2}{-E\left\{\dfrac{\partial^2 \ln p(\mathbf{x};\theta)}{\partial \theta^2}\right\}} \quad (5.10)$$
45
Example 5.7
$$\alpha = g(A) = A^2$$
$$\frac{\partial g(A)}{\partial A} = 2A \;\Rightarrow\; \left(\frac{\partial g(A)}{\partial A}\right)^2 = 4A^2$$
From Example 5.3, we have
$$-E\left\{\frac{\partial^2 \ln p(\mathbf{x};A)}{\partial A^2}\right\} = \frac{N}{\sigma_w^2}$$
As a result,
$$\mathrm{CRLB}(A^2) \approx 4A^2 \cdot \frac{\sigma_w^2}{N} = \frac{4A^2\sigma_w^2}{N}, \quad N \gg 1$$
46
Example 5.8
$$\alpha = g(A) = c_1 + c_2 A$$
$$\frac{\partial g(A)}{\partial A} = c_2 \;\Rightarrow\; \left(\frac{\partial g(A)}{\partial A}\right)^2 = c_2^2$$
As a result,
$$\mathrm{CRLB}(\alpha) = c_2^2\, \mathrm{CRLB}(A) = c_2^2 \cdot \frac{\sigma_w^2}{N} = \frac{c_2^2 \sigma_w^2}{N}$$
47
Maximum Likelihood Estimation
Require knowledge of the noise PDF and the PDF must have closed form
48
e.g., given $p(\mathbf{x} = \mathbf{x}_0; \theta)$ where $\mathbf{x}_0$ is the observed data, the ML estimate is the value of $\theta$ that maximizes the likelihood $p(\mathbf{x}_0; \theta)$
49
Example 5.9
Given
$$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$$
$$p(\mathbf{x};A) = \frac{1}{(2\pi\sigma_w^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)$$
Since $\arg\max_A p(\mathbf{x}; A) = \arg\max_A \{\ln p(\mathbf{x}; A)\}$, taking the logarithm of $p(\mathbf{x};A)$ gives
$$\ln p(\mathbf{x}; A) = -\ln\left((2\pi\sigma_w^2)^{N/2}\right) - \frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}(x[n]-A)^2$$
50
Differentiating with respect to A yields
$$\frac{\partial \ln p(\mathbf{x}; A)}{\partial A} = -\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1} 2(x[n]-A)\cdot(-1) = \frac{\sum_{n=0}^{N-1}(x[n]-A)}{\sigma_w^2}$$
Setting the result to zero:
$$\frac{\sum_{n=0}^{N-1}(x[n]-\hat{A})}{\sigma_w^2} = 0 \;\Rightarrow\; \sum_{n=0}^{N-1}(x[n]-\hat{A}) = 0 \;\Rightarrow\; \hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$$
Note that
the ML estimate is identical to the sample mean
it attains the CRLB
51
Example 5.10
The PDF is
$$p(\mathbf{x};\phi) = \frac{1}{(2\pi\sigma_w^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2\right)$$
$$\ln p(\mathbf{x};\phi) = -\ln\left((2\pi\sigma_w^2)^{N/2}\right) - \frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2$$
52
It is obvious that the maximum of $p(\mathbf{x};\phi)$ or $\ln p(\mathbf{x};\phi)$ corresponds to the
minimum of
$$\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2 \quad \text{or} \quad \sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2$$
53
An approximate ML (AML) solution may exist, depending on the structure
of the ML expression. For example, there exists an AML solution for
$$\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n+\hat{\phi}) = \frac{A}{2}\sum_{n=0}^{N-1}\sin(2\omega_0 n+2\hat{\phi})$$
$$\frac{1}{N}\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n+\hat{\phi}) = \frac{A}{2}\cdot\frac{1}{N}\sum_{n=0}^{N-1}\sin(2\omega_0 n+2\hat{\phi}) \approx \frac{A}{2}\cdot 0 = 0, \quad N \gg 1$$
54
$$\hat{\phi} = \tan^{-1}\left(\frac{-\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n)}{\sum_{n=0}^{N-1} x[n]\cos(\omega_0 n)}\right)$$
55
For parameter transformation, if there is a one-to-one relationship
between $\alpha = g(\theta)$ and $\theta$, the ML estimate of $\alpha$ is simply
$$\hat{\alpha} = g(\hat{\theta}) \quad (5.12)$$
Example 5.11
$$P = 10\log_{10}(\sigma^2)$$
56
$$p(\mathbf{w};\sigma^2) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right)$$
$$\ln p(\mathbf{w};\sigma^2) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]$$
Differentiating the log-likelihood function with respect to $\sigma^2$:
$$\frac{\partial \ln p(\mathbf{w};\sigma^2)}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{n=0}^{N-1} x^2[n]$$
Setting the resultant expression to zero:
$$\frac{N}{2\hat{\sigma}^2} = \frac{1}{2\hat{\sigma}^4}\sum_{n=0}^{N-1} x^2[n] \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{N}\sum_{n=0}^{N-1} x^2[n]$$
As a result,
$$\hat{P} = 10\log_{10}(\hat{\sigma}^2) = 10\log_{10}\left(\frac{1}{N}\sum_{n=0}^{N-1} x^2[n]\right)$$
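A minimal MATLAB sketch of this invariance property; the true power and data length below are illustrative assumptions.

N = 1000; sigma2 = 2;                  % illustrative values
x = sqrt(sigma2)*randn(N,1);           % zero-mean white Gaussian data
sigma2_ml = mean(x.^2);                % ML estimate of sigma^2
P_ml = 10*log10(sigma2_ml)             % ML estimate of P = 10*log10(sigma^2)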
57
Example 5.12
Given
$$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$$
$$p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right), \quad \boldsymbol{\theta} = [A, \sigma^2]^T$$
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial A} = \frac{1}{\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)$$
$$\frac{\partial \ln p(\mathbf{x};\boldsymbol{\theta})}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{n=0}^{N-1}(x[n]-A)^2$$
58
Solving the first equation:
$$\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n] = \bar{x}$$
Substituting $\hat{A}$ into the second equation and solving:
$$\hat{\sigma}^2 = \frac{1}{N}\sum_{n=0}^{N-1}(x[n]-\bar{x})^2$$
Grid search
59
Example 5.13
$$\sum_{n=0}^{N-1} x[n]\sin(\omega_0 n+\hat{\phi}) = \frac{A}{2}\sum_{n=0}^{N-1}\sin(2\omega_0 n+2\hat{\phi})$$
60
The idea of grid search is simple:
Search over all possible values of $\phi$, or a given range of $\phi$, to find the root
The values are discrete: there is a tradeoff between resolution & computation
e.g., Range for $\phi$: any value in $[0, 2\pi)$
Discrete points: 1000, so the resolution is $2\pi/1000$
MATLAB source code:
N = 100;                         % number of samples
n = [0:N-1];
w = 0.2*pi;                      % frequency w0
A = sqrt(2);                     % amplitude
p = 0.3*pi;                      % true phase
np = 0.1;                        % noise power
q = sqrt(np).*randn(1,N);        % white Gaussian noise
x = A.*cos(w.*n+p)+q;            % observed signal
for j=1:1000
    pe = j/1000*2*pi;            % candidate phase value
    s1 = sin(w.*n+pe);
    s2 = sin(2.*w.*n+2.*pe);
    g(j) = x*s1' - A/2*sum(s2);  % value of the AML equation at pe
end
61
pe = [1:1000]/1000;
plot(pe,g)
Note: the x-axis is $\phi/(2\pi)$
62
stem(pe,g)
axis([0.14 0.16 -2 2])
63
For a coarser resolution, say 200 discrete points:
clear pe;
clear s1;
clear s2;
clear g;
for j=1:200
pe = j/200*2*pi;
s1 =sin(w.*n+pe);
s2 =sin(2.*w.*n+2.*pe);
g(j) = x*s1'-A/2*sum(s2);
end
pe = [1:200]/200;
plot(pe,g)
64
stem(pe,g)
axis([0.14 0.16 -2 2])
65
Approach 2: Newton/Raphson iterative procedure
With an initial guess $\hat{\phi}_0$, the root of $g(\phi)$ can be determined iteratively from
$$\hat{\phi}_{k+1} = \hat{\phi}_k - \frac{g(\hat{\phi}_k)}{\left.\dfrac{dg(\phi)}{d\phi}\right|_{\phi=\hat{\phi}_k}} = \hat{\phi}_k - \frac{g(\hat{\phi}_k)}{g'(\hat{\phi}_k)} \quad (5.13)$$
$$g(\phi) = \sum_{n=0}^{N-1} x[n]\sin(\omega_0 n+\phi) - \frac{A}{2}\sum_{n=0}^{N-1}\sin(2\omega_0 n+2\phi)$$
$$g'(\phi) = \sum_{n=0}^{N-1} x[n]\cos(\omega_0 n+\phi) - \frac{A}{2}\sum_{n=0}^{N-1}\cos(2\omega_0 n+2\phi)\cdot 2$$
$$= \sum_{n=0}^{N-1} x[n]\cos(\omega_0 n+\phi) - A\sum_{n=0}^{N-1}\cos(2\omega_0 n+2\phi)$$
with
$$\hat{\phi}_0 = 0$$
66
p1 = 0;                          % initial phase estimate
for k=1:10
    s1 = sin(w.*n+p1);
    s2 = sin(2.*w.*n+2.*p1);
    c1 = cos(w.*n+p1);
    c2 = cos(2.*w.*n+2.*p1);
    g  = x*s1' - A/2*sum(s2);    % g(phi_k)
    g1 = x*c1' - A*sum(c2);      % g'(phi_k)
    p1 = p1 - g/g1;              % Newton/Raphson update
    p1_vector(k) = p1;
end
stem(p1_vector/(2*pi))
67
ML Estimation for General Linear Model
$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w} \quad (5.14)$$
where $\mathbf{w}$ is zero-mean Gaussian noise with covariance matrix $\mathbf{C}$, so that
$$p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi)^{N/2}\det^{1/2}(\mathbf{C})} \exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T \mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})\right) \quad (5.15)$$
68
Since $\mathbf{C}$ is not a function of $\boldsymbol{\theta}$, the ML solution is equivalent to minimizing
$$(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T \mathbf{C}^{-1}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})$$
which gives
$$\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}$$
For white noise, $\mathbf{C} = \sigma_w^2\mathbf{I}$, this reduces to
$$\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} \quad (5.17)$$
69
Example 5.14
Given N pairs of (x, y) where x is error-free but y is subject to error, fit the
straight-line model $y[n] = m\, x[n] + c + w[n]$, $n = 0, 1, \ldots, N-1$.
Writing in matrix form:
$$\mathbf{y} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$$
where
$$\mathbf{H} = \begin{bmatrix} x[0] & 1 \\ x[1] & 1 \\ \vdots & \vdots \\ x[N-1] & 1 \end{bmatrix}$$
$$\hat{\boldsymbol{\theta}} = \begin{bmatrix} \hat{m} \\ \hat{c} \end{bmatrix} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{y}$$
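A minimal MATLAB sketch of this straight-line fit; for white noise, C = sigma^2 I, the estimator reduces to (H'H)^{-1}H'y used below. The slope, intercept and noise level are illustrative assumptions.

N = 100;
x = (0:N-1).';                         % error-free abscissae
m = 2; c = -1;                         % true slope and intercept (assumed)
y = m*x + c + 0.5*randn(N,1);          % noisy ordinates
H = [x ones(N,1)];                     % observation matrix
theta = (H.'*H)\(H.'*y)                % [m_hat; c_hat] = (H'H)^{-1} H'y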
71
Example 5.15
$$p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{(2\pi\sigma_w^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_w^2}\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2\right), \quad \boldsymbol{\theta} = [A, \omega_0, \phi]^T$$
The ML estimate is obtained by minimizing
$$J(A, \omega_0, \phi) = \sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2$$
72
This can be achieved by using a 3-D grid search or the Newton/Raphson
method, but it is computationally complex.
Another, simpler solution is as follows:
$$J(A, \omega_0, \phi) = \sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2 = \sum_{n=0}^{N-1}\left(x[n]-A\cos(\phi)\cos(\omega_0 n)+A\sin(\phi)\sin(\omega_0 n)\right)^2$$
Let
$$\alpha_1 = A\cos(\phi), \qquad \alpha_2 = -A\sin(\phi)$$
so that
$$A = \sqrt{\alpha_1^2 + \alpha_2^2}, \qquad \phi = \tan^{-1}\left(\frac{-\alpha_2}{\alpha_1}\right)$$
73
Let
$$\mathbf{c} = [1, \cos(\omega_0), \ldots, \cos(\omega_0(N-1))]^T$$
$$\mathbf{s} = [0, \sin(\omega_0), \ldots, \sin(\omega_0(N-1))]^T$$
We have
$$J(\boldsymbol{\alpha}, \omega_0) = (\mathbf{x} - \mathbf{H}\boldsymbol{\alpha})^T(\mathbf{x} - \mathbf{H}\boldsymbol{\alpha}), \quad \boldsymbol{\alpha} = \begin{bmatrix}\alpha_1 \\ \alpha_2\end{bmatrix}, \quad \mathbf{H} = [\mathbf{c}\;\; \mathbf{s}]$$
Applying (5.17) gives
$$\hat{\boldsymbol{\alpha}} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$
74
Substituting $\hat{\boldsymbol{\alpha}}$ back into $J(\alpha_1, \alpha_2, \omega_0)$:
$$J(\omega_0) = (\mathbf{x} - \mathbf{H}\hat{\boldsymbol{\alpha}})^T(\mathbf{x} - \mathbf{H}\hat{\boldsymbol{\alpha}})$$
$$= (\mathbf{x} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x})^T(\mathbf{x} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x})$$
$$= \left((\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)\mathbf{x}\right)^T\left((\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)\mathbf{x}\right)$$
$$= \mathbf{x}^T(\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)^T(\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)\mathbf{x}$$
$$= \mathbf{x}^T(\mathbf{I} - \mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T)\mathbf{x}$$
$$= \mathbf{x}^T\mathbf{x} - \mathbf{x}^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$
Minimizing $J(\omega_0)$ is identical to maximizing
$$\mathbf{x}^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$$
or
$$\hat{\omega}_0 = \arg\max_{\omega_0}\left\{\mathbf{x}^T\mathbf{H}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}\right\}$$
The 3-D search is thus reduced to a 1-D search
75
After determining $\hat{\omega}_0$, $\hat{A}$ and $\hat{\phi}$ can be obtained as well
$$\hat{\omega}_0 = \arg\max_{\omega_0} \frac{1}{N}\left|\sum_{n=0}^{N-1} x[n]\exp(-j\omega_0 n)\right|^2 \quad \text{(periodogram maximum)}$$
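A minimal MATLAB sketch of this 1-D search, locating the peak of the periodogram over a discrete frequency grid; the signal parameters and grid size are illustrative assumptions.

N = 200; A = 1; w0 = 0.2*pi; phi = 0.3*pi;        % illustrative values
n = (0:N-1).';
x = A*cos(w0*n + phi) + 0.1*randn(N,1);           % observed sinusoid in noise
wgrid = pi*(1:511)/512;                           % candidate frequencies in (0, pi)
P = zeros(size(wgrid));
for k = 1:length(wgrid)
    P(k) = abs(sum(x.*exp(-1j*wgrid(k)*n)))^2/N;  % periodogram value
end
[~, idx] = max(P);
w0_hat = wgrid(idx)                               % frequency estimate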
76
Least Squares Methods
77
Variants of LS Methods
1. Standard LS
Consider the general linear data model:
$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$$
where
$\mathbf{x}$ is the observed vector of size $N$
$\mathbf{w}$ is a zero-mean noise vector with unknown covariance matrix
$\mathbf{H}$ is a known matrix of size $N \times p$
$\boldsymbol{\theta}$ is the parameter vector of size $p$
$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}}\left\{(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})\right\} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} \quad (5.18)$$
78
The LS solution is optimum if the covariance matrix of $\mathbf{w}$ is $\mathbf{C} = \sigma_w^2\mathbf{I}$ and $\mathbf{w}$ is
Gaussian distributed
Define
$$\mathbf{e} = \mathbf{x} - \mathbf{H}\boldsymbol{\theta}$$
where $\mathbf{e} = [e(0), e(1), \ldots, e(N-1)]^T$; then (5.18) is equivalent to
$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \sum_{k=0}^{N-1} e^2(k) \quad (5.19)$$
79
Example 5.16
Given
$$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$$
Using (5.19), $\hat{A} = \arg\min_A \sum_{n=0}^{N-1}(x[n]-A)^2$; setting the derivative with respect to $A$ to zero gives $\hat{A} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$.
80
On the other hand, writing $\{x[n]\}$ in matrix form:
$$\mathbf{x} = \mathbf{H}A + \mathbf{w}$$
where
$$\mathbf{H} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$
Using (5.18),
$$\hat{A} = \left([1\; 1\; \cdots\; 1]\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}\right)^{-1} [1\; 1\; \cdots\; 1]\begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{bmatrix} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$$
Both (5.18) and (5.19) give the same answer, and the LS solution is
optimum if the noise is white Gaussian
81
Example 5.17
Consider the LS filtering problem again. Given
82
where
$$\mathbf{H} = \begin{bmatrix} \mathbf{X}^T(0) \\ \mathbf{X}^T(1) \\ \vdots \\ \mathbf{X}^T(N-1) \end{bmatrix} = \begin{bmatrix} x[0] & 0 & \cdots & 0 \\ x[1] & x[0] & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ x[N-1] & x[N-2] & \cdots & x[N-L] \end{bmatrix}$$
Note that
$$\mathbf{R}_{xx} = \mathbf{H}^T\mathbf{H}$$
$$\mathbf{R}_{dx} = \mathbf{H}^T\mathbf{d}$$
where $\mathbf{R}_{xx}$ here is not the original but a modified version of (3.6)
83
Example 5.18
Find the LS estimate of A for $x[n] = A\cos(\omega_0 n + \phi) + w[n]$, $n = 0, 1, \ldots, N-1$, where $\omega_0$ and $\phi$ are known.
Using (5.19),
$$\hat{A} = \arg\min_A \sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2$$
Differentiate $\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2$ with respect to $A$ and set the result to 0:
$$-2\sum_{n=0}^{N-1}\left(x[n]-\hat{A}\cos(\omega_0 n+\phi)\right)\cos(\omega_0 n+\phi) = 0$$
$$\sum_{n=0}^{N-1} x[n]\cos(\omega_0 n+\phi) = \hat{A}\sum_{n=0}^{N-1}\cos^2(\omega_0 n+\phi)$$
84
The LS solution is then
$$\hat{A} = \frac{\sum_{n=0}^{N-1} x[n]\cos(\omega_0 n+\phi)}{\sum_{n=0}^{N-1}\cos^2(\omega_0 n+\phi)}$$
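A minimal MATLAB sketch of this LS amplitude estimate, assuming omega_0 and phi are known; the numerical values are illustrative.

N = 100; A = 1.5; w0 = 0.2*pi; phi = 0.3*pi;   % illustrative values
n = (0:N-1).';
x = A*cos(w0*n + phi) + 0.2*randn(N,1);        % noisy observations
c = cos(w0*n + phi);                           % known regressor
A_hat = (x.'*c)/(c.'*c)                        % sum(x.*c)/sum(c.^2)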
2. Weighted LS
$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}}\left\{(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T\mathbf{W}(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})\right\} = (\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{W}\mathbf{x} \quad (5.20)$$
such that
$$\mathbf{W} = \mathbf{W}^T$$
85
Rationale of using W : put larger weights on data with smaller errors
Example 5.19
Given two noisy measurements of A :
x1 = A + w1 and x2 = A + w2
where $w_1$ and $w_2$ are zero-mean uncorrelated noises with known
variances $\sigma_1^2$ and $\sigma_2^2$. Determine the optimum weighted LS solution.
86
Use
$$\mathbf{W} = \mathbf{C}^{-1} = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}^{-1} = \begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix}$$
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} A + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}$$
or
$$\mathbf{x} = \mathbf{H}A + \mathbf{w}$$
Using (5.21),
$$\hat{A} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x} = \left([1\; 1]\begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right)^{-1} [1\; 1]\begin{bmatrix} 1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
87
As a result,
$$\hat{A} = \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right)^{-1}\left(\frac{x_1}{\sigma_1^2} + \frac{x_2}{\sigma_2^2}\right) = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, x_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, x_2$$
Note that
defining $\beta = \sigma_1^2 / \sigma_2^2$, we have
$$\hat{A} = \frac{1}{1+\beta}\, x_1 + \frac{\beta}{1+\beta}\, x_2$$
88
3. Nonlinear LS
The LS cost function cannot be represented via a linear model as in
$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}$$
An example is
$$\sum_{n=0}^{N-1}\left(x[n]-A\cos(\omega_0 n+\phi)\right)^2$$
with $A$, $\omega_0$ and $\phi$ all unknown.
Grid search and numerical methods are used to find the minimum
89
4. Constrained LS
The linear LS cost function is minimized subject to constraints:
$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}}\left\{(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})\right\} \quad \text{subject to } \boldsymbol{\theta} \in S \quad (5.22)$$
90
Consider the case where the constraint set $S$ is linear:
$$\mathbf{A}\boldsymbol{\theta} = \mathbf{b}$$
$$\hat{\boldsymbol{\theta}}_c = \arg\min_{\boldsymbol{\theta}}\left\{(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T(\mathbf{x} - \mathbf{H}\boldsymbol{\theta})\right\} \quad \text{subject to } \mathbf{A}\boldsymbol{\theta} = \mathbf{b} \quad (5.23)$$
Using a Lagrange multiplier vector $\boldsymbol{\lambda}$, minimize
$$J_c = (\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T(\mathbf{x} - \mathbf{H}\boldsymbol{\theta}) + \boldsymbol{\lambda}^T(\mathbf{A}\boldsymbol{\theta} - \mathbf{b}) \quad (5.24)$$
91
Expanding (5.24):
$$J_c = \mathbf{x}^T\mathbf{x} - 2\boldsymbol{\theta}^T\mathbf{H}^T\mathbf{x} + \boldsymbol{\theta}^T\mathbf{H}^T\mathbf{H}\boldsymbol{\theta} + \boldsymbol{\lambda}^T\mathbf{A}\boldsymbol{\theta} - \boldsymbol{\lambda}^T\mathbf{b}$$
$$\frac{\partial J_c}{\partial \boldsymbol{\theta}} = -2\mathbf{H}^T\mathbf{x} + 2\mathbf{H}^T\mathbf{H}\boldsymbol{\theta} + \mathbf{A}^T\boldsymbol{\lambda}$$
Setting the gradient to zero at $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_c$:
$$-2\mathbf{H}^T\mathbf{x} + 2\mathbf{H}^T\mathbf{H}\hat{\boldsymbol{\theta}}_c + \mathbf{A}^T\boldsymbol{\lambda} = \mathbf{0}$$
$$\hat{\boldsymbol{\theta}}_c = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} - \frac{1}{2}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\boldsymbol{\lambda} = \hat{\boldsymbol{\theta}} - \frac{1}{2}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\boldsymbol{\lambda}$$
where $\hat{\boldsymbol{\theta}}$ is the unconstrained LS solution. Putting $\hat{\boldsymbol{\theta}}_c$ into $\mathbf{A}\boldsymbol{\theta} = \mathbf{b}$:
$$\mathbf{A}\hat{\boldsymbol{\theta}}_c = \mathbf{A}\hat{\boldsymbol{\theta}} - \frac{1}{2}\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\boldsymbol{\lambda} = \mathbf{b} \;\Rightarrow\; \boldsymbol{\lambda} = 2\left(\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\right)^{-1}(\mathbf{A}\hat{\boldsymbol{\theta}} - \mathbf{b})$$
92
Putting $\boldsymbol{\lambda}$ back into $\hat{\boldsymbol{\theta}}_c$:
$$\hat{\boldsymbol{\theta}}_c = \hat{\boldsymbol{\theta}} - (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\left(\mathbf{A}(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{A}^T\right)^{-1}(\mathbf{A}\hat{\boldsymbol{\theta}} - \mathbf{b})$$
The idea of constrained LS can be illustrated by finding the minimum value of $y$:
$$y = x^2 - 3x + 2 \quad \text{subject to} \quad x - y = 1$$
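A minimal MATLAB sketch of the constrained LS formula derived above, applied to a toy line-fitting problem where the intercept is forced to zero; all numerical values are illustrative assumptions.

N = 50;
xd = (1:N).';
y = 3*xd + 0.5*randn(N,1);              % data generated with zero intercept
H = [xd ones(N,1)];                     % model y = m*x + c
A = [0 1]; b = 0;                       % constraint: c = 0
theta = (H.'*H)\(H.'*y);                % unconstrained LS solution
G = (H.'*H)\A.';                        % (H'H)^{-1} A'
theta_c = theta - G*((A*G)\(A*theta - b))   % constrained LS solution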
93
5. Total LS
Motivation: there is noise in both $\mathbf{x}$ and $\mathbf{H}$:
$$\mathbf{x} + \mathbf{w}_1 = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}_2 \quad (5.25)$$
For example, in deconvolution both the input and the output measurements are noisy:
$$x(k) = s(k) + n_i(k), \quad k = 0, 1, \ldots, N-1$$
$$r(k) = s(k) * h(k) + n_o(k), \quad k = 0, 1, \ldots, N-1$$
94
Another example is in frequency estimation using linear prediction:
For a single sinusoid $s(k) = A\cos(\omega k + \phi)$, it is true that
$$s(k) = 2\cos(\omega)\, s(k-1) - s(k-2)$$
i.e.,
$$s(k) = a_0\, s(k-1) + a_1\, s(k-2)$$
with $a_0 = 2\cos(\omega)$ and $a_1 = -1$. With noisy observations, we have approximately
$$x(k) = a_0\, x(k-1) + a_1\, x(k-2), \quad k = 2, 3, \ldots, N-1$$
$$\begin{aligned} x(2) &= a_0\, x(1) + a_1\, x(0) \\ x(3) &= a_0\, x(2) + a_1\, x(1) \\ &\vdots \\ x(N-1) &= a_0\, x(N-2) + a_1\, x(N-3) \end{aligned} \qquad\Rightarrow\qquad \begin{bmatrix} x(2) \\ x(3) \\ \vdots \\ x(N-1) \end{bmatrix} = \begin{bmatrix} x(1) & x(0) \\ x(2) & x(1) \\ \vdots & \vdots \\ x(N-2) & x(N-3) \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \end{bmatrix}$$
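A minimal MATLAB sketch of frequency estimation from this linear prediction model; ordinary LS is used here for simplicity (total LS would also account for the noise in the data matrix), and all signal values are illustrative assumptions.

N = 200; w = 0.2*pi;                    % illustrative values
k = (0:N-1).';
x = cos(w*k + 0.3*pi) + 0.01*randn(N,1);
Hlp = [x(2:N-1) x(1:N-2)];              % columns x(k-1), x(k-2)
b   = x(3:N);                           % left-hand side x(k)
a   = (Hlp.'*Hlp)\(Hlp.'*b);            % [a0; a1] by least squares
w_hat = acos(a(1)/2)                    % frequency estimate from a0 = 2*cos(w)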
6. Mixed LS
A combination of LS, weighted LS, nonlinear LS, constrained LS and/or
total LS
96
Questions for Discussion
1. Suppose you have N pairs of $(x_i, y_i)$, $i = 1, 2, \ldots, N$, and you need to fit
them to the model $y = ax$. Assuming that only $\{y_i\}$ contain zero-mean
noise, determine the least squares estimate of $a$ from
$$y_i = a x_i + n_i, \quad i = 1, 2, \ldots, N$$
97
2. Use least squares to estimate the line y = ax in Q.1 but now only {xi }
contain zero-mean noise.
$$r(n) = s(n - \tau_0) + w(n)$$
where the range $R$ of an object is related to the time delay $\tau_0$ by
$$\tau_0 = 2R/c$$
98
99