Jan 24, 2019

© All Rights Reserved

DETECTION & ESTIMATION THEORY

Detection & Estimation Theory ECE-A.P IIIT Nuzvid

Introduction

1.1 Lecture 0

Detection and estimation theory are branches of statistical signal processing that deal with decision making and the extraction of relevant information from noisy data. Many electronic signal processing systems are designed to decide when an event of interest occurs and then to extract more information about that event. Detection and estimation theory can be found at the core of those systems.

Some typical applications involving detection and estimation theory principles include:

Biomedicine: where the presence of cardiac arrhythmia is to be detected from an electrocardiogram, or the heart rate of a fetus has to be estimated from sonography during pregnancy in the presence of sensor and environmental noises.

Control Systems: where the position of a powerboat has to be estimated for the corrective navigation system in the presence of sensor and environmental noise, or the occurrence of an abrupt change in the system is to be detected.

Communication Systems: where the transmitted signals have to be identified at the receiver, or the carrier frequency of a signal is to be estimated for demodulation of the baseband signal in the presence of degradation noise.

Image Processing: where an object has to be identified, or its position and orientation have to be estimated from a camera image in the presence of lighting and background noises.

Radar Systems: where the occurrence of an airborne target (e.g., an aircraft, a missile) is to be detected, or the delay of the received pulse echo is to be estimated to determine the location of the target in the presence of noises.

Seismology: where the presence of underground oil is to be detected, or the distance of an oil deposit has to be estimated from noisy sound reflections due to the different densities of oil and rock layers.

Sonar Systems: where the presence of a submarine is to be detected, or the delay of the received signal at each of the sensors is to be estimated to locate it in the presence of noises and attenuations.

Speech Processing: where the presence of different events (such as phonemes or words) is to be detected in a speech signal in the context of a speech recognition application, or the parameters of the speech production model have to be estimated in the context of a speech coding application in the presence of speech/speaker variability and environmental noises.

Apart from these, a number of applications stemming from the analysis of data from physical phenomena, economics, etc., could also be mentioned.

The majority of applications require either the detection of one or more events of interest and/or the estimation of an unknown parameter from a collection of observation data, which also includes "artifacts" due to sensor inaccuracies, additive noise, signal distortion (convolution noise), model inaccuracies, unaccounted sources of variability and multiple interfering signals. These artifacts make detection and estimation challenging problems.

1.1.1 Formulation of the Estimation Problem

The model used in the estimation problem involves four components.

The first is the source, whose output depends on a parameter θ that can be regarded as a point in a parameter space. The parameter is either a random or a nonrandom (deterministic) unknown quantity.

The second component is the probabilistic mapping that governs the effect of the parameter on the observations x. This probabilistic mapping is expressed in terms of the joint probability density function (PDF), denoted by p(x; θ).

The third component is the observation space, which usually consists of multidimensional measurements belonging to either the continuous or the discrete domain.

The fourth and last component is the estimation rule g(x) that determines the mapping of the observation space into an estimate of the unknown parameter.

The whole process is illustrated in Figure 1.1.
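The four components above can be sketched concretely for the familiar DC-level-in-noise problem. This is an illustrative sketch only; the particular values (A = 1, σ = 0.5, N = 10000) and helper names are assumptions, not from the text, and the estimation rule g(x) is taken to be the sample mean:

```python
import random

# Illustration of the four components for a DC level in noise:
# 1) source: a deterministic DC level A (the unknown parameter theta);
# 2) probabilistic mapping: x[n] = A + w[n], w[n] Gaussian noise;
# 3) observation space: the N-point data vector x in R^N;
# 4) estimation rule: g(x) = sample mean of the observations.

def generate_observations(A, sigma, N, seed=0):
    rng = random.Random(seed)
    return [A + rng.gauss(0.0, sigma) for _ in range(N)]

def g(x):
    """Estimation rule: map the observations to an estimate of A."""
    return sum(x) / len(x)

x = generate_observations(A=1.0, sigma=0.5, N=10_000)
A_hat = g(x)
```

With many observations the sample mean lands close to the true A, which is the sense in which g(x) maps the observation space into an estimate of the parameter.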

1.1.2 Classification of Estimation Approaches

Based on the assumptions made about the unknown parameter, the estimation methods can be classified into two broad groups: classical parameter estimation and Bayesian estimation. In the classical parameter estimation methods, no probabilistic assumption about the unknown parameter is made; rather, it is treated as a deterministic unknown. In the Bayesian estimation methods, the unknown parameter is treated as a realization of a random variable with an assigned prior PDF. In both these broad groups, a number of optimal and suboptimal estimation approaches exist, necessitated by the lack of complete knowledge about the mathematical model available for estimating the unknown quantity. A broad classification of different estimation methods is given in Table 1.1.

Unknown param. | Probabilistic assumption | Other requirement | Estimator kind (salient property) | Practical utility
Nonrandom | Completely known PDF | Sufficient statistics | Minimum variance unbiased (optimal) | Low
Nonrandom | Completely known PDF | Large data; no statistics | Maximum likelihood (asymptotically optimal) | Very high
Nonrandom | First-two moments only | | Best linear unbiased (suboptimal in general) | Moderate
Nonrandom | Known signal model; no PDF | | Least squares (suboptimal) | High
Random | Known joint and prior PDFs | Conjugate prior; quadratic cost | Minimum mean square error (optimal) | High
Random | Known joint and prior PDFs | Hit-or-Miss cost | Maximum a posteriori | High
Random | Uniform prior; Hit-or-Miss cost | | Bayesian maximum likelihood | Low
Random | First-two moments only | | Linear minimum mean square error; Wiener filter (suboptimal in general) | Very high

Table 1.1: Classification of different estimation approaches.

The simplest detection problem arises when we wish to decide whether a signal embedded in noise is present or only noise is present. For example, consider the detection of an aircraft based on a radar echo. This problem can be termed a binary hypothesis testing problem, the two hypotheses being (1) the aircraft is absent, and (2) the aircraft is present.

A somewhat more general binary hypothesis testing problem is encountered in communications, where our interest is in deciding which of two possible signals was transmitted. For example, our hypotheses in this case may consist of a sinusoid of phase 0° embedded in noise versus a sinusoid of phase 180° embedded in noise. Frequently, we also wish to decide among more than two hypotheses. For example, in speech recognition, our goal is to determine which digit among ten possible ones was spoken. Such a problem is referred to as a multiple hypothesis testing problem.

All these problems are characterized by the need to decide among two or more possible hypotheses based on the observed data set. As always, the data are random in nature due to inherent variability and noise, so a statistical approach is necessitated. We model the detection problem in a form that allows us to apply the theory of statistical hypothesis testing.

Example

Consider the problem of detecting a DC level of amplitude A = 1 embedded in white Gaussian noise w[n] with variance σ², as shown in Figure 1.2. Assume only one sample x[0] of the N-point data x = [x[0] x[1] … x[N-1]]^T is available to make the decision. More formally, we model the detection problem as one of choosing between hypothesis H0, the noise-only hypothesis, and H1, the signal-present hypothesis, or symbolically:

H0 : x[0] = w[0]
H1 : x[0] = 1 + w[0]

The PDFs under each hypothesis are denoted by p(x[0]; H0) and p(x[0]; H1), which for this example are:

p(x[0]; H0) = (1/√(2πσ²)) exp(-x²[0]/(2σ²))
p(x[0]; H1) = (1/√(2πσ²)) exp(-(x[0] - 1)²/(2σ²))

Note that in deciding between H0 and H1, we are essentially asking whether x[0] has been generated according to the PDF N(0, σ²) or N(1, σ²). Alternatively, if we consider the family of PDFs N(A, σ²), which is parameterized by A, then we can reformulate the detection problem as a parameter test, or symbolically:

H0 : A = 0
H1 : A = 1

For example, in an on-off keyed (OOK) communication system we transmit a '0' by sending no pulse and a '1' by sending a pulse with amplitude A = 1, so it also corresponds to the above-mentioned hypothesis or parameter test. Since the likelihoods of occurrence of the data bits 0 and 1 are equal in the long run, it makes sense to regard the hypotheses as random events, each with probability 1/2. When we do so, our notation for the PDFs will be p(x[0]|H0) and p(x[0]|H1), in keeping with the standard notation for a conditional PDF. For this example, we then have:

p(x[0]|H0) = (1/√(2πσ²)) exp(-x²[0]/(2σ²))
p(x[0]|H1) = (1/√(2πσ²)) exp(-(x[0] - 1)²/(2σ²))

This distinction is analogous to the classical versus Bayesian approaches to parameter estimation highlighted earlier.

Hierarchy of Detection Problems

The detection problem in its simplest form assumes that both the signal and noise characteristics are completely known. If the characteristics of the signal and/or noise are unknown or not completely known, the detection problem becomes more challenging as well as complex. The hierarchy of detection problems along with their typical applications is listed in Table 1.2.

Conditions | Applications
Level 1: Known signals in noise | 1. Synchronous digital communication; 2. Pattern recognition
Level 2: Signals with unknown parameters in noise | 1. Digital communication system without phase reference; 2. Digital communication over slowly fading channels; 3. Conventional pulse radar and sonar, target detection
Level 3: Random signals in noise | 2. Passive sonar; 3. Radio astronomy (detection of noise sources)

Table 1.2: Hierarchy of detection problems.

1.1.3 Organization of the Material

The signal detection and parameter estimation problems are closely linked, and often we are required to address both problems in the same system, but they also have their separate applications. As detection theory employs many concepts and techniques developed for estimation theory, we first present the estimation theory concepts and then the detection theory concepts. Further, most real-world problems involve analog observations, so traditionally the detection and estimation theories were developed for the continuous-time domain. Nowadays, as most signal processing is done on digital computers, the detection and estimation theories are analogously developed for the discrete-time domain. In this material, only the discrete-time cases are considered.

We begin with the classical estimation methods, which include the minimum variance unbiased estimator (MVUE), the best linear unbiased estimator (BLUE), the maximum likelihood estimator (MLE) and the least squares estimator (LSE). These estimators are discussed in Modules 2-5, followed by the Bayesian estimation methods, which are discussed in Modules 6-7. In detection theory, we first describe different types of detection criteria in Module 8, followed by a brief introduction to non-parametric detection methods in Module 9. Finally, we describe the detection of deterministic and random signals in white Gaussian noise in Modules 10-11. At the end of this material, some long-answer and multiple-choice questions are given in Module 12.


8.1 Outline

A detection problem can be classified into two broad classes: parametric detection and non-parametric detection. In non-parametric detection, the probability density function (PDF) of the data is unknown; it will be discussed in Module 9. In this module, the parametric detection approaches to hypothesis testing are presented. In these approaches, complete knowledge of the PDF, or of its structure, is assumed to be available. We begin with the discussion of the Neyman-Pearson (NP) detector and its generalization as the mean-shifted Gauss-Gauss detection problem. An alternative approach to hypothesis testing, the Bayesian detector, which allows the use of prior information, is then introduced along with its variant, the Minimax detector. Detection under more complex cases, where complete knowledge of the PDF is not available, is then discussed using both classical and Bayesian approaches. The salient topics discussed in this module are:

o Neyman-Pearson criterion

o Bayes criterion

o Minimax criterion

o Composite hypothesis testing:

  - Bayesian criterion

  - Generalized likelihood ratio tests

Lecture 24 : Hypothesis Testing

8.2.1 Simple Hypothesis Testing

In detection theory, a hypothesis is a statement about the source of the observed data. In the simplest case, we have the null hypothesis (H0) that there is no change from the usual, and the alternate hypothesis (H1) that there is a change. For example, in the target detection problem, the hypotheses may be

H0 : target is absent
H1 : target is present

and the objective is to decide which one of these hypotheses is true based on the observed data.

We begin with those decision-making problems in which the PDF for each assumed hypothesis is completely known, which is why they are referred to as simple hypothesis testing problems. The primary approaches to simple hypothesis testing are the classical approach based on the Neyman-Pearson theorem and the Bayesian approach based on minimization of the Bayes risk. In many ways these approaches are analogous to the classical and Bayesian methods of statistical estimation theory.

8.2.2 Neyman-Pearson (NP) Detector

Before explaining the NP detector, we first describe some relevant terminology. Suppose we observe a realization x[0] of a random variable whose PDF is either N(0, 1) or N(1, 1). The detection problem can be summarized as:

H0 : x[0] ~ N(0, 1)
H1 : x[0] ~ N(1, 1)

where H0 is referred to as the null hypothesis and H1 as the alternate hypothesis.

The PDFs under each hypothesis, along with the probabilities of the hypothesis testing errors, are shown in Figure 8.1. A reasonable approach might be to decide H1 if x[0] > 1/2, because if x[0] > 1/2 the observed sample is more likely if H1 is true. Our detector then compares the observed datum value with 1/2, which is called the threshold.

With this scheme we can make two errors. If we decide H1 but H0 is true, we make a Type I error. On the other hand, if we decide H0 when H1 is true, we make a Type II error. The terms Type I and Type II errors are used in the statistical domain, but in the engineering domain these errors are referred to as false alarm and miss, respectively. The term P(Hi; Hj) denotes the probability of deciding Hi when Hj is true. Note that it is not possible to reduce both error probabilities simultaneously. A typical approach is to hold one error probability fixed while minimizing the other.
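For the N(0, 1)-versus-N(1, 1) example with threshold 1/2, the two error probabilities can be computed directly with the Gaussian right-tail (Q) function. A minimal sketch, using only the standard library:

```python
from math import erfc, sqrt

def Q(x):
    """Right-tail probability of the standard Gaussian, Q(x) = 0.5*erfc(x/sqrt(2))."""
    return 0.5 * erfc(x / sqrt(2.0))

# Decide H1 if x[0] > 1/2, where x[0] ~ N(0,1) under H0 and x[0] ~ N(1,1) under H1.
threshold = 0.5
P_FA = Q(threshold)              # Type I error (false alarm): decide H1 when H0 true
P_D = Q(threshold - 1.0)         # detection probability under H1
P_M = 1.0 - P_D                  # Type II error (miss): decide H0 when H1 true
```

Here P_FA ≈ 0.309 and P_M ≈ 0.309; raising the threshold would lower P_FA but raise P_M, illustrating why both errors cannot be reduced simultaneously.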

Figure 8.1: PDFs in binary hypothesis testing, possible errors and their probabilities

In general terms, the goal of a detector is to decide either H0 or H1 based on the observed data x = [x[0] x[1] … x[N-1]]^T. This is a mapping from each possible data set value into a decision. The decision regions for the previous example are shown in Figure 8.2. Let R1 be the set of values in R^N that map into decision H1. Then the probability of false alarm PFA is constrained as:

PFA = ∫_{R1} p(x; H0) dx = α

where α is termed the significance level or size of the test in statistics. Now there are many R1 that satisfy the above relation. Our goal is to choose the one that maximizes the probability of detection, defined as:

PD = ∫_{R1} p(x; H1) dx

Figure 8.2: Decision regions in binary hypothesis testing

Neyman-Pearson Theorem: To maximize PD for a given PFA = α, decide H1 if:

L(x) = p(x; H1) / p(x; H0) > γ

where the threshold γ is computed from the constraint on the probability of false alarm:

PFA = ∫_{{x : L(x) > γ}} p(x; H0) dx = α

The function L(x) is termed the likelihood ratio, and the entire test is called the likelihood ratio test (LRT).

8.2.3 Example

Consider the general signal detection problem:

H0 : x[n] = w[n],        n = 0, 1, …, N-1
H1 : x[n] = s[n] + w[n], n = 0, 1, …, N-1

where the signal is s[n] = A for A > 0, and w[n] is WGN with variance σ².

The NP detector decides H1 if:

p(x; H1) / p(x; H0) > γ

Taking the logarithm of both sides, simplifying, and moving the non-data-dependent terms to the right-hand side, we have:

x̄ = (1/N) Σ_{n=0}^{N-1} x[n] > (σ²/(NA)) ln γ + A/2 = γ′

Thus the NP detector compares the sample mean x̄ to a threshold γ′. Note that the test statistic T(x) = x̄ is Gaussian under each hypothesis:

T(x) ~ N(0, σ²/N) under H0
T(x) ~ N(A, σ²/N) under H1

We then have:

PFA = Pr{T > γ′; H0} = Q(γ′ / √(σ²/N))

and

PD = Pr{T > γ′; H1} = Q((γ′ - A) / √(σ²/N))


The threshold can be found as:

γ′ = √(σ²/N) Q⁻¹(PFA)

and therefore:

PD = Q(Q⁻¹(PFA) - √(NA²/σ²))

8.3.1 Mean Shifted Gauss-Gauss Detection Problem

In this class of detection problems, we observe the value of a test statistic T and decide H1 if T > γ′, and H0 otherwise. The PDF of T is assumed to be:

T ~ N(μ0, σ²) under H0
T ~ N(μ1, σ²) under H1

where μ1 > μ0. Thus we decide between two hypotheses that differ by a shift in the mean of T. For this type of detector, the detection performance is completely characterized by the deflection coefficient, defined as:

d² = (μ1 - μ0)² / σ²

In the case when μ0 = 0, d² = μ1²/σ², which may be interpreted as the signal-to-noise ratio (SNR).

Since

PFA = Q((γ′ - μ0)/σ)

we have

PD = Q(Q⁻¹(PFA) - √(d²))

The detection performance is, therefore, monotonic with respect to the deflection coefficient.
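The relation PD = Q(Q⁻¹(PFA) - √d²) is easy to evaluate numerically. A minimal sketch using the standard library's normal distribution (the particular PFA and d² values below are illustrative assumptions):

```python
from math import sqrt
from statistics import NormalDist

std_norm = NormalDist()

def Q(x):
    """Gaussian right-tail probability."""
    return 1.0 - std_norm.cdf(x)

def Qinv(p):
    """Inverse of Q."""
    return std_norm.inv_cdf(1.0 - p)

def detection_prob(P_FA, d2):
    """P_D = Q(Q^{-1}(P_FA) - sqrt(d2)) for the mean-shifted Gauss-Gauss problem."""
    return Q(Qinv(P_FA) - sqrt(d2))

P_D = detection_prob(P_FA=0.1, d2=4.0)
```

Evaluating detection_prob for increasing d² at a fixed PFA confirms the monotonicity claimed above: a larger deflection coefficient always yields a larger PD.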

8.3.2 Receiver Operating Characteristics

The receiver operating characteristic, or simply ROC curve, is a graphical means to illustrate the performance of the NP detector. The ROC curve was first used during World War II for the analysis of radar signals, and was later employed in signal detection theory. Each point on the ROC curve corresponds to a pair (PFA, PD) for a given threshold value; by adjusting the threshold, any point on the curve may be obtained. As the threshold increases, PFA decreases, but so does PD, and vice versa.

We already know that for the DC level in WGN example, we have

PFA = Q(γ′ / √(σ²/N))

and

PD = Q(Q⁻¹(PFA) - √(NA²/σ²))

Figure 8.3: Receiver operating characteristics for DC level detection in WGN for varying values of the deflection coefficient (d²).

Figure 8.3 shows the ROC curves for this detection problem for different values of the deflection coefficient d². The ROC should always lie above the 45° line, as that line can be attained by a detector that bases its decision on flipping a coin, ignoring all the data. Consider a detector that decides H1 if a head appears in a coin toss, where Pr{head} = p. The probability of occurrence of a head in the coin toss has no dependence upon which hypothesis is true, and therefore PFA = PD = p. This detector then generates the point (p, p) on the ROC. Considering the different values of p, it would generate the 45° line on the ROC curve, which has been marked as a dotted line.

The family of ROCs generated for different values of the deflection coefficient d² is also shown in Figure 8.3. As d² increases, the value of PD obtained for a given value of PFA also increases. For d² → ∞, the ideal ROC is obtained, i.e., PD = 1 for any value of PFA. Further, as the threshold γ varies from ∞ to -∞, the point (PFA(γ), PD(γ)) moves along the ROC curve from (0, 0) to (1, 1).

The salient properties of the ROC curve for binary hypothesis testing are:

1. If the threshold γ → -∞, the detector always decides H1 and PFA = PD = 1. Thus the point (1, 1) belongs to the ROC curve.

2. If the threshold γ → ∞, the detector never decides H1 and PFA = PD = 0. Thus the point (0, 0) belongs to the ROC curve.

3. The slope of the ROC curve at any point (PFA(γ), PD(γ)) is equal to the threshold γ.

4. All points of the ROC curve satisfy PD ≥ PFA.

5. The ROC curve is concave, i.e., the domain of the achievable pairs (PFA, PD) is convex.

6. The region of feasible tests is symmetric about the point (0.5, 0.5), i.e., if (PFA, PD) is feasible, so is (1 - PFA, 1 - PD).
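These properties can be checked numerically by sweeping the threshold for the DC-level-in-WGN detector and tracing out (PFA, PD) pairs. A sketch, with A, σ² and N chosen as illustrative assumptions:

```python
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

def Q(x):
    return 1.0 - nd.cdf(x)

# ROC for a DC level A > 0 in WGN: the test statistic x̄ is N(0, σ²/N) under H0
# and N(A, σ²/N) under H1; sweeping the threshold γ' traces the (P_FA, P_D) curve.
A, sigma2, N = 1.0, 1.0, 4            # assumed example values
scale = sqrt(sigma2 / N)              # standard deviation of x̄

roc = []
for k in range(-60, 61):
    thr = k * 0.1                     # threshold γ' swept from -6 to 6
    P_FA = Q(thr / scale)
    P_D = Q((thr - A) / scale)
    roc.append((P_FA, P_D))
```

The endpoints of the sweep approach (1, 1) and (0, 0) (properties 1 and 2), and every computed point satisfies PD ≥ PFA (property 4).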

8.3.3 Example

Consider a random variable Y given by Y = N + λθ, where θ is either 0 or 1, λ is a fixed number between 0 and 2, and N ~ (-1, 1). We wish to decide between the hypotheses

H0 : θ = 0
H1 : θ = 1

Find (i) the Neyman-Pearson (NP) decision rule to decide H1 for 0 ≤ PFA ≤ 1, and (ii) sketch the receiver operating characteristics.

Figure 8.4: The distribution of the hypotheses and the ROC for the NP detector deciding H1.

The distributions of the two hypotheses are plotted in Figure 8.4. Let a threshold γ be chosen to give the specified value of the false alarm probability PFA. From the resulting relationship between PFA and PD, the ROC for different values of λ is plotted, as also shown in Figure 8.4.

8.4.1 Bayesian Detector

As argued earlier, in some detection problems one can reasonably assign probabilities to the various hypotheses. This approach, where prior probabilities are assigned, is the Bayesian approach to hypothesis testing. In general Bayesian hypothesis testing, not only are prior probabilities assigned to each hypothesis, but a cost Cij is also assigned to each of the errors. In some applications, all types of errors are not of equal importance, and hence different costs can be assigned to different errors to optimize the detector performance.

The objective of the detector is to minimize the expected cost or Bayes risk, defined as:

R = Σi Σj Cij P(Hi|Hj) P(Hj)

where Cij is the cost of deciding Hi when Hj is true and P(Hj) is the prior probability of the jth hypothesis.

It can be shown that the detector which minimizes the Bayes risk decides H1 if:

p(x|H1) / p(x|H0) > ((C10 - C00) P(H0)) / ((C01 - C11) P(H1)) = γ

Note once again that the conditional likelihood ratio is compared to a threshold.

8.4.2 Minimum Probability of Error

In some cases, we do not assign a cost if no error is made in a decision, i.e., C00 = C11 = 0, and we also assign equal costs to the errors. Without loss of generality we can assume C01 = C10 = 1; then the Bayes risk becomes the probability of error Pe, defined as:

Pe = P(H1|H0) P(H0) + P(H0|H1) P(H1)

From the previous discussion of the general Bayesian detector, we can easily deduce that the detector which minimizes Pe is given by the rule: decide H1 if

p(x|H1) / p(x|H0) > P(H0) / P(H1)

This detector, which minimizes Pe for any prior probabilities, is termed the maximum a posteriori probability (MAP) detector.

In cases where the prior probabilities of the hypotheses are equal, the detector which minimizes Pe decides H1 if:

p(x|H1) > p(x|H0)

8.4.3 Example

Consider the on-off keying (OOK) communication problem where we transmit either s0[n] = 0 or s1[n] = A. The detection problem is:

H0 : x[n] = w[n],      n = 0, 1, …, N-1
H1 : x[n] = A + w[n],  n = 0, 1, …, N-1

where A > 0 and w[n] is WGN with variance σ². It is reasonable to assume that P(H0) = P(H1) = 1/2. The receiver that minimizes Pe decides H1 if:

p(x|H1) > p(x|H0)

or, equivalently, we decide H1 if x̄ > A/2. This detector is the same as that obtained with the NP criterion, except for the threshold and, of course, the performance. To determine Pe we note that:

Pe = (1/2) Pr{x̄ > A/2 | H0} + (1/2) Pr{x̄ < A/2 | H1}

Thus,

Pe = Q(√(NA²/(4σ²))) = Q(√(d²)/2)

where d² = NA²/σ² is the deflection coefficient for the given problem.
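The minimum-Pe receiver x̄ > A/2 can be checked by Monte Carlo simulation against Q(√d²/2). A sketch with assumed values A = 1, σ² = 1, N = 5 (not from the text):

```python
import random
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

# Monte Carlo check of the equal-priors OOK receiver that decides H1 if x̄ > A/2.
A, sigma, N, trials = 1.0, 1.0, 5, 200_000
rng = random.Random(1)

errors = 0
for _ in range(trials):
    bit = rng.random() < 0.5                                  # H1 with probability 1/2
    x = [(A if bit else 0.0) + rng.gauss(0.0, sigma) for _ in range(N)]
    decide_h1 = sum(x) / N > A / 2                            # threshold the sample mean
    errors += decide_h1 != bit

Pe_mc = errors / trials
d2 = N * A * A / (sigma * sigma)                              # deflection coefficient
Pe_theory = Q(sqrt(d2) / 2)
```

The empirical error rate should agree with Q(√d²/2) to within Monte Carlo fluctuation.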

Lecture 27 : Minimax Detector

8.5.1 Minimax Detector

The Bayes criterion assigns costs to the decisions and assumes knowledge of the a priori probabilities. In many situations, we may not have enough information about the a priori probabilities, so the Bayes criterion cannot be used. One approach would be to select the value of P1, the a priori probability of hypothesis H1, for which the risk is maximum, and then minimize that risk. This principle of minimizing the maximum average cost over P1 is referred to as the minimax criterion.

The Bayes risk for a binary hypothesis testing problem is given by

R = Σi Σj Cij P(Hi|Hj) P(Hj)

Further, we can express the probabilities of the different decisions in terms of the probability of false alarm PFA = P(H1|H0), the probability of miss PM = P(H0|H1) and the probability of detection PD = P(H1|H1) = 1 - PM.

Let P0 and P1 be the a priori probabilities of H0 and H1, respectively. Since one of the hypotheses H0 and H1 always occurs, P0 = 1 - P1. Using this, we can express the Bayes risk as a function of P1:

R(P1) = C00(1 - PFA) + C10 PFA + P1 [(C11 - C00) + (C01 - C11) PM - (C10 - C00) PFA]

Assuming a fixed value of P1 with P1 ∈ (0, 1), the Bayes test decides H1 if

p(x|H1) / p(x|H0) > ((C10 - C00)(1 - P1)) / ((C01 - C11) P1)

As P1 varies, the decision regions change, in turn causing a variation in the average cost, which would be larger than the Bayes cost. The two extreme possible values of P1 are 0 and 1. When P1 = 0, the threshold is ∞; we always decide H0, and the risk is R = C00. Similarly, when P1 = 1, the threshold is 0; we always decide H1, and the risk is R = C11.

If P1 = P1*, with P1* ∈ (0, 1), then the risk as a function of P1 is as shown in Figure 8.5. As we have already noted, for a fixed test the risk is linear in P1, and the Bayes test designed for P1 = P1* gives the minimum risk Rmin(P1*) at that point. At the minimax point the tangent to the minimum-risk curve is horizontal, and since the Bayes risk curve must be concave downwards, the average cost of the fixed test designed for P1* will not exceed Rmin(P1*). Taking the derivative of R with respect to P1 and setting it to zero, we obtain the minimax rule

(C11 - C00) + (C01 - C11) PM - (C10 - C00) PFA = 0

If the costs of correct decisions are individually zero (C00 = C11 = 0), then the minimax rule for P1 = P1* reduces to

C01 PM = C10 PFA

Furthermore, if the costs of incorrect decisions are individually one (C01 = C10 = 1), then the minimax rule for P1 = P1* reduces to

PM = PFA

and the minimax cost in this case is R = PFA = PM.

8.5.3 Example

Suppose Y is a random variable with a PDF under each hypothesis as shown in Figure 8.6, where u(y) is the unit step function. For uniform costs (C00 = C11 = 0 and C01 = C10 = 1), find the minimax decision rule to decide H1.

Figure 8.6: PDFs of the hypotheses for the minimax decision rule example

The PDFs under the two hypotheses are plotted in Figure 8.6. Let the chosen threshold be y = γ; then for the minimax rule with uniform costs we require

PM = PFA

On using the complementary error function table, the value of γ that satisfies this relation turns out to be γ ≈ 0.565.

Thus the minimax decision rule decides H1 if

y > 0.565
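In general, the uniform-cost minimax threshold is the root of PM(γ) = PFA(γ), which can be found numerically by bisection. The example's exact PDFs are not reproduced here, so as a stand-in sketch assume Y ~ N(0, 1) under H0 and Y ~ N(1, 1) under H1 (for which symmetry gives γ = 1/2):

```python
from statistics import NormalDist

nd = NormalDist()

# Assumed stand-in hypotheses: Y ~ N(0,1) under H0 and Y ~ N(1,1) under H1.
def P_FA(g):
    return 1.0 - nd.cdf(g)        # Pr{Y > γ | H0}

def P_M(g):
    return nd.cdf(g - 1.0)        # Pr{Y < γ | H1}

# f(γ) = P_M(γ) - P_FA(γ) is increasing in γ, so bisection finds the unique root.
lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if P_M(mid) - P_FA(mid) < 0.0:
        lo = mid                  # P_M still below P_FA: raise the threshold
    else:
        hi = mid
gamma = 0.5 * (lo + hi)
```

The same bisection applies to any pair of hypothesis PDFs, since PM is non-decreasing and PFA non-increasing in the threshold.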

Lecture 28 : Multiple Hypothesis Testing

8.6.1 Multiple Hypothesis Testing

In many problems one is required to distinguish among more than two hypotheses. Such problems frequently occur in pattern recognition, or in communications when one of M signals is to be detected. Although the NP decision rule can be extended to the M-ary hypothesis test, it is often not used in practice. More commonly, the minimum probability of error Pe criterion, or its generalization, the Bayes risk, is employed.

Assume that there are M possible hypotheses {H0, H1, …, H(M-1)} to decide from, with a cost Cij assigned to the decision of choosing Hi when Hj is true. The expected Bayes risk is given by

R = Σi Σj Cij P(Hi|Hj) P(Hj)

For the uniform cost assignment (Cii = 0 and Cij = 1 for i ≠ j), we have R = Pe.

Let the decision region Ri = {x : decide Hi}, where i = 0, 1, …, M-1. These Ri's together partition the observation space, so each x must be assigned to one and only one of the decision regions. Define the cost of assigning x to Ri as

Ci(x) = Σj Cij P(Hj|x)

The cost contribution to R if x is assigned to R1, for example, is ∫ C1(x) p(x) dx; for assigning x to R2 it is ∫ C2(x) p(x) dx, and so on. Generalizing, we should assign x to Rk if Ci(x) is minimum for i = k. Hence, we should choose the hypothesis that minimizes

Ci(x)

over i = 0, 1, …, M-1.

To determine the decision rule that minimizes Pe, we use the uniform costs; then

Ci(x) = Σ_{j≠i} P(Hj|x) = 1 - P(Hi|x)

Since the first term is independent of i, the cost Ci(x) is minimized by maximizing P(Hi|x). Thus, the minimum-Pe decision rule is to decide Hk if

P(Hk|x) > P(Hi|x)   for all i ≠ k

This is the M-ary maximum a posteriori probability (MAP) decision rule. In case the prior probability of each of the hypotheses is equal, then to maximize P(Hi|x) one need only maximize p(x|Hi). Hence the decision rule for equal prior probabilities decides Hk if

p(x|Hk) > p(x|Hi)   for all i ≠ k

8.6.2 Example

A ternary communication system transmits one of the three amplitude signals {1, 2, 3} with equal probabilities. The independent received signal samples under each hypothesis are

Hi : y[n] = Ai + w[n],  n = 0, 1, …, N-1

with amplitudes A0 = 1, A1 = 2, A2 = 3, where the additive noise w[n] is Gaussian with zero mean and variance σ². The costs are Cii = 0 and Cij = 1 for i ≠ j. Determine the decision regions and the minimum probability of error Pe.

Figure 8.7: Decision regions for multiple DC signals in white Gaussian noise for the N = 1 case

In this problem, as the prior probabilities are equal, P(H0) = P(H1) = P(H2) = 1/3, the ML decision rule applies. First consider the simple case of N = 1; the PDFs of the three hypotheses are shown in Figure 8.7. By symmetry, it is obvious that, as per the ML rule, to minimize Pe we should decide H0 if y[0] < 1.5, H1 if 1.5 < y[0] < 2.5, and H2 if y[0] > 2.5.

For the multiple-sample (N > 1) case, the multivariate PDFs as well as the decision regions are not feasible to plot. In this case we need to derive a test statistic, and to do so note that the conditional PDF under each hypothesis can be compactly written as

p(y|Hi) = (2πσ²)^(-N/2) exp(-Di²/(2σ²)),  where Di² = Σ_{n=0}^{N-1} (y[n] - Ai)²

Further, using ȳ as the mean of the observations y[n] and after some manipulation, we can express Di² as

Di² = Σ_{n=0}^{N-1} y²[n] - Nȳ² + N(ȳ - Ai)²

It is apparent that to minimize Di² we need to choose the i for which Ai is closest to ȳ. Hence, we decide Hi for which |ȳ - Ai| is minimum.

To determine the minimum Pe, note that in this case there are six types of errors, unlike in the binary case. In general, for an M-ary detection problem there are M² - M = M(M - 1) error types. It is therefore easier to determine 1 - Pe = Pc, where Pc is the probability of a correct decision. Thus

Pc = Σi Pr{decide Hi | Hi} P(Hi)

Conditioned on Hi, we have ȳ ~ N(Ai, σ²/N), so that

Pe = (4/3) Q(√N / (2σ))
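The nearest-amplitude ML rule and its error probability can be verified by Monte Carlo simulation. A sketch with σ = 1 and N = 4 as assumed values, using the standard M-ary symbol-error expression 2(M-1)/M · Q(√N/(2σ)), which for M = 3 gives (4/3)Q(√N/(2σ)):

```python
import random
from math import erfc, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

# Ternary amplitudes {1, 2, 3} in WGN: the ML rule picks the amplitude
# closest to the sample mean ȳ of the N received samples.
amps = [1.0, 2.0, 3.0]
sigma, N, trials = 1.0, 4, 200_000
rng = random.Random(2)

errors = 0
for _ in range(trials):
    A = rng.choice(amps)                                       # equiprobable symbols
    ybar = sum(A + rng.gauss(0.0, sigma) for _ in range(N)) / N
    decision = min(amps, key=lambda a: abs(ybar - a))          # nearest amplitude
    errors += decision != A

Pe_mc = errors / trials
Pe_theory = (4.0 / 3.0) * Q(sqrt(N) / (2.0 * sigma))           # (2(M-1)/M) Q(sqrt(N)/(2σ))
```

The edge amplitudes (1 and 3) each contribute one error type while the middle amplitude (2) contributes two, which is where the 4/3 factor comes from.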

8.7.1 Composite Hypothesis Testing

The general class of hypothesis testing problems that we will be interested in is the composite hypothesis test. As opposed to the simple hypothesis test, in which the PDFs under both hypotheses are completely known, the composite hypothesis test must accommodate unknown parameters. The PDFs under H0, or under H1, or under both hypotheses may not be completely specified. For example, if we wish to detect a DC level of unknown amplitude A in WGN, then under H1 the PDF is:

p(x; A, H1) = (2πσ²)^(-N/2) exp(-(1/(2σ²)) Σ_{n=0}^{N-1} (x[n] - A)²)

Since the amplitude A is unknown, the PDF is not completely specified, so we cannot directly perform the LRT as discussed earlier.

There are two approaches to composite hypothesis testing:

1. Bayesian approach: the unknown parameters are considered as realizations of random variables and are assigned a prior PDF.

2. Generalized likelihood ratio test: the unknown parameters are first estimated and then used in a likelihood ratio test.

The general problem is to decide between H0 and H1 when the PDFs depend on different sets of unknown parameters. These parameters may or may not be the same under each hypothesis. Under H0 assume that the vector parameter θ0 is unknown, while under H1 assume that the vector parameter θ1 is unknown.

8.7.2 Bayesian Approach for Composite Hypothesis Testing

The Bayesian approach assigns prior PDFs to θ0 and θ1. In doing so, it models the unknown parameters as realizations of a vector random variable. If the prior PDFs are denoted by p(θ0) and p(θ1), respectively, the PDFs of the data are:

p(x; H0) = ∫ p(x|θ0; H0) p(θ0) dθ0
p(x; H1) = ∫ p(x|θ1; H1) p(θ1) dθ1

The unconditional PDFs p(x; H0) and p(x; H1) are now completely specified; they no longer depend on the unknown parameters. With the Bayesian approach, the optimal NP detector decides H1 if

p(x; H1) / p(x; H0) > γ

Remark: In this approach the required integrations are multidimensional, with dimension equal to that of the unknown parameter. The choice of prior PDFs can also prove to be difficult. If some prior knowledge is indeed available, then it should be used; if not, one can use a non-informative prior, i.e., one having a PDF as 'flat' as possible.

8.7.3 Example

Detection of an unknown DC level in WGN (Bayesian approach):

H0 : x[n] = w[n],      n = 0, 1, …, N-1
H1 : x[n] = A + w[n],  n = 0, 1, …, N-1

where the DC level A is unknown and can take on any value -∞ < A < ∞, and w[n] is WGN with variance σ².

To solve this problem using the Bayesian approach, we assign a prior A ~ N(0, σA²), where A is independent of the noise w[n]. The conditional PDF under H1 is

p(x|A; H1) = (2πσ²)^(-N/2) exp(-(1/(2σ²)) Σ_{n=0}^{N-1} (x[n] - A)²)

The NP detector decides H1 if

p(x; H1) / p(x; H0) = (∫ p(x|A; H1) p(A) dA) / p(x; H0) > γ

Carrying out the Gaussian integration over A, taking the logarithm of both sides, and retaining only the data-dependent terms, we decide H1 if

x̄² > γ″

or, equivalently, |x̄| > √(γ″).

Remark: Note the form of the detector. As the unknown DC level can be either positive or negative, the detector is formed by comparing either the square or the absolute value of the sufficient statistic x̄ with an appropriate threshold.
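The two-sided detector |x̄| > γ can be checked under H0, where x̄ ~ N(0, σ²/N) gives PFA = 2Q(γ√N/σ). A sketch setting the threshold for a target PFA and verifying by simulation (σ = 1, N = 10 and the target PFA are assumed values):

```python
import random
from math import sqrt
from statistics import NormalDist

nd = NormalDist()

# Under H0, x̄ ~ N(0, σ²/N), so P_FA = 2 Q(γ√N/σ) for the detector |x̄| > γ.
# Choose γ to meet a target false alarm rate, then verify by Monte Carlo.
sigma, N, target_pfa, trials = 1.0, 10, 0.1, 100_000
gamma = nd.inv_cdf(1.0 - target_pfa / 2.0) * sigma / sqrt(N)

rng = random.Random(3)
false_alarms = 0
for _ in range(trials):
    xbar = sum(rng.gauss(0.0, sigma) for _ in range(N)) / N   # noise-only data
    false_alarms += abs(xbar) > gamma

pfa_mc = false_alarms / trials
```

Splitting the target PFA equally between the two tails is what makes the detector insensitive to the sign of the unknown DC level.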

The Bayesian approach to composite hypothesis testing discussed above suffers from the following limitations:

1. In some cases, it is not obvious how to assign the prior PDF of the unknown parameters.

2. If both hypotheses contain unknown parameters, finding the Bayesian solution becomes very tedious, and often the involved integrals do not yield closed-form solutions.

On account of these limitations, one can use an alternative hypothesis testing approach, referred to as the generalized likelihood ratio test (GLRT), which is presented in the following.

8.8.1 Generalized Likelihood Ratio Test (GLRT)

In this approach, the unknown parameters are first estimated from the observed data under either or both of the hypotheses. In the GLRT, the unknown parameters are replaced by their maximum likelihood estimates (MLEs) in the likelihood ratio. Although there is no optimality associated with the GLRT, in practice it appears to work quite well. In general, a GLRT decides H1 if:

LG(x) = p(x; θ̂1, H1) / p(x; θ̂0, H0) > γ

where θ̂1 is the MLE of θ1 assuming H1 is true (i.e., it maximizes p(x; θ1, H1)), and θ̂0 is the MLE of θ0 assuming H0 is true (i.e., it maximizes p(x; θ0, H0)). This approach also provides information about the unknown parameters, since the first step in determining LG(x) is to find the MLEs.


8.8.2 Example

Detection of the unknown DC level in WGN: GLRT approach

Assume θ1 = A and that there are no unknown parameters under ℋ0. The hypothesis test becomes

or the GLRT decides ℋ1 if

Remark: Note that the form of the detector is identical to the one obtained using the Bayesian approach. The derivation of the detector using the GLRT often turns out to be much simpler than that of the Bayesian approach.
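As a quick numerical sketch (not from the lecture; the DC level A = 2, σ = 1, N = 100 and the threshold 0.5 are arbitrary illustrative choices), the GLRT statistic for this example is the absolute value of the sample mean, the MLE of A:

```python
import random

def glrt_dc_statistic(x):
    """GLRT statistic for an unknown DC level in WGN: the MLE of A
    under H1 is the sample mean, and the test compares its absolute
    value against a threshold."""
    A_mle = sum(x) / len(x)     # MLE of the DC level under H1
    return abs(A_mle)           # |sample mean| is compared with a threshold

def glrt_decide(x, threshold):
    return 1 if glrt_dc_statistic(x) > threshold else 0

# Sanity check: a strong DC level should be detected, pure noise should not.
random.seed(0)
sigma, A, N = 1.0, 2.0, 100
noise_only = [random.gauss(0, sigma) for _ in range(N)]
signal = [A + random.gauss(0, sigma) for _ in range(N)]
print(glrt_decide(noise_only, 0.5), glrt_decide(signal, 0.5))
```

The same structure (estimate, then plug into the LRT) carries over to any composite problem, with only the MLE step changing.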

The salient attributes of the different parametric detection approaches discussed in this module are summarized below:

The Bayesian detector is the most general detector; it accounts not only for the prior probabilities of the hypotheses but also for the costs of each of the decisions made. It minimizes the average cost, or risk, in decision making.

The minimax detector is a variant of the Bayesian detector which accounts for the uncertainty in choosing the prior probability. It minimizes the worst-case average cost in decision making.

The Neyman-Pearson (NP) detector is the well-known classical detector which maximizes the detection probability for a fixed level of false alarm probability. It is based on the observed data only and does not require any prior information about the hypotheses. As a result, it can be applied in almost all detection problems.

In contrast to simple hypothesis testing, where the PDFs under all hypotheses are completely known, composite hypothesis testing, where one or more hypotheses have unknown parameter(s), is much more challenging.

In the Bayesian approach to composite hypothesis testing, the likelihood ratio test is performed by integrating out the unknown parameter(s) using their prior PDF. This approach yields the optimal detector but in general does not lead to a closed-form derivation of the detector.

The generalized likelihood ratio test (GLRT) is the most commonly used approach for composite hypothesis testing. In this approach, the unknown parameter is substituted with its maximum likelihood estimate (MLE) obtained from the observed data before performing the likelihood ratio test. The GLRT is very effective in practice, though the optimality of the resulting detector is not guaranteed in all cases.

Lecture 31 : Non-Parametric Detection: Sign Detector

9.2.1 Sign Detector

We already know that for the detection of a fixed positive voltage in the presence of zero-mean additive Gaussian noise, the optimal detector has the form

Instead of summing the observations and comparing the sum with a threshold, if we count the number of times the observation samples exceed zero, then this count could also be used to detect the unknown fixed positive voltage level. In this case, the detector can be given as

where γu denotes an appropriately chosen threshold. Such a detector essentially counts the number of positive signs in the observations and is therefore termed the sign detector. A sign detector is very simple to implement, requiring only a hard limiter followed by an adder. Unlike that of the sample-mean detector, the performance analysis of the sign detector is rather involved and is undertaken in the following.

Assume that the probabilities of an observation sample taking a positive value under the two hypotheses are given as

where p is the probability of the observed data sample being positive given that a fixed positive voltage A is transmitted.

Let d[n] denote the sign of x[n]:

For the sign detector, the likelihood ratio test (LRT) for deciding ℋ1 can be given as

where

Let N+ denote the number of positive observations. Then the LRT can be expressed as

Further, on taking the logarithm to the base p∕(1 - p), we can express the LRT in a more useful form as


Note that N+ is a sum of Bernoulli distributed random variables. Under the hypothesis ℋ0, N+ has a binomial distribution with parameters N and 0.5, i.e.,

Under the hypothesis ℋ1, N+ has a binomial distribution with parameters N and p, i.e.,

9.2.2 Example

Derive a sign detector that uses nine observations and ensures a false alarm probability of 0.1 for detecting a positive signal A in the presence of zero-mean Gaussian noise, and analyze its performance for the probability of a data sample being positive equal to (i) 0.75 and (ii) 0.99.

Given N = 9, the detection problem is,

or

Since

so

while

(i) For p = 0.75, the probability of detection PD can be computed as
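The binomial tail computations in this example can be carried out with a short script (an illustrative sketch; the function names are ours, not the lecture's). It finds the smallest count threshold meeting PFA ≤ 0.1 under N+ ~ Bin(9, 0.5) and then evaluates PD for the two given values of p:

```python
from math import comb

def binom_tail(N, p, k):
    """P(N+ >= k) for N+ ~ Binomial(N, p)."""
    return sum(comb(N, j) * p**j * (1 - p)**(N - j) for j in range(k, N + 1))

def sign_detector_threshold(N, alpha):
    """Smallest count k such that P(N+ >= k | H0: p = 0.5) <= alpha."""
    for k in range(N + 1):
        if binom_tail(N, 0.5, k) <= alpha:
            return k
    return N + 1

N, alpha = 9, 0.1
k = sign_detector_threshold(N, alpha)      # decide H1 when N+ >= k
pfa = binom_tail(N, 0.5, k)                # achieved false alarm probability
pd_075 = binom_tail(N, 0.75, k)            # case (i):  p = 0.75
pd_099 = binom_tail(N, 0.99, k)            # case (ii): p = 0.99
print(k, round(pfa, 4), round(pd_075, 4), round(pd_099, 4))
```

Because N+ is discrete, the achieved PFA (46∕512 ≈ 0.0898 at k = 7) sits just below the specified 0.1 rather than meeting it exactly.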

9.3.1 Sequential Detection

Sequential likelihood ratio detection is a modified Neyman-Pearson (NP) detection scheme in which two thresholds are established. Testing continues until one of the two thresholds is crossed. This modified NP test is characterized by fixing the probability of miss in addition to the probability of false alarm. In the modified NP test (also called the sequential likelihood ratio test), the likelihood ratio (LR) is compared at every update time (i.e., at each new sequential observation point) with two thresholds. These thresholds, denoted by η0 and η1, are determined by specifying a fixed value α for PFA and a fixed value β for PM.

The decision rule is as follows. If the LR is larger than η1, we decide ℋ1, while if the LR is smaller than η0, we decide ℋ0. If either threshold is crossed, the test is stopped and the appropriate decision is taken. If the LR falls between the two thresholds, the decision is deferred; that is, another sample is taken and the test is repeated.

Given independent and identically distributed observations denoted by x^N = {x1, x2,…, xN}, the LRT can be written as

or

so we have a recursive arrangement for the LRT, with the initial condition given by the likelihood ratio of the first sample x1.

For fixed values of PFA and PM, the thresholds η0 and η1 need to be derived so as to meet the constraints:

Similarly,

Hence,

To summarize, the sequential LRT detector decides ℋ1 to be true if Λ(x^N) > η1, and if Λ(x^N) < η0 then it decides ℋ0 to be true. If Λ(x^N) is larger than η0 but smaller than η1, then another sample is taken to form Λ(x^{N+1}) and the test is repeated.

This kind of test allows the user to terminate the test earlier than the conventional NP test,

once the presence or absence of a target has been determined with an acceptable level of error

(PFA or PM).
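A minimal sketch of the sequential test for the DC-level-in-Gaussian-noise case (the per-sample log-likelihood-ratio increment (A∕σ²)(x − A∕2) follows from the Gaussian PDFs; α, β and the constant test samples below are illustrative, and the thresholds use Wald's approximations η1 = (1 − β)∕α, η0 = β∕(1 − α)):

```python
import math

def sprt_dc(samples, A, sigma2, alpha=0.01, beta=0.01):
    """Sequential LRT for H1: x[n] = A + w[n] vs H0: x[n] = w[n].
    The log-LR is updated recursively and compared with the two
    thresholds ln(eta1) and ln(eta0) after every new sample."""
    ln_eta1 = math.log((1 - beta) / alpha)
    ln_eta0 = math.log(beta / (1 - alpha))
    llr = 0.0
    for n, x in enumerate(samples, start=1):
        llr += (A / sigma2) * (x - A / 2)   # per-sample log-likelihood ratio
        if llr > ln_eta1:
            return "H1", n                  # upper threshold crossed
        if llr < ln_eta0:
            return "H0", n                  # lower threshold crossed
    return "undecided", len(samples)

A, sigma2 = 1.0, 1.0
obs = [1.0] * 50          # deterministic samples equal to the DC level A
decision, n_used = sprt_dc(obs, A, sigma2)
print(decision, n_used)
```

With each sample contributing A²∕(2σ²) = 0.5 to the log-LR here, the upper threshold ln 99 ≈ 4.595 is crossed at the tenth sample, illustrating the early-termination behavior.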

Remarks: The salient limitations of this approach are:

The samples are assumed to be IID.

PFA and PM are assumed to be constant.

9.3.2 Example

Consider the detection of a DC level A in additive Gaussian noise with zero mean and variance σ². Conduct a sequential likelihood ratio test (SLRT) to detect the presence or absence of the signal. It is desired to terminate the test when PM ≤ β or PFA ≤ α.

The detection problem is,

We know that the thresholds η1 and η0 for deciding the hypotheses ℋ1 and ℋ0, respectively, can be given as

The outcome of the LRT versus the number of samples considered is shown in Figure 9.1. Note the cases N = 1, 2,…, 7, where the decision is deferred and another sample must be taken since neither threshold is crossed. When N = 8, the upper threshold is crossed, allowing the decision that the signal-present hypothesis (ℋ1) is true.

Figure 9.1: Likelihood ratio test versus the number of samples considered (Note that the test

terminates at n = 8).

Outline

The problem of detecting known signals in the presence of white Gaussian noise (WGN) finds extensive use in different signal processing applications, in particular where the assumption of known signal characteristics is a valid one. These problems can be grouped into two broad classes depending on the nature of the signal. When the signal is known and deterministic, it can be shown that the detector takes the form of a replica-correlator or a matched filter. The salient topics discussed in this module are:

Replica-correlator detector

Matched filter detector

Generalized matched filter detector

Lecture 33 : Replica-Correlator Detector

10.1.1 Replica-Correlator Detector

The replica-correlator detector detects a known deterministic signal in the presence of white Gaussian noise. Here, we derive the NP detector for the known deterministic signal case. The two hypotheses are

where the signal s[n] is assumed to be a known deterministic signal and the noise w[n] is WGN with variance σ².

Recall that the NP detector decides ℋ1 if

Since the PDFs under the hypotheses can be given as

we thus decide ℋ1 if


where T(x) is the test statistic and γ′ is a threshold chosen to satisfy PFA = α for a given α. It is clear that the received data is correlated with a replica of the signal, and the detector is therefore often referred to as the replica-correlator detector. Figure 10.1 shows the block diagram of the replica-correlator detector.

Figure 10.1: Replica-correlator detector for deterministic signal in white Gaussian noise
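A minimal sketch of this detector (the example signal, σ² and α are illustrative; the threshold γ′ = √(σ²ε)·Q⁻¹(α) follows from T(x) being Gaussian with zero mean and variance σ²ε under ℋ0, where ε is the signal energy):

```python
from statistics import NormalDist

def replica_correlator(x, s, sigma2, alpha):
    """NP detector for a known deterministic signal s in WGN.
    T(x) = sum x[n]s[n] is N(0, sigma2 * energy) under H0, so the
    threshold is gamma' = sqrt(sigma2 * energy) * Qinv(alpha)."""
    T = sum(xi * si for xi, si in zip(x, s))
    energy = sum(si * si for si in s)
    gamma = (sigma2 * energy) ** 0.5 * NormalDist().inv_cdf(1 - alpha)
    return (1 if T > gamma else 0), T, gamma

s = [1.0, 1.0, 1.0, 1.0]                  # example known signal (short DC pulse)
sigma2, alpha = 1.0, 0.01
decide_sig, T1, g = replica_correlator([1.5 * si for si in s], s, sigma2, alpha)
decide_noise, T0, _ = replica_correlator([0.0, 0.1, -0.1, 0.0], s, sigma2, alpha)
print(decide_sig, decide_noise, round(g, 3))
```

The data-containing-signal case crosses the threshold while the noise-like data does not; the threshold itself depends only on σ², the signal energy and α.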

10.1.2 Matched Filter Detector

Here we show that the replica-correlator detector can be interpreted as performing finite impulse response (FIR) filtering on the data. Assume that x[n] is the input to an FIR filter with impulse response h[n], where h[n] is nonzero for n = 0, 1,…, N - 1. The output of the filter at time n ≥ 0 is given by

If we choose the impulse response of the FIR filter to be a "flipped-around" version of the signal to be detected, i.e.,

h[n] = s[N - 1 - n],  n = 0, 1,…, N - 1

then the filter output sampled at time n = N - 1 is

y[N - 1] = Σ_{k=0}^{N-1} s[k] x[k]

which is exactly the replica-correlation statistic. Thus we decide ℋ1 if

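The flip-and-filter equivalence can be checked numerically (an illustrative sketch with arbitrary example numbers): the FIR filter with impulse response equal to the time-reversed signal, sampled at n = N - 1, reproduces the replica-correlation sum.

```python
def fir_output(x, h, n):
    """Causal FIR filtering: y[n] = sum_k h[k] x[n - k]."""
    return sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))

s = [1.0, -2.0, 3.0, 0.5]     # example known signal
x = [0.3, -1.9, 3.2, 0.4]     # example noisy observation of s
h = s[::-1]                   # matched filter: flipped replica of s

correlation = sum(xi * si for xi, si in zip(x, s))   # replica-correlator T(x)
mf_sample = fir_output(x, h, len(s) - 1)             # matched filter at n = N-1
print(correlation, mf_sample)
```

The two numbers agree (up to floating-point rounding), confirming that the matched filter is simply the replica-correlator implemented as a filter.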

Lecture 34 : Properties of Matched Filter

10.2.1 Frequency-domain Interpretation of Matched Filter

The matched filter may also be viewed in the frequency domain. Using Parseval's theorem, the replica-correlator can be expressed as

where H(f) and X(f) are the discrete-time Fourier transforms of h[n] and x[n], respectively. From the matched filter interpretation, H(f) = ℱ{h[n]} = ℱ{s[N - 1 - n]}, where ℱ{·} represents the discrete-time Fourier transform. The filter frequency response H(f) can then be shown to be

then we have

Another property of the matched filter is that it maximizes the signal-to-noise ratio (SNR) at the output of the filter. To show this, we consider all detectors in the form of an FIR filter but with arbitrary impulse response h[n] over [0, N - 1] and zero otherwise. If we define the output SNR η as

then using the vector notation we can write

or equivalently

Remark: Note that for the detection of a known signal in WGN, the NP criterion and the maximum-SNR criterion both lead to the matched filter detector. The maximum SNR is ηmax = sᵀs∕σ² = ε∕σ², where ε is the energy of the signal. One can easily see that the performance of the matched filter detector increases monotonically with ηmax.

Lecture 35 : Computation of Performance

10.3.1 Performance of Matched Filter Detector

To determine the detection performance of the replica-correlator or matched filter detector, we need to derive the expression for the probability of detection PD for a given value of the probability of false alarm PFA. For the replica-correlator or matched filter detector we decide ℋ1 if

Under each hypothesis the data samples x[n] are Gaussian, and since the test statistic T(x) is a linear combination of Gaussian random variables, T(x) is also Gaussian. Let E(T; ℋi) and var(T; ℋi) denote the expected value and the variance of T(x) under ℋi; then

where we have used the fact that the w[n] are uncorrelated. Thus

Note that as ε∕σ² increases, the PDFs retain their shape but move further apart, which obviously improves the detection performance. As the PDFs under both hypotheses are known, we can find PFA and PD as

where Q(x) = 1 - Φ(x) and Φ(x) is the CDF of the standard normal distribution 𝒩(0, 1). Since the CDF is monotonically increasing, Q(x) is monotonically decreasing, and so is Q⁻¹(x). On substituting the threshold obtained from the PFA expression into the expression for PD, we can write

PD = Q( Q⁻¹(PFA) - √(ε∕σ²) )

The above relation establishes that PD increases monotonically with ε∕σ², i.e., the energy-to-noise ratio.
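The PD–PFA relation above can be evaluated directly (a sketch; the PFA value and the grid of energy-to-noise ratios are arbitrary illustrative choices):

```python
from statistics import NormalDist

nd = NormalDist()
Q = lambda z: 1 - nd.cdf(z)          # right-tail probability of N(0,1)
Qinv = lambda p: nd.inv_cdf(1 - p)   # inverse of Q

def pd_matched_filter(pfa, enr):
    """P_D = Q( Q^{-1}(P_FA) - sqrt(ENR) ), ENR = signal energy / sigma^2."""
    return Q(Qinv(pfa) - enr ** 0.5)

pfa = 1e-2
for enr in (0.0, 1.0, 4.0, 16.0):
    print(enr, round(pd_matched_filter(pfa, enr), 4))
```

At ENR = 0 the detector can do no better than PD = PFA, and PD climbs monotonically toward 1 as the energy-to-noise ratio grows, exactly as the relation predicts.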

10.3.2 Example

It is desired to design a signal that achieves the best detection performance in white Gaussian noise. Two competing signals are proposed:

where A > 0. Which of the signals yields the better detection performance?

The two hypotheses are,

where the signal si[n] denotes either of the given deterministic signals s1[n] or s2[n] and the noise w[n] is WGN with variance σ².

Note that in the case of detection of a known deterministic signal in WGN with variance σ², the detection performance of an NP detector is completely characterized by the signal-to-noise ratio as

On computing the energies of the signals s1[n] and s2[n], we have

As both signals have identical energy, they yield identical detection performance in WGN.

10.4.1 Generalized Matched Filter Detector

In many practical situations, the noise is more accurately modeled as correlated noise. The noise is assumed to have the PDF w ~ 𝒩(0, C), where C is the covariance matrix. If the noise is modeled as a wide-sense stationary (WSS) process, then C has the special form of a symmetric Toeplitz matrix, but for non-stationary noise C will be an arbitrary covariance matrix. To find the NP detector in this case, we again perform the likelihood ratio test with the PDF of the data x being 𝒩(s, C) under ℋ1 and 𝒩(0, C) under ℋ0. The NP detector decides ℋ1 if

On simplifying and incorporating the data-independent terms into the threshold, we decide ℋ1 if

This is referred to as a generalized matched filter. Note that for WGN, C = σ²I, and the detector reduces to

Further, it may be viewed as a replica-correlator in which the replica is the modified signal s′ = C⁻¹s; then

thus the detector correlates the data with the modified signal.

For any C that is positive definite, its inverse is also positive definite. Consequently, we may factor C⁻¹ as C⁻¹ = DᵀD, where D is a nonsingular matrix. Thus the test statistic can be expressed as

Considering that under the linear transformation of the data the correlated noise also undergoes the same linear transformation, the noise gets transformed to w′ = Dw. Then

Thus the linear transformation D is a whitening transform, and the generalized matched filter can also be viewed as a pre-whitener followed by a replica-correlator or matched filter, as shown in Figure 10.3.

Figure 10.3: Generalized matched filter viewed as a pre-whitener followed by a replica-correlator (or matched filter)
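The algebraic equivalence of the direct form xᵀC⁻¹s and the pre-whitened form (Dx)ᵀ(Ds) can be checked numerically (an illustrative sketch with an arbitrary 2 × 2 covariance; here D is taken as the inverse of the Cholesky factor of C, one valid choice of whitener):

```python
import numpy as np

def generalized_mf_statistic(x, s, C):
    """Direct form: T(x) = x^T C^{-1} s."""
    return float(x @ np.linalg.solve(C, s))

def prewhitened_statistic(x, s, C):
    """Equivalent form: with C = L L^T (Cholesky) and D = L^{-1},
    C^{-1} = D^T D, so whitening data and signal then correlating
    gives (Dx)^T (Ds) = x^T C^{-1} s."""
    L = np.linalg.cholesky(C)
    D = np.linalg.inv(L)
    return float((D @ x) @ (D @ s))

C = np.array([[2.0, 0.5],
              [0.5, 1.0]])      # example correlated-noise covariance
s = np.array([1.0, -1.0])       # example known signal
x = np.array([1.3, -0.7])       # example observed data
print(np.isclose(generalized_mf_statistic(x, s, C),
                 prewhitened_statistic(x, s, C)))
```

In practice one would apply D by forward-substitution rather than forming the explicit inverse, but the small example keeps the algebra visible.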

10.4.2 Performance of Generalized Matched Filter

The test statistic for the generalized matched filter, given by

is a linear transformation of the data x, so the PDF of the test statistic under either hypothesis remains Gaussian, the same as that of the data. The first two moments of the test statistic under either hypothesis are determined below:

so we have

It is to be noted that in the case of correlated noise the signal can be designed to maximize sᵀC⁻¹s, and hence PD, unlike the white noise case, in which the shape of the signal has no importance and only the signal energy matters.

10.5.1 Signal Design for Correlated Noise

Here we explain the design of the signal for optimal detection performance in the correlated noise case. Consider an arbitrary noise covariance matrix C and a signal s to be designed for optimal detection. The term sᵀC⁻¹s could be made arbitrarily large by increasing the signal strength, but in practice there is a constraint on the signal energy. So the optimal signal is chosen by maximizing sᵀC⁻¹s subject to the fixed energy constraint sᵀs = ε. Making use of a Lagrange multiplier, we maximize the function

Since C is a symmetric matrix, we have

or

Therefore,

Thus, we should choose the signal s as the eigenvector of C⁻¹ whose corresponding eigenvalue λ is maximum. Equivalently, we should choose the signal as the eigenvector of C that has the minimum eigenvalue.
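A sketch of this eigenvector-based design (the covariance values, energy and comparison signal below are illustrative). Since the optimal s is an eigenvector of C with eigenvalue λmin, its SNR is sᵀC⁻¹s = ε∕λmin, which the script verifies against a naive constant-level signal:

```python
import numpy as np

def optimal_signal(C, energy):
    """Choose s as the eigenvector of C with the minimum eigenvalue,
    scaled to satisfy the energy constraint s^T s = energy."""
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    v = eigvecs[:, 0]                      # eigenvector of the min eigenvalue
    return np.sqrt(energy) * v / np.linalg.norm(v)

def snr(s, C):
    """Detection SNR s^T C^{-1} s."""
    return float(s @ np.linalg.solve(C, s))

# Example noise with a strong common component: C = P*11^T + sigma2*I, N = 2
P, sigma2, energy = 1.0, 0.5, 2.0
C = P * np.ones((2, 2)) + sigma2 * np.eye(2)
s_opt = optimal_signal(C, energy)
s_dc = np.sqrt(energy / 2) * np.ones(2)    # constant-level signal, same energy
print(round(snr(s_opt, C), 3), round(snr(s_dc, C), 3))
```

For this C the eigenvalues are σ² and NP + σ², so the optimal signal avoids the noisy common direction and gains a factor of (NP + σ²)∕σ² in SNR over the constant signal.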

10.5.2 Example

It is desired to design a signal that achieves the best detection performance in colored WSS Gaussian noise with autocovariance function (ACF) rww[k] = P + σ²δ[k], where both P and σ are positive constants. Two competing signals are proposed:

where A > 0. Which one of the signals would yield the better detection performance?

The two hypotheses are:

where the signal si[n] denotes either of the given deterministic signals s1[n] or s2[n] and w[n] is colored WSS Gaussian noise with ACF rww[k] = P + σ²δ[k], or equivalently covariance matrix C = P11ᵀ + σ²I, where 1 denotes the N × 1 column vector of ones and I is the N × N identity matrix.

Note that in the case of detection of a known deterministic signal in correlated noise, the detection performance of an NP detector is completely characterized by the signal-to-noise ratio (SNR) as

where the term sᵀC⁻¹s denotes the SNR. Thus a signal that yields a higher SNR results in better detection performance for a given probability of false alarm.

To compute the resulting SNR for the given signals, we first have to compute the inverse of the noise covariance matrix. For simplicity in finding the inverse, and without loss of generality, assume that the length of observation is either N = 2 (even case) or N = 3 (odd case).

Even data length case: N = 2

Odd data length case: N = 3

Again, for A, P, σ > 0, we have s2ᵀC⁻¹s2 > s1ᵀC⁻¹s1.

Thus, the signal s2[n] yields better detection performance than the signal s1[n] in the given colored WSS Gaussian noise. Contrast this inference with the one made in the white Gaussian noise case (Example 10.3.2), where either signal yields the same detection performance.

Lecture 38 : Detection and Linear Model

10.6.1 Linear Model

The linear model was introduced earlier in the context of classical estimation in Section 2.6.1. It finds application in a number of real-world problems and makes detection/estimation problems mathematically tractable. Recall that in the classical general linear model, the data vector x can be expressed as

x = Hθ + w

where x is an N × 1 vector of received data samples, H is a known N × p full-rank observation matrix with N > p, θ is a p × 1 parameter vector, which may or may not be known, and w is an N × 1 noise vector with PDF 𝒩(0, C). The term Hθ can be interpreted as the signal.

In the case of detection of deterministic signals, θ is assumed to be known under ℋ1 with value, say, θ1, so that s = Hθ1 is the known signal. Under the null hypothesis ℋ0, we have θ = 0, so that no signal is present. In applying the linear model to detection problems, we decide whether the signal s = Hθ1 is present or not. The detection problem can be mathematically expressed as

The NP detector immediately follows by letting s = Hθ1 in the detector T(x) = xᵀC⁻¹s > γ′, i.e., we decide ℋ1 if

Further, we can generalize the above result by noting that for the general linear model the minimum variance unbiased (MVU) estimator of θ is

Thus, we decide ℋ1 if


Remark: As pointed out earlier, the quantity sᵀC⁻¹s = θ1ᵀHᵀC⁻¹Hθ1 can be interpreted as the signal-to-noise ratio (SNR). Thus the detection performance increases monotonically with increasing SNR.
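A numerical sketch of the linear-model detector statistic T(x) = xᵀC⁻¹Hθ1 (the line-signal observation matrix and parameter values below are illustrative, not from the lecture):

```python
import numpy as np

def linear_model_np_statistic(x, H, theta1, C):
    """NP statistic for the linear model: the generalized matched
    filter with the known signal s = H theta1."""
    s = H @ theta1
    return float(x @ np.linalg.solve(C, s))

# Illustration: a line signal s[n] = theta_0 + theta_1 * n in white noise
N = 5
H = np.column_stack([np.ones(N), np.arange(N)])   # observation matrix (N x 2)
theta1 = np.array([1.0, 0.5])                     # known parameters under H1
C = 0.25 * np.eye(N)                              # white noise, sigma^2 = 0.25
s = H @ theta1

T_signal = linear_model_np_statistic(s, H, theta1, C)        # noise-free "data"
T_zero = linear_model_np_statistic(np.zeros(N), H, theta1, C)
print(round(T_signal, 3), T_zero)
```

On noise-free signal data the statistic equals sᵀC⁻¹s, the SNR itself, while all-zero data yields zero, which makes the role of the threshold easy to visualize.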

10.6.2 Example

It is desired to detect the known signal s[n] = Arⁿ for n = 0,…, N - 1 in white Gaussian noise with variance σ². Find the Neyman-Pearson detector and its detection performance. Explain what happens as N → ∞ for 0 < r < 1, r = 1, and r > 1.

The two hypotheses are

where s is the N × 1 known deterministic signal vector and w is an N × 1 noise vector with PDF 𝒩(0, σ²I).

The NP detector decides ℋ1 if

where γ is a threshold computed from the constraint on the probability of false alarm, PFA.

The PDF under the hypotheses can be given as

with

The test statistic T(x) being a linear function of the data, and assuming that the signal and the noise are uncorrelated, the PDFs of T(x) under ℋ0 and ℋ1 can be shown to be 𝒩(0, σ²||s||²) and 𝒩(||s||², σ²||s||²), respectively.

Finding the probability of false alarm and applying the given constraint (say α)

Noting that the detection threshold satisfies the constraint with equality,

or

For r ≥ 1, the signal energy grows without bound as N → ∞. As a result, the argument of the Q-function in the expression for PD tends to infinity and hence PD → 1.

For 0 < r < 1, as N → ∞ the energy converges to ε → A²∕(1 - r²). Hence

The probability of detection improves with increasing A, decreasing σ, and/or r being closer to 1.

Outline

11.1 Outline

In some cases, it is more appropriate to model the signal as a random process rather than as a deterministic one. When the signals are modeled as random processes with known covariance structure, the detector takes the form of an estimator-correlator to cope with the random nature of the signals to be detected. The salient topics discussed in this module are:

Energy detector

Estimator-correlator detector

Generalized Gaussian detection

Lecture 39 : Energy Detector

11.1.1 Energy Detector

We first derive the NP detector in the presence of WGN for the case where the signal is modeled as a white WSS Gaussian random process with known variance. This is later generalized to the case where the signal is modeled as a Gaussian random process with a known arbitrary covariance matrix.

11.1.2 Detection of Random Signal with Diagonal Covariance Matrix

The detection problem is to differentiate between the hypotheses

where the signal s[n] is a zero-mean white WSS Gaussian random process with variance σs² and the noise w[n] is WGN with variance σ². Under these modeling assumptions, x ~ 𝒩(0, σ²I) under ℋ0 and x ~ 𝒩(0, (σs² + σ²)I) under ℋ1. So the NP detector decides ℋ1 if

where

where

Therefore, the NP detector basically computes the energy of the received data and compares it to a predetermined threshold. Hence it is referred to as the energy detector.

To compute the detection performance of the energy detector, note that the test statistic is the sum of the squares of N IID Gaussian random variables, so the PDF of the test statistic under both hypotheses can be given as

On using the definition of the right-tailed probability for a χ²_ν random variable, we can find PFA and PD as

As done earlier, we can substitute the value of the threshold γ′ determined from the PFA expression into the PD expression to get

Thus, with an increase in σs²∕σ², the argument of the Q_{χ²_N} function decreases and the detection performance improves.

Remark: The energy detector is one of the most widely used receivers in communication systems, in particular for asynchronous communication.
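Since T(x) is σ²·χ²_N under ℋ0, the threshold is the upper-α point of a scaled chi-square distribution; as a self-contained sketch (standard-library Python has no χ² inverse CDF, so the quantile is approximated by Monte Carlo; all numerical values are illustrative):

```python
import random

def energy_statistic(x):
    """Energy detector test statistic T(x) = sum x[n]^2."""
    return sum(xi * xi for xi in x)

def mc_threshold(N, sigma2, alpha, trials=20000, seed=0):
    """Approximate the H0 threshold of the energy detector:
    simulate noise-only statistics (sigma2 * chi^2_N) and take the
    empirical (1 - alpha) quantile."""
    rng = random.Random(seed)
    stats = sorted(
        energy_statistic([rng.gauss(0, sigma2 ** 0.5) for _ in range(N)])
        for _ in range(trials)
    )
    return stats[int((1 - alpha) * trials)]

N, sigma2, alpha = 10, 1.0, 0.1
gamma = mc_threshold(N, sigma2, alpha)
print(round(gamma, 2))
```

For these values the estimate should land near the χ²₁₀ upper 10% point (about 16); with a chi-square routine available (e.g. SciPy), the exact quantile would replace the simulation.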

11.1.3 Detection of Random Signal with Arbitrary Covariance Matrix

In this case, the signal s[n] is a zero-mean Gaussian random process with known covariance matrix Cs and the noise is WGN with variance σ². Under these modeling assumptions, x ~ 𝒩(0, σ²I) under ℋ0 and x ~ 𝒩(0, Cs + σ²I) under ℋ1. So the NP detector decides ℋ1 if

where

or

so that

Now, let

Hence, we decide ℋ1 if

Note that the NP detector correlates the received data with an estimate ŝ of the signal. It is therefore termed the estimator-correlator detector. Recall that if θ is an unknown random variable whose realizations are to be estimated based on the data x, where θ and x are jointly Gaussian with zero mean, then the MMSE estimator is given by

where Cθx = E(θxᵀ) and Cxx = E(xxᵀ). In the context of the detection problem, we have θ = s and x = s + w, with s and w uncorrelated. The MMSE estimate of the signal realization can be given as

Thus it can be argued that the signal estimate ŝ is the Wiener filter estimate of the given realization of the random signal. The block diagram of the estimator-correlator is shown in Figure 11.1.

Figure 11.1: Estimator-correlator detector for detection of Gaussian random signal in white

Gaussian noise
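A minimal sketch of the estimator-correlator (illustrative values). In the white-signal special case Cs = σs²I, the MMSE estimate reduces to a simple scaling of x, so the statistic becomes a scaled energy detector, which the script checks:

```python
import numpy as np

def estimator_correlator(x, Cs, sigma2):
    """T(x) = x^T s_hat, where s_hat = Cs (Cs + sigma2 I)^{-1} x is
    the MMSE (Wiener) estimate of the random signal realization."""
    N = len(x)
    s_hat = Cs @ np.linalg.solve(Cs + sigma2 * np.eye(N), x)
    return float(x @ s_hat), s_hat

sigma_s2, sigma2 = 4.0, 1.0
x = np.array([1.0, -2.0, 0.5])        # example observed data
Cs = sigma_s2 * np.eye(3)             # white-signal covariance
T, s_hat = estimator_correlator(x, Cs, sigma2)

scale = sigma_s2 / (sigma_s2 + sigma2)   # Wiener gain for the white case
print(round(T, 4), np.allclose(s_hat, scale * x))
```

For a non-diagonal Cs the same two lines apply unchanged; the Wiener gain simply becomes a matrix rather than a scalar.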

11.2.1 Linear Model

The linear model for the detection of a deterministic signal in WGN was discussed in Section 10.6.1; the detection of random signals in WGN can also be simplified with the use of the linear model. This approach has an obvious similarity to the Bayesian linear model described in Section 7.3.1.

Assume that the data is described as:

x = Hθ + w

where x is the N × 1 data vector, H is a known N × p observation matrix, θ is a p × 1 random vector of parameters with θ ~ 𝒩(0, Cθ), and w is an N × 1 noise vector with w ~ 𝒩(0, σ²I) and independent of θ.

The detection problem can be expressed as

By noting that s = Hθ ~ 𝒩(0, HCθHᵀ) and using the previous results for the NP detector in the random signal case, the estimator-correlator detector decides ℋ1 if

On substituting the value of the signal covariance matrix Cs = HCθHᵀ, the detector decides ℋ1 if

where θ̂ is the MMSE estimate of θ.

11.2.2 Generalized Gaussian Detection

In the general case the signal is allowed to have both deterministic and random components. To accommodate this, the signal is modeled as a random process with the deterministic part corresponding to a nonzero mean and the random part corresponding to a zero-mean random process with a given signal covariance matrix. These assumptions lead to the general Gaussian detection problem, in which the signal can be discriminated from the noise based on its mean and covariance. Mathematically, the detection problem is described as

where s ~ 𝒩(μs, Cs), w ~ 𝒩(0, Cw), and s and w are independent. The NP detector decides ℋ1 if

or

Taking the logarithm of both sides, retaining only the data-dependent terms, and scaling produces the test statistic

Thus the test statistic consists of both a linear form and a quadratic form in the data x. Consider the following special cases:

1. Cs = 0, i.e., a deterministic signal with s = μs. Then

2. μs = 0, i.e., a random signal with s ~ 𝒩(0, Cs). Then

where ŝ is the MMSE estimate of the random signal s.

11.2.3 Example

Detection of a sinusoidal signal in additive white Gaussian noise with a Rayleigh fading channel model.

In a typical detection scenario, the signal of interest can reach the detector/receiver via many different paths. The net effect of this is constructive and destructive interference, resulting in an unpredictable amplitude and phase of the received signal. Such a fluctuation in the received signal can also be caused by relative motion between the transmitter and the receiver.

In case of Rayleigh fading, the observed signal can be expressed as

where A and ϕ are random variables, f0 is a known frequency in the range 0 < f0 < 0.5, and w[n] is WGN with variance σ².

Instead of assigning PDFs to A and ϕ directly, it is more convenient to note that

Note that the signal is now linear in the parameters p and q. Invoking the central limit theorem (due to the superposition of a number of multipath arrivals of the signal at the receiver), we further assume that

On computing the first two moments of s[n] we have

and

Thus s[n] is a WSS Gaussian random process with ACF rss[k] = σs² cos 2πf0k. In addition, we can show that the PDF of the amplitude A = √(p² + q²) is Rayleigh, or

and the PDF of ϕ = arctan(-q∕p) is uniform on (0, 2π), with A and ϕ independent of each other. As the amplitude PDF is Rayleigh distributed, this channel model is referred to as the Rayleigh fading channel model.

With these assumptions, we note that the observed data follows the Bayesian linear model

x = Hθ + w, where

For the detection of the sinusoid (i.e., deciding ℋ1), the NP detector can be given as

On using the matrix inversion lemma, we have

Further, noting that for large N and 0 < f0 < 0.5, we have HᵀH ≈ (N∕2)I. Thus

On merging the positive constant k with the threshold, we have

From the above relations, we can implement the detector in two ways.

In one implementation, the data is correlated with the cosine ("in-phase") and sine ("quadrature") replicas of the sinusoidal signal. As the phase is random, one or both of these outputs (I or Q) will be large in magnitude if the signal is present. Since the sign of a correlator output can be positive or negative, we square the I and Q outputs, sum them with scaling by 1∕N, and then compare to a threshold. This type of detector is known as a quadrature matched filter or an incoherent matched filter.

The second implementation is known as a periodogram detector or a sampled spectrum detector. In this, the Fourier transform of the data x[n] is computed at f0, magnitude-squared and scaled by 1∕N, and then compared with a threshold.
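Both implementations reduce to evaluating the periodogram at f0. A sketch (signal amplitude, frequency, phase and noise level are illustrative; f0 is chosen on a DFT bin so the noise-free statistic is exactly NA²∕4, which the test checks):

```python
import cmath, math, random

def periodogram_statistic(x, f0):
    """I(f0) = (1/N)|sum_n x[n] e^{-j 2 pi f0 n}|^2: equivalently the
    squared in-phase plus squared quadrature correlator outputs, scaled
    by 1/N, so the random phase of the arrival does not matter."""
    N = len(x)
    X = sum(x[n] * cmath.exp(-2j * cmath.pi * f0 * n) for n in range(N))
    return abs(X) ** 2 / N

random.seed(0)
N, f0, sigma = 64, 0.125, 0.5
phase = 1.0                        # unknown (here arbitrary) phase of arrival
sinusoid = [2.0 * math.cos(2 * math.pi * f0 * n + phase) for n in range(N)]
noisy = [s + random.gauss(0, sigma) for s in sinusoid]
noise = [random.gauss(0, sigma) for _ in range(N)]
print(periodogram_statistic(noisy, f0) > periodogram_statistic(noise, f0))
```

Because the statistic depends only on the magnitude of the complex correlation, it is insensitive to the random phase, which is precisely why the incoherent structure is needed for the Rayleigh fading model.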

2.1 Outline

In this module the basics of classical parameter estimation are discussed and bounds on unbiased estimators are described. The salient topics discussed in this module are:

Minimum variance unbiased estimator (MVUE)

Cramer-Rao lower bound (CRLB) on unbiased estimators

Fisher information and its relation to CRLB

Computation of CRLB in general cases

Linear model of data and its generalization

To state the problem of classical parameter estimation mathematically, let us define the

following:

x[n] ≡ observation data at sample time n

x = [x[0] x[1] … x[N - 1]]T ≡ vector of N observation samples (N-point data set)

p(x; θ) ≡ mathematical model (i.e., PDF) of the N-point data set parameterized by θ

The problem is to find a function of the N-point data set which provides an estimate of θ, that

is

Once a candidate estimator function g(x) is found, one usually asks the following questions:

1. How close will θ̂ be to θ (i.e., how good or optimal is the estimator)?

2. Are there better estimators (i.e., closer to the value to be estimated)?

To measure the goodness of an estimator, one needs to define a suitable cost function C(θ, θ̂) which essentially captures the difference between the estimated and the true value of the parameter over the range of interest. The typical cost functions used are the quadratic error, the absolute error, and the uniform (hit-or-miss) cost functions.

In the classical (nonrandom) parameter estimation case, the natural optimization criterion is minimization of the mean square error:

But often this criterion does not yield a realizable estimator, i.e., one which can be written as a function of the data only:

However, although [E(θ̂) - θ]² is a function of θ, the variance of the estimator var(θ̂) is only a function of the data. Thus an alternative approach is to require E(θ̂) - θ = 0 (i.e., zero bias) and minimize var(θ̂). This produces the minimum variance unbiased estimator (MVUE).

2.2.1 Minimum Variance Unbiased Estimator (MVUE)

The MVUE is an optimal estimator. The two attributes of this optimal estimator are:

It should be unbiased:

This ensures that the estimator produces, in an average sense, the true value of the parameter to be estimated.

It should have minimum variance among all unbiased estimators:

This ensures that the estimator, in an expected sense, deviates from the true value of the parameter minimally among all possible unbiased estimators for the problem.

2.2.2 Example

Consider a DC signal A in the presence of additive white Gaussian noise (WGN) w[n] with variance σ²; the observed data x[n] is then given by:

Consider the sample-mean estimator function:

Check unbiasedness:

Find variance:

The sample-mean estimator thus turns out to be unbiased with variance var(Â) = σ²∕N. But it is not obvious whether it is the MVUE or not. If Â is the MVUE, all other unbiased estimators must satisfy var(θ̂) ≥ σ²∕N.
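The two properties of the sample mean (unbiasedness and variance σ²∕N) can be confirmed by a Monte Carlo sketch (A, σ and N below are arbitrary illustrative values):

```python
import random
from statistics import fmean

def sample_mean_estimates(A, sigma, N, trials, seed=0):
    """Generate `trials` independent realizations of the sample-mean
    estimator for x[n] = A + w[n], w[n] ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return [fmean(A + rng.gauss(0, sigma) for _ in range(N))
            for _ in range(trials)]

A, sigma, N = 3.0, 2.0, 25
est = sample_mean_estimates(A, sigma, N, trials=20000)
mean_est = fmean(est)                                # should be close to A
var_est = fmean((e - mean_est) ** 2 for e in est)    # should be near sigma^2/N
print(round(mean_est, 2), round(var_est, 3), sigma**2 / N)
```

The empirical mean of the estimates settles at A and the empirical variance at σ²∕N = 0.16, matching the analytical results; the simulation of course cannot settle whether a lower-variance unbiased estimator exists.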

2.2.3 Existence of MVUE

The MVUE is the most desired estimator, but it does not always exist. Figure 2.1 depicts two possible situations for the variance var(θ̂) of unbiased estimators of the parameter θ. Let there be only three unbiased estimators, with variances as shown in Figure 2.1(a); then the one with the uniformly smallest variance is clearly the MVUE. If instead the situation shown in Figure 2.1(b) exists, then there is no MVUE, since one estimator is better for θ > θ0 while another is better for θ < θ0. In the former case, the best estimator is sometimes referred to as the uniformly minimum variance unbiased estimator to emphasize that it has the smallest variance for all θ. In general, the MVUE does not always exist.

2.2.4 Example

Here we present a counterexample to the existence of the MVUE. If the form of the data PDF changes with the parameter θ, then it is to be expected that the best estimator would also change with θ. Assume that we have two independent observations x[0] and x[1] with PDF:

It can be easily shown that they are unbiased estimators and to compute their variances we

have

so that

and

As shown in Figure 2.2, for θ ≥ 0 the minimum possible variance of an unbiased estimator is 18∕36, while that for θ < 0 is 24∕36. Since neither estimator achieves the smaller variance for all θ, no MVU estimator exists for this problem.

In this lecture, we present the Cramér-Rao lower bound on the variance of an unbiased

estimator.

2.3.1 Cramér-Rao Lower Bound (CRLB)

Under certain regularity conditions, the variance of any unbiased estimator must be lower

bounded by the CRLB, with the variance of the MVUE attaining the CRLB. That is:

and

Furthermore, if for some functions g and I:

then we can find the MVUE as θ̂ = g(x) and the minimum variance is 1∕I(θ). The proofs

of these results are given later in Section 2.4.1.

For p-dimensional parameter, θ, the equivalent condition in terms of the covariance matrix is

given by:


i.e., C − I⁻¹(θ) is positive semi-definite, where C = E[(θ̂ − E(θ̂))(θ̂ − E(θ̂))ᵀ] is the

covariance matrix of the estimator. The Fisher information matrix, I(θ), is given as:

then we can find the MVUE as θ̂ = g(x), and the minimum covariance is I⁻¹(θ).

2.3.2 MVU Estimator and CRLB Attainment

In general, an MVU estimator may exist but may not attain the CRLB. To illustrate this, let

us assume that there exist three unbiased estimators for the unknown parameter θ

in an estimation problem, and their variances are shown in Figure 2.3. As shown in

Figure 2.3(a), the estimator θ̂3 is efficient as it attains the CRLB, and therefore it is also

the MVUE. On the other hand, in Figure 2.3(b), the estimator θ̂3 does not attain the CRLB, so it

is not efficient. But its variance is uniformly less than that of the other possible unbiased estimators,

so it is the MVUE.

2.3.3 Fisher Information

As noted above, when the CRLB is attained, the variance of the unbiased estimator is the reciprocal

of the Fisher information. The Fisher information is a way of measuring the amount of

information that an observable random variable x carries about an unknown parameter θ

upon which the probability of x depends. Assume that the data PDF p(x; θ) satisfies some

regularity conditions, which include:

For all x such that p(x; θ) > 0, ln p(x; θ) exists and is finite.

The operations of integration with respect to x and differentiation with respect to θ

can be exchanged in finding the expectation, i.e.,

Note that the above regularity conditions are satisfied in general, except when the domain

over which the PDF is nonzero depends on the unknown parameter (e.g., the uniform

distribution on (0, θ) with unknown domain parameter θ).

Given the PDF p(x; θ), the Fisher information I(θ) can also be expressed as

which follows directly from the “regularity” condition, E[∂ ln p(x; θ)∕∂θ] = 0 ∀ θ, imposed on

the PDF.

Proof

In the following, the Fisher information relationships are derived for the scalar parameter case,

p(x; θ), for the sake of simplicity:

or

The Fisher information has the essential properties of an information measure, which is

evident from the following facts:

1. It is nonnegative.

2. It is additive for independent observations.

The latter property leads to the result that the CRLB for N IID observations is 1∕N times that

for one observation. To verify this, note that for independent observations

This results in I(θ) = N i(θ), where i(θ) is the information of a single observation. In contrast, for completely dependent observations (e.g., x[0] = x[1] = … = x[N − 1]), the information

of N observations remains the same as that for one observation, and the CRLB will not

decrease with increasing data record length.

In other words, if we synthetically try to increase the data length by simply repeating some of

the actual observations rather than making new observations, it would not lower

the CRLB or yield better estimation performance than that obtained using the actual observations.

2.3.4 Example

Consider the case of DC signal A embedded in noise. The noisy observation is given by:

where p(x; θ) is considered a function of the parameter θ = A (for known x) and is thus

termed the likelihood function. Taking the first and then the second derivative of the logarithm of the

likelihood function:

Note that the second derivative turns out to be a constant; thus the CRLB is σ²∕N,

where I(θ) = N∕σ² and g(x) is the sample mean. It was shown earlier that for estimation of a DC level

in WGN, the sample-mean estimator has a variance of σ²∕N. Thus it is indeed the MVUE, as its

variance achieves the minimum possible variance bound.
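The constant-curvature argument can be checked numerically: a finite-difference estimate of the second derivative of the log-likelihood reproduces I(A) = N∕σ² and hence the CRLB σ²∕N (all numbers below are illustrative assumptions):

```python
import numpy as np

# Sketch: for x[n] = A + w[n] with WGN of variance sigma2, the log-likelihood
# curvature in A is the constant -N/sigma2, so I(A) = N/sigma2, CRLB = sigma2/N.
rng = np.random.default_rng(1)
A_true, sigma2, N = 3.0, 2.0, 40
x = A_true + np.sqrt(sigma2) * rng.standard_normal(N)

def loglike(A):
    return -np.sum((x - A) ** 2) / (2 * sigma2)   # additive constants dropped

h = 1e-4                                          # finite-difference step
d2 = (loglike(A_true + h) - 2 * loglike(A_true) + loglike(A_true - h)) / h**2
fisher = -d2       # the curvature is constant here, so no expectation is needed
crlb = 1 / fisher
print(fisher, N / sigma2, crlb, sigma2 / N)
```

Since the log-likelihood is exactly quadratic in A, the numerical curvature matches N∕σ² to machine precision.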

Another desirable property of estimators is consistency. If we collect a large number of

observations, we hope that we have a lot of information about any unknown parameter θ, and

thus we can construct an estimator with a very small mean square error (MSE). An estimator

is defined as consistent if

which means that as the number of observations increases, the MSE of the estimator descends

to zero, i.e., the estimate converges to θ.

For example, if the MSE of an estimator based on n observations is 1∕n, then since lim n→∞ (1∕n) =

0, it is a consistent estimator of θ, or more specifically “MSE-consistent”. There are other types

of consistency definitions that look at the probability of the errors; they work better when the

estimator does not have a finite variance.
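MSE-consistency can be illustrated with the sample mean of unit-variance Gaussian data, whose MSE is 1∕N (a minimal sketch; the sample sizes and trial count are assumptions):

```python
import numpy as np

# Sketch: the sample mean of N(theta, 1) samples has MSE = 1/N, which tends
# to zero as N grows -- the estimator is MSE-consistent.
rng = np.random.default_rng(2)
theta, trials = 1.0, 5000
mses = []
for N in (10, 100, 1000):
    x = theta + rng.standard_normal((trials, N))
    est = x.mean(axis=1)
    mses.append(np.mean((est - theta) ** 2))   # empirical MSE, about 1/N
print(mses)
```

The empirical MSE drops roughly tenfold each time N grows tenfold.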

2.4.1 CRLB in General Cases

In practice, we often require the CRLB for a parameter which is a function of some more

fundamental parameter. So first we discuss the derivation of the CRLB for transformations

of scalar parameters. Later, the derivation of the CRLB for a deterministic signal in white

Gaussian noise and its extension to the general Gaussian case are discussed.

CRLB for Transformed Parameter

Consider a transformed parameter α = g(θ) where the PDF of data is parameterized by θ.

Then the CRLB for an estimator of α, under the regularity conditions, is given by

Proof

Effect of Parameter Transformation on Asymptotic Efficiency

It is important to understand the effect of a transformation of the parameter on the efficiency of

the estimator of the transformed parameter. As shown earlier, the sample-mean estimator is

efficient for estimation of the DC level A in WGN; considering the transformation

α = g(A) = A², one may expect the squared sample mean to be efficient for the estimation of A². But this notion is

not true, as the squared sample mean is not even an unbiased estimator: its expectation exceeds A² by the variance term of the sample mean, so efficiency is, in general, destroyed by a nonlinear

transformation. The efficiency of an estimator is maintained for linear (or affine)

transformations, which can be verified easily.

Assume that an efficient estimator θ̂ for θ exists, and that the observed values of θ̂ lie

in a small interval about θ̂ = A. Over this small interval, the nonlinear transformation is

approximately linear. If we linearize g about A, we have the approximation

under which the estimator of the transformed parameter is asymptotically efficient.

2.4.2 Example

Consider the problem of estimating the speed of a vehicle from the elapsed time, as

depicted in Figure 2.4. Find the bound on the accuracy of the speed estimate given that the

elapsed time can be estimated efficiently.

By measuring the time (T) elapsed to cover a known distance (D), the speed of the vehicle

can be derived as V = D∕T. It is obvious that the accuracy of the measured speed is set by the

accuracy of the elapsed-time measurement. But the relationship between the speed and the

elapsed time is non-linear, so the efficiency of the time measurement would no longer

carry over to the speed measurement.

The bound on the accuracy of the speed estimator,

The speed measurement would be less accurate at higher speeds (the sensitivity being quadratic in the speed) and more

accurate for larger distances. Thus, the speed estimator can achieve the CRLB only

asymptotically.
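A sketch of this bound, using the transformation rule var(V̂) ≥ (dg∕dT)² var(T̂) with g(T) = D∕T (the distance D and the elapsed-time CRLB var_T below are assumed numbers):

```python
# Sketch (assumed numbers): CRLB for the transformed parameter V = g(T) = D/T.
D = 100.0        # known distance (assumed)
var_T = 1e-4     # CRLB of the efficient elapsed-time estimator (assumed)

def speed_crlb(V):
    T = D / V                        # elapsed time at speed V
    return (D / T**2) ** 2 * var_T   # = (V^2 / D)^2 * var_T

ratio = speed_crlb(20.0) / speed_crlb(10.0)
print(ratio)   # 16.0: doubling the speed inflates the bound by 2^4
```

Since dV∕dT = −D∕T² = −V²∕D grows quadratically with speed, the bound itself grows as V⁴ and falls as 1∕D².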

2.5.1 CRLB for Signals in White Gaussian Noise

Assume that a deterministic signal with an unknown parameter θ is observed in WGN as

This form of the bound emphasizes the importance of the dependence of the signal on θ. It is easy to

note that signals which change rapidly as the unknown parameter changes result in more

accurate estimation.

Proof

The likelihood function is

Here, a general expression for the CRLB is provided. Assume that the data are Gaussian with mean and covariance parameterized by θ;

thus both the mean and the covariance may depend on θ. Then the Fisher information matrix can

be given by (the proof is lengthy and so omitted)

where

this reduces to

2.5.3 Example

In many fields, the observed data are cyclical in nature, either due to the assumed signal model (in

economics) or due to underlying physical constraints (in radar and sonar). Thus the signal

processing challenge in those cases reduces to estimating the parameters of a sinusoid.

In this example, the determination of the CRLB for the estimation of the amplitude A,

frequency ƒ0, and phase ϕ of a sinusoid embedded in WGN is discussed.

The observed data are

where θ = [A ƒ0 ϕ]T , A > 0, 0 < ƒ0 < 1∕2, and -π ≤ ϕ ≤ π. Since multiple parameters are

unknown, the vector form of the CRLB is required; note also that the covariance

matrix, C = σ²I, does not depend on θ. Thus we have

For evaluating the CRLB, it is assumed that ƒ0 is not close to 0 or 1∕2, since that allows for

certain simplifications based on the approximations

The Fisher information matrix can be given as

Remark: In practice, the frequency estimation of a sinusoid is of considerable

interest. Note that the CRLB for the frequency decreases as the SNR increases. Further, the

bound decreases as 1∕N³, making the CRLB quite sensitive to the length of the data record.
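This remark can be checked with the widely quoted closed-form bound var(f̂0) ≥ 12∕((2π)² η N(N² − 1)), where η = A²∕(2σ²) is the SNR (a sketch; the numbers are illustrative assumptions):

```python
import math

# Sketch: textbook closed-form frequency CRLB for a sinusoid in WGN,
# valid when f0 is not near 0 or 1/2.
def freq_crlb(A, sigma2, N):
    eta = A**2 / (2 * sigma2)      # SNR of the sinusoid
    return 12 / ((2 * math.pi) ** 2 * eta * N * (N**2 - 1))

b1, b2 = freq_crlb(1.0, 0.1, 100), freq_crlb(1.0, 0.1, 200)
print(b1 / b2)   # close to 8: doubling N shrinks the bound by about 2^3
```

Doubling the record length shrinks the bound by roughly a factor of eight, confirming the 1∕N³ dependence; raising the SNR lowers the bound proportionally.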

2.6.1 Linear Model

If N-point samples of data are observed and modeled as:

x = Hθ + w

where

x = N × 1 observation vector

H = N × p known observation matrix

θ = p × 1 vector of parameters to be estimated

w = N × 1 noise vector with PDF

then, using the CRLB theorem, θ̂ = g(x) will be the MVUE if the derivative of the log-likelihood can be put

into the form I(θ)(g(x) − θ). When we do this, the MVU estimator for θ is:

2.6.2 Example

Consider fitting the data, x(t), by a pth order polynomial function of t:

where the θi are the polynomial coefficients and w(t) is the approximation error, assumed to be

zero-mean Gaussian with a constant variance.

Assume we have N samples of data, then:

so x = Hθ + w, where H is N × p matrix:

Hence the MVUE of the polynomial coefficients based on N samples of the data is:
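A minimal sketch of this linear-model estimator, θ̂ = (HᵀH)⁻¹Hᵀx, for a 2nd-order polynomial (the coefficients, noise level and sample count are illustrative assumptions):

```python
import numpy as np

# Sketch: MVUE theta_hat = (H^T H)^{-1} H^T x for a polynomial fit
# in white Gaussian noise.
rng = np.random.default_rng(3)
N = 200
t = np.linspace(0.0, 1.0, N)
theta_true = np.array([1.0, -2.0, 0.5])      # assumed polynomial coefficients
H = np.column_stack([t**0, t**1, t**2])      # N x p observation matrix
x = H @ theta_true + 0.05 * rng.standard_normal(N)

theta_hat = np.linalg.solve(H.T @ H, H.T @ x)
print(theta_hat)
```

The recovered coefficients are close to the true ones, with a covariance given by (HᵀH)⁻¹σ².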

2.6.3 Example

Consider the problem of estimation of the channel in a wireless communication system.

The receiver cannot decode the transmitted symbols correctly unless the channel is known,

which is usually not the case in wireless communication. To overcome this issue, the transmitter

regularly transmits a known pseudorandom noise (PN) sequence u[n] so as to enable the

receiver to estimate the unknown channel, as depicted in Figure 2.5. The unknown channel is

modeled with a finite impulse response (FIR) filter of an appropriate order, say p.

In practice, the signal received at the receiver is the channel-filtered PN sequence plus some additive noise.

The received data can be modeled as

Assuming that w[n] is white Gaussian noise and noting that the data are in the linear model form,

the MVU estimator of the channel impulse response is

For large N, and using the fact that u[n] = 0 for n < 0 and n > N - 1, it can be shown that

which can be identified as the autocorrelation function of the known sequence u[n]. With

this approximation, HTH also takes the form of a symmetric Toeplitz matrix,

whose entries are given by the autocorrelation function of u[n]. For a PN

sequence, the autocorrelation matrix has the property

Therefore, the FIR filter coefficient estimators are independent, and the MVU estimator

for the ith filter coefficient can be given as
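A minimal simulation of this channel-estimation setup (a random ±1 probe sequence stands in for the PN sequence, and the channel taps and noise level are assumptions):

```python
import numpy as np

# Sketch: estimating a p-tap FIR channel from a known probe sequence with the
# linear-model MVUE h_hat = (H^T H)^{-1} H^T x.
rng = np.random.default_rng(4)
N, p = 1000, 4
u = rng.choice([-1.0, 1.0], size=N)          # PN-like +/-1 probe sequence
h_true = np.array([0.9, 0.5, -0.3, 0.1])     # assumed channel taps

# Column i of H holds u[n - i] (with u[n] = 0 for n < 0).
H = np.column_stack([np.concatenate([np.zeros(i), u[:N - i]]) for i in range(p)])
x = H @ h_true + 0.1 * rng.standard_normal(N)

h_hat = np.linalg.solve(H.T @ H, H.T @ x)    # H^T H is close to N*I for a PN probe
print(h_hat)
```

Because the probe sequence is nearly white, HᵀH is close to N·I and the tap estimates decouple, as the Toeplitz argument above suggests.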

Lecture 6 : General Linear Model

2.7.1 General Linear Model

In a general linear model, the two important extensions are:

1. The noise vector, w, is no longer white but has a general Gaussian PDF N(0, C).

2. The observed data x also include the contribution of a known signal vector, s.

Thus the general linear model for the observed data is expressed as:

x = Hθ + s + w

where

s = N × 1 vector of known signal samples

w = N × 1 noise vector with PDF N(0, C)

The solution discussed earlier for the simple linear model, where the noise is assumed to be white,

can be used here after applying a suitable whitening transformation.

If the noise covariance matrix is factored as:

C-1 = DT D

then the matrix D is the required transformation since:

Thus by transforming the general linear model:

x = Hθ + s + w

to:

x' = Dx = DHθ + Ds + Dw

x' = H'θ + s' + w'

equivalently,

x" = x' - s' = H'θ + w'

we can then write the MVU estimator of θ given the observed data x'' as:

That is:

and the covariance matrix is:
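The whitening route can be sketched numerically: with D = L⁻¹ from the Cholesky factorization C = LLᵀ, we have DᵀD = C⁻¹, and the plain white-noise MVUE applies to the transformed data (the signal s, the covariance C and all sizes below are illustrative assumptions):

```python
import numpy as np

# Sketch: whitening the general linear model x = H theta + s + w, w ~ N(0, C).
rng = np.random.default_rng(5)
N = 300
theta_true = np.array([2.0])
H = np.ones((N, 1))                               # DC-level observation matrix
s = 0.5 * np.exp(-0.01 * np.arange(N))            # known signal component (assumed)
idx = np.arange(N)
C = 0.9 ** np.abs(idx[:, None] - idx[None, :])    # simple coloured-noise covariance

L = np.linalg.cholesky(C)
w = L @ rng.standard_normal(N)                    # noise with covariance C
x = H @ theta_true + s + w

D = np.linalg.inv(L)                              # D^T D = C^{-1}
Hp, xpp = D @ H, D @ (x - s)                      # whitened model x'' = H' theta + w'
theta_hat = np.linalg.solve(Hp.T @ Hp, Hp.T @ xpp)
print(theta_hat)
```

The result is identical to applying (HᵀC⁻¹H)⁻¹HᵀC⁻¹(x − s) directly; whitening merely reduces the coloured-noise case to the white-noise one.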

2.7.2 Example

Consider the case of a DC level together with an exponential signal in coloured Gaussian noise with

known N × N covariance matrix C.

Using the general linear model, we have

By noting that the data follow the general linear model, the MVUE of A can then be given

as

3.1 Outline

In this module, general approaches to finding the MVUE are discussed. These approaches make

use of sufficient statistics. Also, the best linear unbiased estimator (BLUE), which is much easier to find in practical cases, is described. The salient topics discussed in this module are:

Sufficient statistics

Determination of the MVUE using sufficient statistics

Best linear unbiased estimation (BLUE)

3.2.1 Sufficient Statistics

For cases where the CRLB approach cannot be applied, a more general approach to finding the

MVUE is required. This approach is based on first finding a sufficient statistic for the

unknown parameter θ.

A sufficient statistic with respect to a statistical model (i.e., a PDF) is a statistic which

contains all the information about a parameter of that model that is available in the sample

observations. No other statistic that can be derived from the same sample observations

provides any additional information beyond what the sufficient statistic does.

Consider the problem of estimating a DC level A in WGN (N(0, σ²)). We know that the sample

mean (Â = (1∕N)∑n=0N-1x[n]) is the MVUE, having the minimum variance σ²/N. On the other hand, if we

had chosen Ã = x[0] as our estimator, then even though it is unbiased, it has a

much higher variance (being σ²) than the minimum. This poor performance is due to

throwing away the data samples {x[1],x[2],…,x[N - 1]}, which carry information about the DC

level A to be estimated. At this point it is of interest to ask which data samples are

pertinent to the estimation problem, or whether there exists a set of statistics that is sufficient. Note that the

following data sets may be claimed to be sufficient for finding Â.

Among them, S1 is the original data set; as expected, it must always be sufficient for the

problem. But the other statistics S2, S3 and S4 are also sufficient.

Minimal sufficient statistic: In an estimation problem, there can be a number of sufficient

statistics for a parameter, but among them the one which contains the minimum number of

elements is termed the minimal sufficient statistic.

In estimation problems, it is desirable to use the minimal sufficient statistic as it leads to the

estimator in the most compact form. Formally, to prove that a statistic is a sufficient statistic,

one needs to show that, given the sufficient statistic T(x) = t, the conditional PDF satisfies p(x|T(x) = t; θ) = p(x|T(x) = t), i.e.,

it is independent of θ.

In practice, the determination of this conditional PDF is often difficult. Further, one needs to

first guess a sufficient statistic and then verify its sufficiency, which involves much

guesswork.

To alleviate the guesswork, a simple procedure termed the “Neyman-Fisher factorization

theorem” can be employed for finding a sufficient statistic for the given parameter.

Neyman-Fisher Factorization Theorem - If the PDF of the data p(x; θ) can be factored into

the form,

p(x; θ) = g(T(x), θ)h(x)

where g(T(x), θ) is a function of T(x) and θ only and h(x) is a function of x only, then T(x)

is a sufficient statistic for θ.

Conceptually, one expects that the PDF after the sufficient statistic has been

observed, p(x|T(x); θ), should not depend on θ, since T(x) is a sufficient statistic for the

estimation of θ and no more knowledge can be gained about θ once T(x) is known.

3.2.2 Example

Let the data x = {x[0],x[1],…,x[N - 1]} be independent and Poisson distributed with mean θ.

The mean parameter θ is unknown and we want to estimate it.

The PDF of the data can be written as:

By letting

We have the sufficient statistic T(x) = ∑n x[n], which is in fact the minimal sufficient

statistic.
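The sufficiency of T(x) = ∑ x[n] can be illustrated numerically: two data sets with the same sum have a log-likelihood difference that does not depend on θ, as the factorization theorem predicts (the data values below are arbitrary assumptions):

```python
import math

# Sketch: for IID Poisson data, the likelihood depends on theta only through
# T(x) = sum(x), so equal-sum data sets differ only by an h(x) factor.
def poisson_loglike(x, theta):
    return sum(n * math.log(theta) - theta - math.lgamma(n + 1) for n in x)

x1, x2 = [3, 1, 4, 0], [2, 2, 2, 2]    # both have T(x) = 8
diffs = [poisson_loglike(x1, t) - poisson_loglike(x2, t) for t in (0.5, 1.0, 3.0)]
print(diffs)
```

The difference is the same constant for every θ: it comes entirely from the h(x) factor of the factorization.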

3.2.3 Example

Consider the problem of estimating the phase of a sinusoid embedded in WGN, N(0, σ²).

Here, the amplitude A and the frequency ƒ0 of the sinusoid, as well as the noise PDF, are

assumed to be known.

The PDF of the data vector can be given as


Noting that the exponent can be expanded as

Note that in this problem, as per the factorization theorem, no single sufficient statistic

exists; rather, T1(x) and T2(x) are jointly sufficient statistics for the estimation of the

phase of the sinusoid.

3.3.1 Determination of MVUE using sufficient statistics

There are two ways in which one may derive the MVUE based on the sufficient

statistics, T(x):

1. Method I: Let θ̃ be any unbiased estimator of θ. Then θ̂ = E(θ̃|T(x))

is the MVUE.

2. Method II: Find some function g such that θ̂ = g(T(x)) is an unbiased estimator of θ, i.e., E(g(T(x)))

= θ; then θ̂ is the MVUE.

The basis of the above-mentioned approaches for finding the MVUE lies in the Rao-Blackwell-

Lehmann-Scheffé (RBLS) theorem, which states that θ̂ = E(θ̃|T(x)) is:

a valid estimator for θ

unbiased

of variance var(θ̂) less than or equal to the variance of θ̃, for all θ

the MVUE if the sufficient statistic T(x) is complete

Note that the sufficient statistic T(x) is complete if there is only one function g(T(x)) of it that is

unbiased. That is, if h(T(x)) is another unbiased estimator (E(h(T(x))) = θ), then we must

have g = h if T(x) is complete.

The completeness of a sufficient statistic depends on its PDF, which in turn is

determined by the PDF of the data. Validating that a sufficient statistic is complete is in general

quite difficult. But for many practical cases, in particular for the exponential family of PDFs,

the completeness of the sufficient statistic holds.

3.3.2 Exponential family of distributions

A study of the properties of probability distributions that have sufficient statistics of the same

dimension as the parameter space regardless of the sample size led to the development of

what is called the exponential family of distributions. The common members of this family

are:

Binomial distribution


Exponential distribution

Gamma distribution

Geometric distribution

Normal distribution

Rayleigh distribution

On the other hand, there are some distributions which do not belong to the exponential family

of PDFs. The examples of those are:

Uniform distribution

Cauchy distribution

Weibull (unless shape parameter is known)

Laplace (unless mean parameter is zero)

The one-parameter members of the exponential family have probability density or probability

mass function of the form

where η(θ) and A(θ) are some functions of θ, T(x) is a function of the data x, and h(x) is purely a

function of the data, i.e., it does not involve θ.

Suppose that x = {x[0],x[1],…,x[N - 1]} are i.i.d. samples from a member of the exponential

family with the parameter θ, then the joint PDF can be expressed as

From this it is apparent, by the factorization theorem, that T(x) = ∑n=0N-1T(x[n]) is a sufficient

statistic.

In case of the multi-parameter members of the exponential family, the joint PDF or PMF with

parameter set θ = [θ1,θ2,…,θd]T can be expressed as

3.3.3 Example

Show that the Gamma distribution belongs to the exponential family of distributions.

The Gamma distribution is characterized by the density function

The PDF can be written as

We note that the Gamma distribution has the form of the exponential family of distributions,

with η1(α, β) = -β, η2(α, β) = (α - 1), T1(x) = x, T2(x) = ln x, A(α, β) = ln Γ(α) - α ln β, and h(x)

= 1.

3.3.4 Example

Consider the previous example of a DC signal embedded in WGN:

Using Method II, we need to find a function g such that E[g(T(x))] = θ = A. Now:

Obviously:

which, as before, is the MVU estimator for θ.

3.4.1 Best Linear Unbiased Estimator (BLUE)

In many estimation problems, the MVUE or a sufficient statistic cannot be found, or indeed

the PDF of the data is itself unknown (only the second-order statistics are known, in the sense

that they can be estimated from the data). In such cases, one solution is to assume a functional

model of the estimator, as being linear in the data, and find the linear estimator which

is unbiased and has minimum variance. This estimator is referred to as the best linear

unbiased estimator (BLUE).

Consider the general vector parameter case θ, the estimator is required to be a linear function

of the data, i.e.,

The BLUE is derived by finding the A which minimizes the variance, subject

to the constraint AH = I, where C is the covariance matrix of the data x. Carrying out the

minimization yields the following form for the BLUE:

where .

Salient attributes of BLUE:

For the general linear model, the BLUE is identical in form to the MVUE.

The BLUE assumes only up-to-2nd-order statistics and not the complete PDF of the data, unlike

the MVUE, which was derived assuming a Gaussian PDF.

If the data are truly Gaussian, then the BLUE is also the MVUE.

The BLUE for the general linear model can be stated in terms of following theorem.

Gauss-Markov Theorem: Consider a general data model of the form:

x = Hθ + w

where H is known, and w is noise with covariance C (the PDF of w otherwise arbitrary).

Then the BLUE of θ is:

3.4.2 Example

Consider a signal embedded in noise:

where w[n] is of unspecified PDF with var(w[n]) = σn² and the unknown parameter θ = A is to

be estimated. We assume a BLUE estimate and derive H by noting:

E[x] = 1θ

where x = [x[0], x[1], x[2], …, x[N - 1]]T, 1 = [1, 1, 1, …, 1]T and we have H ≡ 1. Also:

and we note that in the case of white noise, where σn² = σ², we get the sample-mean

estimator:
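In the general (unequal-variance) case the BLUE weights each sample by the inverse of its noise variance; a sketch with assumed variances and DC level:

```python
import numpy as np

# Sketch: BLUE of a DC level A in zero-mean noise with known per-sample
# variances sigma_n^2: the inverse-variance weighted average, which reduces
# to the sample mean when all variances are equal.
rng = np.random.default_rng(6)
N, A_true = 500, 1.0
sigma2 = rng.uniform(0.5, 2.0, N)           # known, unequal noise variances
x = A_true + np.sqrt(sigma2) * rng.standard_normal(N)

A_blue = np.sum(x / sigma2) / np.sum(1 / sigma2)
print(A_blue, x.mean())
```

Noisier samples are down-weighted, so the BLUE has lower variance than the plain sample mean here; no Gaussian assumption was needed, only the first two moments.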

3.4.3 Example

Consider the acoustic echo cancellation (AEC) problem, whose signal flow-graph is

shown in Figure 3.1. A speech signal u[n] from the far-end side is broadcast in a

room by means of a loudspeaker. A microphone is present in the room to record the local

signal v[n], which is to be transmitted back to the far-end side. The recorded microphone

signal y[n] = v[n] + x[n] contains the undesired echo x[n] due to the acoustic echo path existing

between the loudspeaker and the microphone. The echo path transfer function is modeled using an

FIR filter, so the echo signal can be considered a filtered

version of the loudspeaker signal. The objective of the AEC is to

estimate the impulse response Ĥ(z) of the echo path so as to produce an echo-free

signal.

Figure 3.1: Acoustic echo cancellation problem

The linear estimation problem for the derivation of echo path impulse response can be written

as

Here v, the near-end signal vector, is modeled as a zero-mean process with correlation matrix

R = E[vvT]. Any linear estimator of h can be written as a linear function of the microphone

signal vector y as

3.4.4 Example

Show that the BLUE commutes over linear (affine) transformations.

Let us consider that, given the BLUE of θ, we wish to estimate the transformed parameter

Then the BLUE of the transformed parameter is given by


where the indicated matrix is the minimum covariance matrix. Further, assuming that the

transformation matrix B is invertible, we get

Then,

Therefore,

This shows that, in the case of a linear transformation of the parameter, the BLUE of the

transformed parameter can be obtained simply by applying the same linear transformation to

the BLUE of the original parameter.

Outline

4.1 Outline

In this module, the principle of maximum likelihood estimation is discussed. This happens to

be the most popular approach for obtaining practical estimators. The maximum likelihood

estimation approach is very desirable in situations where the MVUE does not exist or cannot

be found even though it does exist. The attractive feature of the maximum likelihood

estimator (MLE) is that it can always be found following a definite procedure, allowing it

to be used for complex estimation problems. Additionally, the MLE is asymptotically optimal

for large data records. The salient topics discussed in this module are:

Basic Procedure of MLE

MLE for Transformed Parameters

MLE for General Linear Model

Asymptotic Property of MLE

4.2.1 Basic Procedure of MLE

In some cases the MVUE may not exist, or it cannot be found by any of the methods discussed

so far. The maximum likelihood estimation (MLE) approach is an alternative in cases

where the PDF or the PMF is known. This PDF or PMF, viewed as a function of the unknown parameter θ,

is called the likelihood function. With MLE, the unknown parameter is estimated by

maximizing the likelihood function for the observed data. The MLE is defined as:

It can be shown that the MLE is asymptotically unbiased:

An important result is that if an MVUE exists, then the MLE procedure will produce it.

Proof

Assume the scalar parameter case. If an MVUE exists, then the log-likelihood function can be

factorized as

where .

Maximizing the likelihood function, by setting its derivative to zero, yields the MLE

Another important observation is that, unlike the previous estimators, the MLE does not

require an explicit analytical expression for p(x; θ). Indeed, given even a numerical (e.g., histogram-based) characterization of the PDF as a

function of θ, one can numerically search for the θ that maximizes it.

4.2.2 Example

Consider the problem of a DC signal embedded in noise:

where w[n] is WGN with zero mean and known variance σ².

We know that the MVU estimator for θ is the sample-mean. To see that this is also the MLE,

we consider the PDF:

4.2.3 Example

Consider the problem of a DC signal embedded in noise:

where w[n] is WGN with zero mean but unknown variance, which is also A; that is, the

unknown parameter, θ = A, manifests itself both as the unknown signal level and as the variance of

the noise. Although a highly unlikely scenario, this simple example demonstrates the power

of the MLE approach, since finding the MVUE by the previous procedures is not easy. The

likelihood function for x is given by:

We maximize it with respect to θ. For Gaussian PDFs it is easier to find the maximum of the log-

likelihood function (since the logarithm is a monotonic function):

On differentiating we have:

and setting the derivative to zero and solving for θ produces the MLE:

where we have assumed θ > 0. It can be shown that:

and:
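A numerical sketch of this example: maximizing the log-likelihood over a grid and comparing with the closed-form root Â = −1∕2 + √(1∕4 + mean(x²)) obtained by setting the derivative to zero (the true A and the data size below are assumptions):

```python
import numpy as np

# Sketch: x[n] = A + w[n] with w[n] ~ N(0, A), A > 0, so A is both the mean
# and the variance of the observations.
rng = np.random.default_rng(7)
A_true, N = 2.0, 10_000
x = A_true + np.sqrt(A_true) * rng.standard_normal(N)

def loglike(A):
    # log p(x; A) up to the additive constant -N/2 * log(2*pi)
    return -N / 2 * np.log(A) - np.sum((x - A) ** 2) / (2 * A)

grid = np.linspace(0.5, 4.0, 701)          # crude grid search over A > 0
A_numeric = grid[np.argmax([loglike(a) for a in grid])]
A_closed = -0.5 + np.sqrt(0.25 + np.mean(x**2))
print(A_numeric, A_closed)
```

The grid search and the closed-form root agree, and both are close to the true A for a long record, illustrating that the MLE can always be computed numerically even when no MVUE is in sight.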

4.3.1 MLE for Transformed Parameters

The MLE of the transformed parameter, α = g(θ) is given by:

where θ̂ is the MLE of θ. If g is not a one-to-one function (i.e., not invertible), then α̂ is

obtained as the maximizer of the transformed likelihood function, pT (x; α), which is defined as:

4.3.2 Example

In this example we demonstrate finding the MLE of a transformed parameter. In the context of the previous

example, consider two different parameter transformations: (i) α = exp(A) and (ii) α = A².

Case (i): From the previous example, the PDF parameterized by the parameter θ = A can be given

as

and the PDF parameterized by the transformed parameter can be given as

Now to find the MLE of α, setting the derivative of pT (x; α) with respect to α to zero yields

or

But Â being the MLE of A, we have α̂ = exp(Â). Thus the MLE of the transformed

parameter is found by substituting the MLE of the original parameter into the transformation

function. This is known as the invariance property of the MLE.

Case (ii): Since α = A² is not a one-to-one transformation of A, taking only

one branch would leave some possible PDFs missing. To characterize all possible

PDFs, we need to consider two sets of PDFs

The MLE of α is the value of α that yields the maximum of pT1(x; α) and pT2(x; α) or

1. For a given value of α, say α0, determine whether pT1(x; α0) or pT2(x; α0) is larger. If, for

example, pT1(x; α0) > pT2(x; α0), then denote the value of pT1(x; α0) as . Repeat for

all α ≥ 0 to form . Note that .

2. The MLE is given as the α that maximizes over α ≥ 0.

Thus the MLE is

4.3.3 MLE for General Linear Model

Consider the general linear model of the form:

x = Hθ + w

where H is a known N × p matrix, x is an N × 1 observation vector with N samples,

and w is an N × 1 noise vector with PDF N(0, C). The PDF of the observed data is:

and the MLE of θ is found by differentiating the log-likelihood which can be shown to yield:

Lecture 12 : Properties of MLE

4.4.1 Asymptotic Normality Property of MLE

The asymptotic property of the MLE can be stated as follows. If the PDF p(x; θ) of the

data x satisfies some “regularity” conditions, then the MLE of the unknown parameter θ is

asymptotically distributed (i.e., for large data records) according to

Proof


In the following, the proof of this important property is outlined for the scalar parameter case.

Assume that the observations are IID and that the regularity condition holds,

i.e., . Further assume that the first-order and second-order derivatives

of the likelihood function exist.

Before deriving the asymptotic PDF, it is first shown that the MLE is a consistent estimator. For

this, using the Kullback-Leibler information inequality

or

with equality if and only if θ1 = θ2, the right-hand side of the above inequality is maximized

for θ = θ0. As the data are IID, maximizing the log-likelihood function is equivalent to

maximizing

But for N → ∞, this converges to its expected value by the law of large numbers. Hence, if

θ0 is the true value of θ, we have

By a continuity argument, the normalized log-likelihood function is also maximized at θ = θ0;

as N → ∞, the MLE converges to θ0. Thus the MLE is consistent.

To derive the asymptotic PDF of the MLE using the mean value theorem, we have

where . But by the definition of the MLE the left-hand side of the above relation

is zero, so that


where the last convergence is due to the law of large numbers and i(θ0) denotes the information

for a single sample. Also, the numerator term is

Now let ξn denote the summand; being a function of x[n], it is a random variable. Additionally, since

the x[n] are IID, so are the ξn. By the central limit theorem, the numerator term has a PDF

that converges to a Gaussian with mean

due to the independence of the random variables. We then apply Slutsky's theorem, which says

that if a sequence of random variables xn has the asymptotic PDF of the random variable x and

the sequence of random variables yn converges to a constant c, then xn ∕ yn has the same

asymptotic PDF as the random variable x∕c. Thus in this case

So that

or equivalently

or finally

Thus the distribution of the MLE of a parameter is asymptotically normal, with mean equal to the true

value of the parameter and variance equal to the inverse of the Fisher information.

Outline

5.1 Outline

In this module, a class of estimators is introduced which, unlike the earlier discussed optimal

(MVU) or asymptotically optimal (MLE) estimators, has no optimality properties in general. In

this approach, no probabilistic assumptions about the data are made; only a signal model is

assumed. The least squares estimator (LSE) is determined by the minimization of the least

squares error and is widely used in practice due to its ease of implementation. The salient topics

discussed in this module are:

Basic Procedure of LSE

Linear Least Squares

Geometrical Interpretations of LS Approach

Constrained Least Squares

Lecture 13 : Least Squares Estimation

5.2.1 Basic Procedure of LSE

The MVUE, BLUE, and MLE developed previously required an expression for the PDF

p(x; θ) in order to estimate the unknown parameter θ in some optimal manner. An alternative

approach is to assume a signal model (rather than making probabilistic assumptions about the

data) and achieve a design goal assuming this model. With the least squares (LS) approach, we assume that the signal model is a function of the unknown parameter θ and produces a signal:

Due to measurement noise and model inaccuracies w[n], only the noisy version x[n] of the

true signal s[n] can be observed as shown in Fig. 5.1 .

Unlike previous approaches no assumption is made about the probabilistic distribution of

w[n]. We only state that what we have observed is an “error” e[n] = x[n] − s[n], which with the appropriate choice of θ should be minimized in a least-squares sense. Thus we choose the estimate θ̂ so that the cost function

J(θ) = Σn (x[n] − s[n; θ])²

is minimized over the N observation samples of interest (n = 0, 1, …, N − 1), and we call this the LSE of θ. More precisely we have:

An important assumption to produce a meaningful unbiased estimate is that the noise and

model inaccuracies, w[n], have zero mean. However no other probabilistic assumption about

the data is made (i.e., the LSE is valid for both Gaussian and non-Gaussian noise). At the same time we cannot make any optimality claims for the LSE (as these would depend on the distribution of the noise and modeling errors).

A problem that arises from assuming the signal model function s(n; θ) rather than knowing p(x; θ) is the need to choose an appropriate signal model. Then again, in order to obtain a closed-form or parametric expression for p(x; θ) one usually needs to know the underlying model and noise characteristics anyway.

5.2.2 Example

Consider observations, x[n], arising from a DC-level signal model, s[n] = s(n; θ) = θ:


and hence θ̂ = (1∕N) Σn x[n] = x̄, which is the sample mean. The minimum LS error is then Jmin = Σn (x[n] − x̄)².
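This minimization can be verified numerically. The sketch below (with hypothetical data) evaluates the LS cost J(θ) on a grid and confirms that the minimizer coincides with the sample mean:

```python
import numpy as np

# LS estimate of a DC level: J(theta) = sum_n (x[n] - theta)^2 is
# minimized by the sample mean (hypothetical data for illustration).
x = np.array([1.1, 0.9, 1.3, 0.7, 1.0])

theta_grid = np.linspace(0.0, 2.0, 2001)
J = ((x[:, None] - theta_grid) ** 2).sum(axis=0)   # cost over the grid
theta_ls = theta_grid[J.argmin()]

print(theta_ls, x.mean())   # grid minimizer matches the sample mean
```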

5.3.1 Linear Least Squares

Although there are no restrictions on the form of the assumed signal model in LSE, it is often assumed that the signal model is a linear function of the parameter to be estimated:

s = Hθ

where s = [s[0],s[1],s[2],…,s[N - 1]]T and H is a known N × p matrix with θ = [θ1,θ2,…,θp]T.

Now

x = Hθ + w

and with x = [x[0],x[1],x[2],…,x[N - 1]]ᵀ we have:

θ̂ = (HᵀH)⁻¹Hᵀx

which surprisingly is identical in functional form to the MVU estimator for the linear model.

An interesting extension to the linear LS is the weighted LS where the contribution to the

error from each component of the parameter vector can be weighted in importance by using a

different form of the error criterion:

J(θ) = (x − Hθ)ᵀW(x − Hθ)

The matrix W is generally diagonal, and its main purpose is to emphasize the contribution of those data samples that are deemed to be more reliable.
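A minimal numpy sketch of ordinary and weighted linear LS for a hypothetical line-fitting model (the matrix H, the weights, and the noise level below are illustrative assumptions, not from the text):

```python
import numpy as np

# Linear LS: solve (H^T H) theta = H^T x, and weighted LS with a
# diagonal W emphasizing samples deemed more reliable (synthetic data).
rng = np.random.default_rng(1)
N, p = 50, 2
H = np.column_stack([np.ones(N), np.arange(N)])    # line fit: [intercept, slope]
theta_true = np.array([1.0, 0.5])
x = H @ theta_true + 0.1 * rng.standard_normal(N)

theta_ls = np.linalg.solve(H.T @ H, H.T @ x)       # ordinary LS

W = np.diag(np.linspace(1.0, 2.0, N))              # illustrative sample weights
theta_wls = np.linalg.solve(H.T @ W @ H, H.T @ W @ x)

print(theta_ls, theta_wls)   # both close to [1.0, 0.5]
```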

5.3.2 Example

Reconsider the acoustic echo cancellation problem discussed earlier, whose signal flow diagram is shown again in Figure 5.2 for ease of reference. We now show how the linear LS approach can be used to estimate the required echo path impulse response in this problem.

Figure 5.2: Acoustic echo cancellation problem

Recall the output vector at the microphone is expressed as

y = Uh + v

Any linear estimator of h can be written as a linear function of the microphone signal

vector y as

Recall that for this problem, the earlier derived BLUE is of the form:

Note that the LSE does not take into account the near-end signal characteristics (R = E(vvᵀ)) and therefore, in practice, it is not found to be as effective as the BLUE.

5.4.1 Geometrical Interpretations of LS Approach

The geometrical perspective of the LS approach helps provide insight into the estimator and also reveals other useful properties. Consider a general linear signal model s = Hθ. On

denoting the ith column of H by hi, the signal model can be seen as a linear combination of the

“signal” vectors as

Figure 5.3: Geometrical visualization of linear least squares in 3-dimensional (R³) space

We now note that the linear LS approach attempts to minimize the square of the distance from the data vector x to a signal vector ŝ = Hθ, which must be a linear combination of the columns of H. The data vector can lie anywhere in an N-dimensional space, termed RN, while all possible signal vectors, being linear combinations of p < N vectors, must lie in a p-dimensional subspace of RN, termed Sp. The assumption that H has full rank ensures that the columns are linearly independent and hence that the subspace is truly p-dimensional. For N = 3 and p = 2 this is illustrated in Figure 5.3. Note that all possible choices of θ1, θ2 (where we assume -∞ < θ1 < ∞ and -∞ < θ2 < ∞) produce signal vectors constrained to lie in the subspace S², and that in general x does not lie in the subspace. It is intuitively obvious that the vector that lies in S² and is closest to x in the Euclidean sense is the component of x in S². In other words, ŝ is the orthogonal projection of x onto S². This makes the error vector x − ŝ orthogonal (or perpendicular) to all vectors in S². Two vectors x and y in RN are defined to be orthogonal if xᵀy = 0. For the considered example, we can determine the appropriate θ̂ by using the orthogonality condition as

On combining the two equations and using matrix form, we have

Hᵀ(x − Hθ̂) = 0

or

θ̂ = (HᵀH)⁻¹Hᵀx

The error vector x − Hθ̂ must be orthogonal to the columns of H. This is the well-known orthogonality principle. In effect the error represents the part of x that cannot be described by the signal model. The minimum LS error Jmin can be given as

Figure 5.4: Effect of nonorthogonality of the columns of the observation matrix H

As illustrated in Figure 5.4 (a), if the signal vectors h1 and h2 were orthogonal, then θ̂ could easily have been found, because the component of ŝ along h1 does not contain a component of ŝ along h2. If this does not hold, we have the situation of Figure 5.4 (b). Making the orthogonality assumption and also assuming that ||h1|| = ||h2|| = 1 (orthonormal vectors), we have

so that

This result is due to orthonormal columns of H. As a result, we have HTH = I and therefore

In general, the columns of H will not be orthogonal, so the signal vector estimate is obtained as

ŝ = Hθ̂ = H(HᵀH)⁻¹Hᵀx

The N × N matrix P = H(HᵀH)⁻¹Hᵀ is known as the projection matrix. It has the properties that it is symmetric (Pᵀ = P), idempotent (P² = P), and singular (for independent columns of H, it has rank p < N).
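These properties of the projection matrix are easy to verify numerically; the sketch below uses a random H with independent columns (an illustrative choice) and also checks the orthogonality principle:

```python
import numpy as np

# Numerical check of the projection matrix properties: symmetric,
# idempotent, and rank p < N (hence singular), for a toy H.
rng = np.random.default_rng(2)
N, p = 6, 2
H = rng.standard_normal((N, p))

P = H @ np.linalg.inv(H.T @ H) @ H.T       # projection onto range(H)

assert np.allclose(P, P.T)                 # symmetric
assert np.allclose(P @ P, P)               # idempotent
assert np.linalg.matrix_rank(P) == p       # rank p < N

x = rng.standard_normal(N)
s_hat = P @ x                              # LS signal estimate
assert np.allclose(H.T @ (x - s_hat), 0)   # error orthogonal to columns of H
```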

Lecture 16 : Constrained LSE

5.5.1 Constrained Least Squares

In some LS estimation problems the unknown parameters are constrained. Suppose we wish to estimate the amplitudes of a number of signals but it is known a priori that some of the signals have the same amplitude. In this case, the number of parameters can be reduced to take advantage of the prior knowledge. If the parameters are linearly related, this leads to a least squares problem with linear constraints, which can be solved easily.

Consider a least squares estimation problem with parameter θ subject to r < p linear constraints. Assuming the constraints are independent, we can summarize them as

Aθ = b

where A is a known r × p matrix and b is a known r × 1 vector. To find the LSE subject to the constraints, we set up a Lagrangian and determine the constrained LSE by minimizing it. Differentiating the Lagrangian with respect to θ and setting the result to zero, with the Lagrange multiplier chosen to satisfy the constraint, produces

θ̂c = θ̂ − (HᵀH)⁻¹Aᵀ[A(HᵀH)⁻¹Aᵀ]⁻¹(Aθ̂ − b)

where θ̂ is the unconstrained LSE.

Remark: Note that the constrained LSE is a corrected version of the unconstrained LSE. If the constraint happens to be satisfied by the unconstrained LSE, i.e., Aθ̂ = b, then according to the above relation the LSE and the constrained LSE are identical. This is usually not the case, however.

5.5.2 Example

In this example we explain the effect of constraints on LSE. Consider a signal model

The signal vector can be expressed as

Now assume that it is known a priori that θ1 = θ2. Expressing this constraint in matrix form we have [1 −1]θ = 0, so that A = [1 −1] and b = 0. Noting that HᵀH = I, we can get the constrained LSE as

With some matrix algebra we can show the constrained LSE and corresponding signal

estimate as

Since θ1 = θ2, the two observations are averaged, which is intuitively reasonable. In this simple problem, we can easily incorporate the constraint into the given signal model

Note that the parameter to be estimated then reduces to θ only. Estimating the unconstrained LSE of θ using the reduced signal model would have produced the same result. As with ordinary least squares, it is instructive to view the constrained least squares estimation problem geometrically, as done in Figure 5.5 in the context of this example.
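The constrained-LSE correction can be sketched numerically for this example. H = I is assumed here (so that HᵀH = I, matching the text), and the two observation values are hypothetical; the constrained estimate averages them as expected:

```python
import numpy as np

# Constrained LSE correction formula applied to the constraint
# theta1 = theta2, i.e., A = [1, -1], b = 0. H = I is an assumption
# consistent with H^T H = I in the example.
H = np.eye(2)
x = np.array([3.0, 5.0])                        # two hypothetical observations

theta_hat = np.linalg.solve(H.T @ H, H.T @ x)   # unconstrained LSE
A = np.array([[1.0, -1.0]])
b = np.array([0.0])

HtHinv = np.linalg.inv(H.T @ H)
corr = HtHinv @ A.T @ np.linalg.inv(A @ HtHinv @ A.T) @ (A @ theta_hat - b)
theta_c = theta_hat - corr                      # constrained LSE

print(theta_c)   # [4. 4.] -- the two observations are averaged
```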

Module 6 : Bayesian Estimation

Outline

6.1 Outline

In this module, a new class of estimators is introduced. This class departs from the classical approach to statistical estimation, in which the unknown parameter of interest is assumed to be a deterministic but unknown constant. Instead, the unknown parameter is assumed to be a random variable whose particular realization is to be estimated. This approach makes direct use of Bayes's theorem and so is commonly termed the Bayesian approach. The two main advantages of this approach are the incorporation of prior knowledge about the parameter to be estimated, and the provision of an alternative to the MVUE when it cannot be found. First, the minimum mean square error (MMSE) estimator is discussed. This is followed by an introduction to the Bayesian linear model, which allows the MMSE estimator to be found with ease. The salient topics discussed in this module are:

Minimum mean square estimator

Bayesian linear model

General Bayesian estimators

Lecture 17 : Bayesian Estimation

6.2.1 Minimum Mean Square Estimator (MMSE)

The classical approach we have used so far has assumed that the parameter θ is unknown but deterministic. Thus the optimal estimator is optimal irrespective of the actual value of θ. But in cases where the actual value or prior knowledge of θ could be a factor (e.g., where the MVU estimator does not exist for certain values, or where prior knowledge would improve the estimator performance), the classical approach does not work effectively.

In the Bayesian approach, θ is treated as a random variable with a known prior PDF, p(θ). Such prior knowledge concerning the distribution of the parameter should provide better estimators than the deterministic case.

In the classical approach the MVU estimator is derived by first considering minimization of the mean square error, i.e., θ̂ = arg min mse(θ̂), where:

and p(x; θ) is the PDF of x parameterized by θ. In the Bayesian approach, the estimator is similarly derived by minimizing θ̂ = arg min Bmse(θ̂), where:

is the Bayesian mean square error and p(x,θ) is the joint PDF of x and θ (since θ is now a random variable). Note that the squared error (θ̂ − θ)² is identical in both the Bayesian and the classical MSE. The minimum Bmse(θ̂), or MMSE, estimator is derived by differentiating the expression for Bmse(θ̂) with respect to θ̂ and setting the result to zero, which yields:

θ̂ = E(θ|x)

Thus MMSE is the conditional expectation of the parameter θ given the observations x. Apart

from the computational (and analytical!) requirements in deriving an expression for the

posterior PDF and then evaluating the expectation E(θ|x) there is also the problem of finding

an appropriate prior PDF. The usual choice is to assume that the joint PDF, p(x,θ), is

Gaussian and hence both the prior PDF, p(θ) and posterior PDF, p(θ|x), are also Gaussian

(this property implies the Gaussian PDF is a conjugate prior distribution). Thus the form of

the PDFs remains the same and all that changes are the means and the variances.

6.2.2 Example

Consider a signal embedded in noise:

where as before w[n] ~ N(0, σ²) is a WGN process and the unknown parameter θ = A is to be estimated. However, in the Bayesian approach we also assume the parameter A is a random variable with a prior PDF, which in this case is the Gaussian PDF p(A) = N(μA, σA²). We also have p(x|A) = N(A, σ²), and we can assume that A and x are jointly Gaussian. Thus the posterior PDF:

and hence the MMSE estimate is:

Â = α x̄ + (1 − α) μA, where α = σA²∕(σA² + σ²∕N).

Upon closer examination of the MMSE estimator we observe the following (assume σA² ≪ σ²):

1. With few data (N small) we have α → 0 and Â → μA; that is, the MMSE estimate tends towards the mean of the prior PDF and effectively ignores the contribution of the data. Also p(A|x) ≈ N(μA, σA²).

2. With large amounts of data (N large) we have α → 1 and Â → x̄; that is, the MMSE estimate tends towards the sample mean and effectively ignores the contribution of the prior information. Also p(A|x) ≈ N(x̄, σ²∕N).
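The shrinkage behavior above can be sketched numerically. The formula Â = α x̄ + (1 − α) μA with α = σA²/(σA² + σ²/N) is the result derived in the example; the numerical values below are illustrative assumptions:

```python
import numpy as np

# MMSE estimate of a DC level A with Gaussian prior N(mu_A, sA2):
# A_hat = alpha * xbar + (1 - alpha) * mu_A, alpha = sA2/(sA2 + s2/N).
def mmse_dc(x, mu_A, sA2, s2):
    N = len(x)
    alpha = sA2 / (sA2 + s2 / N)
    return alpha * np.mean(x) + (1.0 - alpha) * mu_A

rng = np.random.default_rng(3)
mu_A, sA2, s2, A_true = 0.0, 0.1, 1.0, 1.0      # illustrative values

x_small = A_true + np.sqrt(s2) * rng.standard_normal(2)
x_large = A_true + np.sqrt(s2) * rng.standard_normal(10000)

# Few samples: the estimate is pulled towards the prior mean mu_A.
# Many samples: the estimate approaches the sample mean.
print(mmse_dc(x_small, mu_A, sA2, s2))
print(mmse_dc(x_large, mu_A, sA2, s2))
```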

Conditional PDF of the Multivariate Gaussian: If x and y are jointly Gaussian, where x is k × 1 and y is l × 1, with mean vector [E(x)ᵀ, E(y)ᵀ]ᵀ and a partitioned covariance matrix, then the conditional PDF p(y|x) is also Gaussian, and the posterior mean vector and covariance matrix are given by:

This result can be used for the MMSE estimation involving jointly Gaussian parameter vector

and the data vector.

Lecture 18 : Properties of Bayesian Estimator

6.3.1 Bayesian Linear Model

Now consider the Bayesian linear model:

x = Hθ + w

where θ is the unknown parameter to be estimated with prior PDF N(μθ, Cθ) and w is WGN with PDF N(0, Cw). The MMSE estimator is provided by the expression for E(y|x), where we

identify

y ≡ θ. We have:

Suppose that both θ and α were unknown parameters but we are only interested in θ. Then α is a nuisance parameter. We can deal with this by “integrating α out of the way”. Consider Bayes's rule for the posterior PDF:

Now p(x|θ) is, in reality, p(x|θ,α), but we can obtain the true p(x|θ) by:

and if α and θ are independent then:

In classical estimation we do not make any assumption about the prior, so all possible θ have to be considered. The equivalent prior would be a flat distribution with, essentially, σθ² = ∞. This so-called non-informative prior PDF will yield the classical estimator where one is defined.

6.3.4 Example

Consider again the signal embedded in noise problem:

the classical estimator.

Lecture 19 : General Bayesian Estimator

6.4.1 General Bayesian Estimators

The Bmse(θ̂) given above is one specific case of a general estimator that attempts to minimize the average of a cost function C(ε), that is, the Bayes risk ℛ = E[C(ε)], where ε = θ − θ̂. Figure 6.1 shows plots of three different cost functions of wide interest; which of the central tendencies of the posterior PDF gets emphasized by each choice of cost function is discussed below:

1. Quadratic: C(ε) = ε², which yields ℛ = Bmse(θ̂). It has already been shown that the estimate minimizing ℛ = Bmse(θ̂) is the mean of the posterior PDF.

2. Absolute: C(ε) = |ε|. The estimate θ̂ that minimizes ℛ = E[|θ − θ̂|] is the median of the posterior PDF.

3. Hit-or-miss: C(ε) = 0 for |ε| < δ and C(ε) = 1 for |ε| ≥ δ, i.e., the same penalty is assigned to all errors greater than δ. The estimate that minimizes the Bayes risk can be shown to be the mode of the posterior PDF, i.e., the value that maximizes the PDF.

Figure 6.1: Common cost functions used in finding the Bayesian estimator

For the Gaussian posterior PDF, it should be noted that the mean, the median, and the mode are identical. Of most interest are the quadratic and hit-or-miss cost functions which, together with a special case of the latter, yield the following three important classes of estimators:

1. MMSE Estimator: The minimum mean square error (MMSE) estimator which has already

been introduced as the mean of the posterior PDF.

2. MAP Estimator: The maximum a posteriori (MAP) estimator which is the mode (or

maximum) of the posterior PDF.

3. Bayesian ML Estimator: The Bayesian maximum likelihood estimator which is the special

case of the MAP estimator where the prior PDF, p(θ), is uniform or non-informative:

Noting that the conditional PDF of x given θ, p(x|θ), is essentially equivalent to the PDF of x parameterized by θ, p(x; θ), the Bayesian ML estimator is equivalent to the classical MLE.

Comparison among the three types of Bayesian estimators:

The MMSE estimator is preferred due to its squared-error cost function, but it is also the most difficult to derive and compute, owing to the need to find an expression for the posterior PDF, p(θ|x), in order to evaluate the integral ∫ θ p(θ|x) dθ.

The hit-or-miss cost function used in the MAP estimator, though less “precise”, is much easier to work with since there is no need to integrate; one only finds the maximum of the posterior PDF p(θ|x), which can be done either analytically or numerically.

The Bayesian ML is equivalent in preference to the MAP only in the case where the prior is

non-informative, otherwise it is a sub-optimal estimator.

As for the classical MLE, the expression for the conditional PDF, p(x|θ), is easier to obtain than that for the posterior PDF, p(θ|x). Since in most cases knowledge of the prior is not available, it is not surprising that the classical MLE tends to be more prevalent. However, it may not always be prudent to assume that the prior is uniform, especially in cases where prior knowledge of the parameter is available even though the exact PDF is unknown. In these cases a MAP estimate may perform better even if an “artificial” prior PDF is assumed (e.g., a Gaussian prior, which has the added benefit of yielding a Gaussian posterior).
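The difference between the MMSE (posterior mean) and MAP (posterior mode) estimates can be seen numerically on a skewed posterior. The discretized posterior below is a synthetic example (a Gamma-shaped density), not one from the text:

```python
import numpy as np

# MMSE vs MAP on a discretized, deliberately skewed posterior.
theta = np.linspace(0.0, 10.0, 10001)
dtheta = theta[1] - theta[0]

post = theta * np.exp(-theta)                # unnormalized Gamma(2,1) shape
post /= post.sum() * dtheta                  # normalize numerically to a PDF

theta_mmse = (theta * post).sum() * dtheta   # posterior mean (MMSE)
theta_map = theta[np.argmax(post)]           # posterior mode (MAP)

print(theta_mmse, theta_map)   # ~2.0 (mean) vs ~1.0 (mode): they differ
```

For a Gaussian posterior the two would coincide, as noted above; skewness is what separates them.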

Module 7 : Estimation of Signals

Outline

7.1 Outline

The Bayesian estimators discussed in the previous module are difficult to implement in practice, as they involve multi-dimensional integration for the MMSE estimator and multi-dimensional maximization for the MAP estimator. In general these estimators are very difficult to derive in closed form except under the jointly Gaussian assumption. Whenever the Gaussian assumption is not valid, an alternative approach is required. In this module we introduce Bayesian estimators derived under a linearity constraint, which depend only on the first two moments of the PDF. This approach is analogous to the BLUE of the classical estimation case. These estimators are also termed Wiener filters and find extensive use in practice. The salient topics discussed in this module are:

Linear Minimum Mean Square Error (LMMSE) Estimator

Bayesian Gauss-Markov Theorem

Wiener Filtering and Prediction

7.2.1 Linear Minimum Mean Square Error (LMMSE) Estimator

Assume that the parameter θ is to be estimated based on the data set x = [x[0],x[1],…,x[N -

1]]T. Rather than assuming any specific form for the joint PDF p(x,θ), we consider the class

of all affine estimators of the form:

where a = [a0, a1, …, aN−1]ᵀ. The estimation problem now is to choose the weight coefficients a and aN to minimize the Bayesian MSE:

The resultant estimator is termed the linear minimum mean square error (LMMSE) estimator. Note that the LMMSE estimator will be sub-optimal unless the MMSE estimator happens to be linear, as is the case when θ and x are jointly Gaussian.

If the Bayesian linear model is applicable, we can write

x = Hθ + w

The weight coefficients are obtained from ∂Bmse∕∂ai = 0 for i = 0, 1, …, N − 1; this yields:

Thus the LMMSE estimator is:

For the p × 1 vector parameter θ, an equivalent expression for the LMMSE estimator is derived as:

θ̂ = E(θ) + CθθHᵀ(HCθθHᵀ + Cw)⁻¹(x − H E(θ))

where Cθθ = E[(θ − E(θ))(θ − E(θ))ᵀ] is the p × p covariance matrix of θ.
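A small numerical sketch of the vector LMMSE estimator for the Bayesian linear model x = Hθ + w. The dimensions, covariances, and noise level below are illustrative assumptions:

```python
import numpy as np

# LMMSE estimator for the Bayesian linear model:
# theta_hat = E(theta) + C H^T (H C H^T + Cw)^{-1} (x - H E(theta)).
rng = np.random.default_rng(4)
N, p = 20, 3
H = rng.standard_normal((N, p))
mu_theta = np.zeros(p)
C_theta = np.eye(p)                    # prior covariance (assumed)
C_w = 0.01 * np.eye(N)                 # noise covariance (assumed)

theta = rng.multivariate_normal(mu_theta, C_theta)        # true realization
x = H @ theta + rng.multivariate_normal(np.zeros(N), C_w)  # noisy data

G = C_theta @ H.T @ np.linalg.inv(H @ C_theta @ H.T + C_w)
theta_hat = mu_theta + G @ (x - H @ mu_theta)

print(np.linalg.norm(theta_hat - theta))   # small estimation error
```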

7.2.2 Example

Let us revisit the acoustic echo cancellation problem; the purpose is to show that the variance of the estimator of the echo path impulse response can be lowered further by dropping the unbiasedness constraint. The signal flow diagram of the problem is shown again in Figure 7.1 for ease of reference.

Recall the output vector at the microphone is expressed as

y = Uh + v

Let us assume that the echo path (room) impulse response is random and that some prior knowledge about it is available, i.e., it has a prior PDF p(ho) with first and second moments given as:

where ho denotes the true value of the impulse response. The estimate is then given by the mean of the posterior PDF p(ho|y)

It is worth comparing the form of LMMSE with that of earlier derived BLUE:

Note that in the MMSE criterion of the Bayesian framework, the variance of the estimator and the squared bias are weighted equally. Thus the variance of the estimator can be reduced below that of the MVUE if the estimator is no longer constrained to be unbiased.

Lecture 21 : Wiener Smoother

7.3.1 Bayesian Gauss-Markov Theorem

If the data are described by the Bayesian linear model:

x = Hθ + w

where x is N × 1 data vector, H is known N × p observation matrix, θ is p × 1 random vector

of parameters with mean E(θ) and covariance matrix Cθθ and w is an N × 1 noise vector with

zero mean and covariance matrix Cw which is uncorrelated with θ (the joint PDF p(w,θ) and

hence also p(x,θ) are otherwise arbitrary). Noting that:

We assume N samples of time-series data x = [x[0], x[1],…, x[N − 1]]ᵀ which are wide-sense stationary (WSS). Further, as E(x) = 0, the N × N covariance matrix takes the symmetric Toeplitz form:

where rxx[k] = E(x[n]x[n − k]) is the autocorrelation function (ACF) of the x[n] process and Rxx denotes the autocorrelation matrix. Note that since x[n] is WSS, the expectation E(x[n]x[n − k]) is independent of the absolute time index n.

In signal processing practice the ACF must be estimated from the data; the estimated ACF is given by

Both the data x and the parameter to be estimated are assumed to be zero mean. Thus the

LMMSE estimator is:

Application of LMMSE estimation to the three signal processing problems of smoothing, filtering, and prediction gives rise to different kinds of Wiener filters, which are discussed in the following sections.

The problem is to estimate the signal θ = s = [s[0], s[1],…, s[N - 1]]T based on the noisy data

x = [x[0], x[1],…, x[N - 1]]T where

x=s+w

and w = [w[0], w[1],…, w[N − 1]]ᵀ is the noise process. An important difference between smoothing and filtering is that the signal estimate ŝ[n] can use the entire data set: the past values (x[0], x[1],…, x[n − 1]), the present value x[n], and the future values (x[n + 1], x[n + 2],…, x[N − 1]). This means that the solution cannot be cast as a “filtering”

problem since we cannot apply a causal filter to the data. We assume that the signal and noise

processes are uncorrelated. Hence,

and thus

Also


and the N × N matrix

Non-causal Wiener filtering

Consider a smoothing problem in which a signal s[n] is required to be estimated given a noisy signal x[n] of infinite length, i.e., {…, x[−2], x[−1], x[0], x[1], x[2],…}, or x[k] for all k. In such cases, the smoothing estimator takes the form

Now, by letting aₖ = h[n − k], the above estimator can be expressed as the convolution sum

where h[k] can be identified as the impulse response of an infinite-length, two-sided, time-invariant filter.

Analogous to the LSE case (refer to 5.4.1), the orthogonality principle also holds for the LMMSE case, i.e., the estimation error (θ − θ̂) is always orthogonal (or perpendicular) to the observed data {…, x[−1], x[0], x[1],…}. This can be expressed mathematically as

Hence,

Thus the equations that must be solved to determine the infinite Wiener filter impulse response, also referred to as the Wiener-Hopf equations, are given by

On taking the Fourier transform of both sides of the above equation we have

where H(ƒ) is the frequency response of the infinite Wiener smoother, and Pxx(ƒ) and Pss(ƒ) are the power spectral densities of the noisy and clean signals, respectively.

As the signal and noise are assumed to be uncorrelated, the frequency response of the Wiener smoother can be expressed as

H(ƒ) = Pss(ƒ)∕(Pss(ƒ) + Pww(ƒ))

Remarks: Since the power spectral densities are real and even functions of frequency, the impulse response also turns out to be real and even. This means that the designed filter is non-causal, which is consistent with the fact that the signal is estimated using future as well as present and past data.
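The smoother gain H(ƒ) = Pss(ƒ)/(Pss(ƒ) + Pww(ƒ)) can be sketched via the FFT. In this synthetic illustration the clean signal's periodogram stands in for the (assumed known) signal PSD; the sinusoid and noise level are illustrative choices:

```python
import numpy as np

# Frequency-domain Wiener smoothing of a noisy sinusoid (synthetic sketch).
N = 1024
n = np.arange(N)
s = np.sin(2 * np.pi * 50 * n / N)               # clean signal (50 cycles)
rng = np.random.default_rng(5)
w = 0.5 * rng.standard_normal(N)                 # white noise, variance 0.25
x = s + w

Pss = np.abs(np.fft.fft(s)) ** 2 / N             # signal PSD (assumed known)
Pww = np.full(N, 0.25)                           # white-noise PSD
Hf = Pss / (Pss + Pww)                           # Wiener smoother gain

s_hat = np.real(np.fft.ifft(Hf * np.fft.fft(x)))

mse_noisy = np.mean((x - s) ** 2)
mse_smoothed = np.mean((s_hat - s) ** 2)
print(mse_noisy, mse_smoothed)                   # smoothing reduces the error
```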

7.3.3 Example

Consider a noisy signal x[n] consisting of a desired clean signal s[n] corrupted by additive white noise w[n]. The autocorrelation functions (ACFs) of the signal and noise are given as

respectively. Assume that the signal and noise are uncorrelated and zero mean. Find the non-causal optimal Wiener filter for estimating the clean signal from its noisy version.

On computing the z-transforms of the ACFs, the power spectral densities of the signal and the noise can be derived as

Given Hopt(z), the impulse response of the optimal stable filter turns out to be

7.4.1 Wiener Filtering

The problem is to estimate the signal θ = s[n] based only on the present and past noisy data x = [x[0], x[1],…, x[n]]ᵀ. As n increases, this allows us to view the estimation process as the application of a causal filter to the data, and we need to cast the LMMSE estimator expression in the form of a filter.

Assuming the signal and noise processes are uncorrelated, we have,

Note that Cθx is a 1 × (n + 1) row vector.

Note that the “check” subscript is used to denote time reversal. Thus we have

The process of forming the estimator as time evolves can be interpreted as a filtering

operation. Specifically we let h(n)[k], the time-varying impulse response, be the response of

the filter at time n to an impulse applied k samples before (i.e., at time n-k). We note

that each weight can be interpreted as the response of the filter at time n to the signal (or impulse) applied at time i = n − k. Thus we can make the following correspondence,

Then:

reversed version of a. To explicitly find the impulse response h we note that since,

Remark: A computationally efficient solution for solving the equations exists and is known

as Levinson recursion which solves the equations recursively to avoid resolving them for

each value of n.
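The Levinson recursion mentioned above can be sketched directly. The implementation below is a standard Levinson-Durbin recursion for the one-step prediction (Toeplitz) equations, checked here against an order-2 example with assumed, normalized ACF values:

```python
import numpy as np

def levinson(r, order):
    """Levinson-Durbin recursion: solves the Toeplitz Wiener-Hopf
    equations for one-step prediction coefficients, given the
    autocorrelation sequence r = [r_xx(0), r_xx(1), ...]."""
    a = np.zeros(order)
    err = r[0]                       # zeroth-order prediction error power
    for m in range(order):
        # reflection coefficient for order m + 1
        k = (r[m + 1] - np.dot(a[:m], r[m:0:-1])) / err
        a_prev = a.copy()
        a[m] = k
        a[:m] = a_prev[:m] - k * a_prev[:m][::-1]
        err *= (1.0 - k * k)         # updated prediction error power
    return a, err

# Order-2 check with assumed normalized ACF values [1, 5/7, 3/7]:
# the predictor coefficients come out as h = [5/6, -1/6].
h, err = levinson(np.array([1.0, 5 / 7, 3 / 7]), 2)
print(h)
```

This avoids re-solving the full linear system for each n, which is the point of the recursion.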

Causal Wiener Filtering

On using the property rxx[k] = rxx[-k], the Wiener-Hopf equation can be written as

For the large data (n → ∞) case, the time-varying impulse response h(n)[k] can be replaced with its time-invariant version h[k], and we have

This is termed as the infinite Wiener filter. The determination of the causal Wiener filter

involves the use of the spectral factorization theorem and is explained in the following.

The one-sided z-transform of a sequence x[n] is defined as

where the filter impulse response h[n] is constrained to be causal. The two-sided z-transform that satisfies the Wiener-Hopf equations can be written as

where the first factor is the z-transform of a causal sequence and the second is the z-transform of an anticausal sequence. Thus we have

7.4.3 Example

Consider a signal s[n] corrupted by additive white noise w[n]. The signal and noise are assumed to be zero mean and uncorrelated. The autocorrelation functions (ACFs) of the signal and noise are

Find the causal optimal Wiener filter to estimate the signal from its noisy observations.

As the signal and noise are uncorrelated, we have

Note that one factor corresponds to a right-handed sequence while the other corresponds to a left-handed sequence. Figure 7.2 shows plots of the sequences corresponding to these components, as well as of the optimal non-causal filter obtained by their combination.

From these plots, it is straightforward to determine the causal sequence corresponding to the optimal filter, which is also shown in Figure 7.2. Thus

Figure 7.2: Plot showing the two-sided and the causal optimal filters for the causal Wiener filter example problem.

Lecture 23 : Wiener Predictor

7.5.1 Wiener Prediction

The problem is to estimate a future sample θ = x[N − 1 + l], where l ≥ 1, based on the current and past data x = [x[0], x[1],…, x[N − 1]]ᵀ. The resulting estimator is termed the l-step linear predictor.

As before we have Cxx = Rxx where Rxx is N × N autocorrelation matrix and:

filtering operation where

Therefore,

An explicit expression for h is found by noting that:

where rxx = [rxx[l], rxx[l + 1],…, rxx[l + N − 1]]ᵀ. When written out, we get the Wiener-Hopf prediction equations:

As pointed out earlier, the Levinson recursion is a computationally efficient procedure for solving these equations recursively. The special case l = 1, the one-step predictor, covers two important cases in signal processing:

The values −h[n] are termed the linear prediction coefficients (LPCs), which are used extensively in speech coding. For example, a 10th-order (N = 10) linear predictor is commonly used in speech coding and is given by

The resulting Wiener-Hopf equations are identical to the Yule-Walker equations used to solve for the autoregressive (AR) filter parameters of an AR(N) process.

7.5.2 Example

Consider a real wide-sense stationary (WSS) random process x with autocorrelation sequence

The first few coefficients of the autocorrelation sequence rxx(k) are:

where the ai are the predictor coefficients. These predictor coefficients are obtained by solving the Wiener-Hopf prediction equations for N = 2

or

On solving we have a1 = -5∕6 and a2 = 1∕6. Thus, the optimal predictor polynomial is given by

or
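The 2 × 2 prediction equations above can also be solved directly. The normalized ACF values used below are assumptions consistent with the stated solution a1 = −5/6, a2 = 1/6 (the original numeric ACF did not survive extraction):

```python
import numpy as np

# Solving the N = 2 Wiener-Hopf prediction equations directly.
# ACF values are assumed (normalized), consistent with the stated
# solution a1 = -5/6, a2 = 1/6 for the predictor polynomial.
rxx = np.array([1.0, 5 / 7, 3 / 7])       # r_xx(0), r_xx(1), r_xx(2)

R = np.array([[rxx[0], rxx[1]],
              [rxx[1], rxx[0]]])          # 2 x 2 Toeplitz autocorrelation
r = rxx[1:3]                              # right-hand side [r(1), r(2)]

h = np.linalg.solve(R, r)                 # one-step predictor coefficients
a1, a2 = -h[0], -h[1]                     # predictor polynomial coefficients
print(a1, a2)                             # -> -5/6, 1/6
```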

Module 12 : Review Questions

Assignments

12.1 Long Answer Questions

1. What do you mean by an efficient parameter estimator?

2. Discuss the asymptotic efficiency of a parameter estimator under a nonlinear transformation of the parameter.

3. Explain the purpose of the whitening transformation in the context of the general linear model for parameter estimation.

4. Define a sufficient statistic. What do you mean by a minimal sufficient statistic?

5. What do you mean by the invariance property of the maximum likelihood estimator? Establish the property.

6. Explain the role of the projection matrix in the context of linear least squares estimation.

7. Comment on the need for finding a conjugate prior in the context of Bayesian estimation.

8. What are nuisance parameters and how do we deal with them in the context of Bayesian estimation?

9. Discuss the relationship between the Bayesian and classical estimation approaches.

10. What are the commonly used cost functions in Bayesian estimation? Also comment on the classes of estimators resulting from them.

11. Explain why Wiener filters are termed linear Bayesian estimators. When are these filters optimal?

12. What do you mean by the receiver operating characteristic (ROC)? Also explain the effect of increasing the deflection coefficient on the ROC.

13. Define the Bayes risk for a multiple hypothesis testing problem and comment on the condition under which it reduces to the probability of error (Pe).

14. What do you mean by a composite hypothesis testing problem? Contrast the Bayesian and generalized likelihood ratio test (GLRT) based approaches to it.

15. What is the fundamental difference between the classical NP detector and the sequential detector?

16. Show how the matched filter maximizes the signal-to-noise ratio for the detection of a known signal.

17. What do you mean by the generalized matched filter? In which cases are these filters

employed?

18. The estimator-correlator detector is used for the detection of a random signal in white Gaussian noise. Explain why we cannot use a matched filter for the same purpose.

19. Explain the need for signal design in the case of signal detection in coloured noise.

20. What do you mean by non-parametric detection? Discuss the sign detector.

12.2 Multiple Choice Questions

1. For a DC level A in WGN, x[n] = A + w[n], n = 0, 1,…, N − 1, the sample mean x̄ is the efficient estimator for A. For estimating A², the estimator x̄² will be

A. unbiased and efficient estimator

B. unbiased but not efficient estimator

C. unbiased and minimum variance estimator

D. biased estimator

2. In classical estimation theory, it is not always possible to attain the CRLB because

A. the MVU estimator does not exist.

B. the sufficient statistics exist but are not complete.

C. the Bayesian estimator may result in better performance.

D. the 1st-order derivative of likelihood function is not defined everywhere.

3. For N observations, the Fisher information is I(θ) = N i(θ), where i(θ) is the Fisher information of each observation. This is true when the observations are

A. independent and identically distributed

B. completely dependent and identically distributed

C. both A and B

D. neither A nor B

4. In the vector parameter estimation case, a diagonal Fisher information matrix implies that

A. all unknown parameters are correlated.

B. none of the unknown parameters is correlated with the other.

C. some of the unknown parameters are correlated while others are uncorrelated.

D. nothing can be said about the correlation among unknown parameters.

5. For the sufficient statistic T(x) to yield an MVUE, which one of the following conditions must be true?

A. T(x) must be complete.

B. T(x) must be unbiased.

C. Both A and B.

D. Either A or B.

6. According to the Neyman-Fisher factorization theorem, the sufficient statistic T(x) for an estimation problem can be found if the PDF of the data p(x; θ) can be factorized as

A. p(x; θ) = g(h(x),θ) T(x)

B. p(x; θ) = g(T(x),θ) h(x)

C. p(x; θ) = g(T(x),h(x)) θ

D. p(x; θ) = g(θ) T(x) h(x)

7. In an OOK communication system, the observed data are given as x[n] = A cos(2πf1n) + w[n], n = 0, 1,…, N − 1, where w[n] is white noise with zero mean and variance σ². The best linear unbiased estimator (BLUE) of A is given by

A.

B.

C.

D.
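For reference, the BLUE of an amplitude A in the model x[n] = A s[n] + w[n] with zero-mean white noise reduces to Â = Σ s[n]x[n] ∕ Σ s²[n]. A minimal sanity check with assumed values (f1, N, and the noiseless data are illustrative only):

```python
import math

f1, N, A_true = 0.1, 50, 2.0                       # assumed values
s = [math.cos(2 * math.pi * f1 * n) for n in range(N)]
x = [A_true * sn for sn in s]                      # noiseless data for the check

# BLUE for an amplitude in white noise: correlate with the signal and normalize
A_hat = sum(sn * xn for sn, xn in zip(s, x)) / sum(sn ** 2 for sn in s)
```

In the noiseless case the estimator recovers A exactly; with noise it is unbiased with the minimum variance among linear estimators.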

8. The maximum likelihood procedure yields an estimator that is asymptotically efficient, but

A. sometimes it also yields an efficient estimator for finite data records

B. it never yields an efficient estimator for finite data records

C. it yields MVU estimator for finite data records and not the efficient estimator

D. none of the above

9. Given the observations {X1, X2,…, XN} having the Poisson distribution P(X = x) = e^(−λ) λ^x ∕ x!, (x = 0, 1,…), with unknown parameter λ > 0, the maximum likelihood estimate of λ is

A. x̄

B. 1∕x̄

C. x̄²

D. 1∕x̄²
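The Poisson result follows from setting d∕dλ [Σ xᵢ ln λ − Nλ] = 0, which gives λ̂ = x̄. A quick numerical check on an assumed sample, scanning a grid to confirm that the sample mean maximizes the log-likelihood:

```python
import math

data = [2, 0, 3, 1, 4]                      # assumed sample
lam_hat = sum(data) / len(data)             # candidate MLE: the sample mean

def loglik(lam):
    # Poisson log-likelihood up to the lambda-independent term -sum(log x_i!)
    return sum(x * math.log(lam) for x in data) - len(data) * lam

grid = [0.1 * k for k in range(1, 100)]
best = max(grid, key=loglik)                # grid maximizer of the log-likelihood
```

The grid maximizer lands on the sample mean, as the closed-form derivation predicts.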

10. For an estimation problem, if an efficient estimator exists, then the maximum likelihood estimator

A. will always produce it.

B. will produce it in some cases only.

C. will never produce it.

D. none of the above

11. For least squares estimation, which one of the following assumptions is true?

A. The data is assumed to have uniform distribution.

B. The data is assumed to have Gaussian distribution.

C. The data is assumed to be probabilistic with first two moments known.

D. No probabilistic assumption about the data is made.

12. Suppose that three measurements of the signal s(k) = θ exp(k∕2), where θ is the parameter to be estimated, are given as y(1) = 1.5, y(2) = 3.0, and y(3) = 5.0. Find the least squares estimate of θ.

A. 0.6459

B. 0.795

C. 0.895

D. 0.995

13. The estimator that minimizes the Bayes risk for the “hit-or-miss” cost function is

A. mode of the posterior pdf

B. median of the posterior pdf

C. mean of the posterior pdf

D. none of the above

14. The MAP estimator is usually easier to determine than the MMSE estimator since

A. it does not involve any differentiation

B. it does not involve any integration

C. it does not involve any prior PDF

D. it does not involve any maximization

15. Given the power spectral density of the signal Pss(f) and that of the noise Pww(f) = σ², the Wiener filter frequency response H(f) for an infinite-length non-causal filter is

A.

B.

C.

D.
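For reference, the infinite-length non-causal Wiener filter is the frequency-domain gain H(f) = Pss(f) ∕ (Pss(f) + Pww(f)); with white noise Pww(f) = σ², it attenuates each frequency according to the local SNR. A minimal sketch with assumed PSD values:

```python
sigma2 = 1.0                                # white-noise PSD level (assumed)

def wiener_gain(pss, pww=sigma2):
    # H(f) = Pss(f) / (Pss(f) + Pww(f)): a zero-phase shrinkage gain in [0, 1)
    return pss / (pss + pww)

g_strong = wiener_gain(100.0)   # strong signal band -> gain near 1
g_weak = wiener_gain(0.01)      # weak signal band   -> gain near 0
```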

16. For detection of a DC level in WGN, assume that we wish to have PFA = 10⁻⁴ and PD = 0.99. If the SNR is −30 dB, the number of samples N required for detection is

A. 20,465

B. 28,646

C. 36,546

D. 40,486
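The required N follows from the NP design equation for a DC level in WGN, N = [Q⁻¹(PFA) − Q⁻¹(PD)]² ∕ SNR, with the SNR converted from dB to a linear ratio. A sketch using the standard normal inverse CDF:

```python
from statistics import NormalDist

def q_inv(p):
    # Inverse of the Gaussian right-tail probability Q(x) = 1 - Phi(x)
    return NormalDist().inv_cdf(1.0 - p)

pfa, pd = 1e-4, 0.99
snr = 10 ** (-30 / 10)                      # -30 dB -> 1e-3 as a linear ratio
d2 = (q_inv(pfa) - q_inv(pd)) ** 2          # required deflection coefficient
N = d2 / snr                                # about 36,546 samples
```

Evaluating this gives N ≈ 36,546, matching option C in the answer key.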

17. Consider a binary hypothesis testing problem with the conditional probabilities of the

received data as

with hypotheses H0 and H1 being equally likely. Find the minimum probability of error.

A. 0.2012

B. 0.3854

C. 0.4385

D. 0.5108

18. For the binary hypothesis testing problem:

where c > 0 and U[a, b] denotes the uniform PDF on [a, b], the condition for a perfect detector (PFA = 0, PD = 1) is

A. c < 1∕2

B. c < 1

C. c > 1∕2

D. c > 1

19. Consider an M = 2 pulse amplitude modulation (PAM) scheme

subject to an average energy constraint. To have minimum probability of error Pe, the best choice for the signal amplitudes A0 and A1 is

A. A0 = A1

B. A0 =

C. A0 = -

D. A0 = -A1
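Under an equal average-energy constraint, antipodal amplitudes maximize the separation between the two signal points, which is what drives Pe down. A quick comparison with an assumed average energy E:

```python
import math

E = 1.0                                     # average energy per symbol (assumed)

# Antipodal: A0 = -sqrt(E), A1 = +sqrt(E) -> average energy (E + E)/2 = E
d_antipodal = 2.0 * math.sqrt(E)

# On-off: A0 = 0, A1 = sqrt(2E) -> average energy (0 + 2E)/2 = E
d_onoff = math.sqrt(2.0 * E)

# Larger separation at the same average energy -> smaller Pe
```

Antipodal signalling achieves a separation of 2√E versus √(2E) for on-off keying, so A0 = −A1 is the best choice.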

20. For the linear model x = Hθ + w with s = Hθ, where ŝ = Hθ̂ denotes the least squares signal estimate, which one of the following is correct?

A. ||ŝ||² + ||x − ŝ||² = ||x||²

B. ||ŝ||² + ||x − ŝ||² > ||x||²

C. ||ŝ||² + ||x − ŝ||² < ||x||²

D. ||x||² + ||x − ŝ||² = ||ŝ||²
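The LS residual x − ŝ is orthogonal to the signal subspace, so the energies obey the Pythagorean identity ||ŝ||² + ||x − ŝ||² = ||x||². A minimal check with H taken as a column of ones (the DC-level model, assumed here for simplicity), where the projection is just the sample mean:

```python
x = [1.0, 2.0, 4.0]                         # assumed observations

# For H = [1, 1, 1]^T the LS estimate is the sample mean,
# and s_hat is that mean replicated
theta_hat = sum(x) / len(x)
s_hat = [theta_hat] * len(x)
resid = [xi - si for xi, si in zip(x, s_hat)]

lhs = sum(si ** 2 for si in s_hat) + sum(ri ** 2 for ri in resid)
rhs = sum(xi ** 2 for xi in x)
# lhs == rhs by orthogonality of the residual to the column space of H
```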

21. The minimum Bayes risk for a binary hypothesis testing problem with costs C00 = 1, C11 = 2, C10 = 2, and C01 = 4 is given by

where π0 is the prior probability of hypothesis H0. Find the values of the minimax risk and the

least favorable prior probability

A. (πL) = 1; πL = 1

B. (πL) = 0; πL = 1

C. (πL) = 2; πL = 0

D. (πL) = 2; πL = 0

22. Consider the PDFs under H0 and H1 given as:

A.

B.

C.

D.

23. Consider the detection problem:

where s[n] = A cos(2πf0n + ϕ) is the signal and w[n] is noise distributed as w[n] ~ N(0, σ²). For estimating the amplitude A of the signal, if the detection statistic is

the expression for d² is

A.

B.

C.

D.

24. The parameter that does not affect the performance of an NP detector for detecting a deterministic signal in white Gaussian noise is

A. Signal energy

B. Signal shape

C. Noise energy

D. Probability of false alarm

Answers to Multiple Choice Questions

1. (D) 2. (D) 3. (A) 4. (B) 5. (C) 6. (B) 7. (D) 8. (A)

9. (A) 10. (A) 11. (D) 12. (D) 13. (A) 14. (B) 15. (A) 16. (C)

17. (C) 18. (A) 19. (D) 20. (A) 21. (D) 22. (C) 23. (C) 24. (B)
