
Estimation and Detection

Lecture 9: Introduction to Detection Theory


(Chs 1,2,3)

Dr. ir. Richard C. Hendriks – 9/12/2015


1
Example – Speech Processing
Voice activity detection (VAD):
In speech processing applications a VAD is commonly used, e.g.,

• In speech enhancement: to determine whether speech is present or not. If speech is not present, the remaining signal consists of noise only and can be used to estimate the noise statistics.

• Speech coding: Detect whether speech is present. If speech is not present, there is
no need for the device (phone) to transmit any information.

2
Example – Speech Processing
A VAD can be implemented using a Bayesian hypothesis test:

H0 : Y_k(l) = N_k(l)            (speech absent)

H1 : Y_k(l) = S_k(l) + N_k(l)   (speech present)

Based on statistical models for S and N and the right hypothesis testing criterion, we can automatically decide whether speech is absent or present.

(More details in the course Digital audio and speech processing, IN4182, 4th quarter.)

How to optimally make the decision? ⇒ Detection theory.

3
Example – Radio Pulsar Navigation

Pulsars (pulsating star):

• Highly magnetized rotating neutron star that emits a beam of electromagnetic radiation.

• Radiation can only be observed when the beam of emission is pointing toward the earth (lighthouse model).

• Wideband (100 MHz - 85 GHz).

• Extremely accurate pulse sources.

Kramer (University of Manchester)


4
Example – Radio Pulsar Navigation

For some millisecond pulsars, the regularity of pulsation is more precise than an atomic
clock.

• Pulsars are "ideal" sources for time-of-arrival measurements.

• Pulsar signals are weak (SNR = -90 dB).

How to optimally make the decision? ⇒ Detection theory.

5
Example – Radio Pulsar Navigation

6
What is Detection Theory?

Definition

Assume a set of data {x[0], x[1], . . . , x[N − 1]} is available. To arrive at a decision, we first form a function of the data, T(x[0], x[1], . . . , x[N − 1]), and then make a decision based on its value. Determining the function T and its mapping to a decision is the central problem addressed in detection theory.

7
The Simplest Detection Problem

Binary detection: Determine whether a certain signal that is embedded in noise is present
or not.
H0 : x[n] = w[n]
H1 : x[n] = s[n] + w[n]
Note that if the number of hypotheses is more than two, then the problem becomes a
multiple hypothesis testing problem. One example is detection of different digits in speech
processing.

8
Example (1)
Detection of a DC level of amplitude A = 1 embedded in white Gaussian noise w[n] with variance σ², using only one sample.

H0 : x[0] = w[0]

H1 : x[0] = 1 + w[0]

One possible detection rule:

H0 : x[0] < 1/2

H1 : x[0] > 1/2

For the case where x[0] = 1/2, we might arbitrarily choose one of the possibilities. However, the probability of such a case is zero!

9
Example (2)

The probability density function of x[0] under each hypothesis is as follows

$$ p(x[0]; H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\, x^2[0]\right) $$

$$ p(x[0]; H_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\, (x[0] - 1)^2\right) $$

Deciding between H0 and H1, we are essentially asking whether x[0] has been generated according to the pdf p(x[0]; H0) or the pdf p(x[0]; H1).
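
As a quick numerical illustration, the sketch below (assuming NumPy/SciPy are available; the variance and observation values are chosen arbitrarily) evaluates both pdfs for a single observed sample and decides for the hypothesis with the larger likelihood, which for this equal-variance case coincides with the threshold rule x[0] > 1/2.

```python
import numpy as np
from scipy.stats import norm

sigma = 1.0          # assumed noise standard deviation (illustrative value)
x0 = 0.8             # a single observed sample

# Likelihoods of x[0] under the two hypotheses
p_H0 = norm.pdf(x0, loc=0.0, scale=sigma)   # p(x[0]; H0), noise only
p_H1 = norm.pdf(x0, loc=1.0, scale=sigma)   # p(x[0]; H1), DC level A = 1 plus noise

# Deciding for the larger likelihood is the same as the rule x[0] > 1/2
decide_H1 = p_H1 > p_H0
print(f"p(x[0];H0)={p_H0:.4f}, p(x[0];H1)={p_H1:.4f} -> decide {'H1' if decide_H1 else 'H0'}")
print("threshold rule agrees:", decide_H1 == (x0 > 0.5))
```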

10
Detection Performance
• Can we expect to always make a correct decision? Depending on the noise variance σ², it will be more or less likely to make a decision error.

• How to make an optimal decision?

• The data under both H0 and H1 can be modelled with two different pdfs. Using these
pdfs, a decision rule can be formulated. A typical example:
$$ T = \frac{1}{N}\sum_{n=0}^{N-1} x[n] > \gamma $$

• The detection performance will increase as the "distance" between the pdfs under
both H0 and H1 increases.

• Performance measure: deflection coefficient

$$ d^2 = \frac{\left(E(T; H_1) - E(T; H_0)\right)^2}{\mathrm{var}(T; H_0)} $$

11
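
A minimal Monte Carlo sketch (assuming NumPy; A, σ² and N are illustrative values) that estimates the deflection coefficient of the sample-mean statistic T for the DC-level setup used later in the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma2, N, trials = 1.0, 2.0, 10, 100_000   # illustrative parameters

# Sample-mean statistic T = (1/N) sum x[n] under H0 (noise) and H1 (DC + noise)
noise = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
T_H0 = noise.mean(axis=1)
T_H1 = (A + noise).mean(axis=1)

# Empirical deflection coefficient d^2 = (E[T;H1] - E[T;H0])^2 / var(T;H0)
d2_hat = (T_H1.mean() - T_H0.mean()) ** 2 / T_H0.var()
print("estimated d^2      :", d2_hat)
print("theory N*A^2/sigma^2:", N * A**2 / sigma2)
```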
Today:

• Important pdfs

• Neyman-Pearson Theorem

• Minimum Probability of Error

12
Important pdfs – Gaussian pdf

$$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right), \qquad -\infty < x < +\infty $$

where µ is the mean and σ² is the variance of x.

Standard normal pdf: µ = 0 and σ² = 1.

The cumulative distribution function (cdf) of a standard normal pdf:

$$ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} t^2\right) dt $$
A more convenient description is the right-tail probability, defined as Q(x) = 1 − Φ(x). This function, called the Q-function, is used frequently in detection problems where the signal and noise are normally distributed.
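
A short sketch (assuming SciPy) of how Φ(x), Q(x) and the inverse Q-function can be evaluated numerically; these functions are used repeatedly in the examples below.

```python
from scipy.stats import norm

x = 1.5
Phi = norm.cdf(x)      # standard normal cdf  Phi(x)
Q = norm.sf(x)         # right-tail probability Q(x) = 1 - Phi(x)
x_back = norm.isf(Q)   # inverse Q-function: Q^{-1}(Q(x)) = x

print(f"Phi({x}) = {Phi:.4f}, Q({x}) = {Q:.4f}, Q^-1({Q:.4f}) = {x_back:.4f}")
```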

13
Important pdfs – Gaussian pdf

[Figure: the standard normal cdf Φ(x) and the right-tail probability Q(x) (left panel), and the Gaussian pdf (right panel).]
14
Important pdfs – central Chi-squared
A chi-squared pdf arises as the pdf of x, where x = Σ_{i=1}^{ν} x_i², if the x_i are independent, standard normally distributed random variables. The chi-squared pdf with ν degrees of freedom is defined as

$$ p(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} \exp\left(-\dfrac{1}{2}x\right), & x > 0 \\ 0, & x < 0 \end{cases} $$

and is denoted by χ²_ν. Here ν is assumed to be an integer with ν ≥ 1. The function Γ(u) is the Gamma function, defined as

$$ \Gamma(u) = \int_0^{\infty} t^{u-1} \exp(-t)\, dt $$

[Figure: χ²_ν pdf for ν = 2 (exponential pdf) and ν = 20 (approaching a Gaussian).]

15
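
To make the definition concrete, the following sketch (assuming NumPy/SciPy; ν is an illustrative value) generates x as a sum of ν squared standard normal variables and compares its empirical moments and pdf with scipy.stats.chi2:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
nu, trials = 5, 200_000                      # illustrative degrees of freedom

# x = sum_{i=1}^{nu} x_i^2 with x_i standard normal  ->  chi-squared with nu dof
x = (rng.standard_normal((trials, nu)) ** 2).sum(axis=1)

print("empirical mean/var :", x.mean(), x.var())       # should approach nu and 2*nu
print("chi2(nu) mean/var  :", chi2.mean(nu), chi2.var(nu))
print("pdf at x=3         :", chi2.pdf(3.0, df=nu))    # closed-form pdf from the slide
```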
Important pdfs – non-central Chi-squared

If x = Σ_{i=1}^{ν} x_i², where the x_i are independent Gaussian random variables with mean µ_i and variance σ² = 1, then x has a noncentral chi-squared pdf with ν degrees of freedom and noncentrality parameter λ = Σ_{i=1}^{ν} µ_i². The pdf then becomes

$$ p(x) = \begin{cases} \dfrac{1}{2} \left(\dfrac{x}{\lambda}\right)^{(\nu-2)/4} \exp\left(-\dfrac{1}{2}(x + \lambda)\right) I_{\nu/2 - 1}\!\left(\sqrt{\lambda x}\right), & x > 0 \\ 0, & x < 0 \end{cases} $$

where I_r(·) denotes the modified Bessel function of the first kind of order r.
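
Analogously, a sketch (assuming NumPy/SciPy; the means are illustrative) that generates a noncentral chi-squared variable as a sum of squared unit-variance Gaussians with nonzero means and checks it against scipy.stats.ncx2:

```python
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(2)
mu = np.array([0.5, 1.0, 1.5])               # illustrative means, variance = 1
nu, lam = len(mu), float(np.sum(mu**2))      # degrees of freedom and noncentrality

x = ((rng.standard_normal((200_000, nu)) + mu) ** 2).sum(axis=1)

print("empirical mean/var :", x.mean(), x.var())        # should approach nu+lam and 2(nu+2*lam)
print("ncx2 mean/var      :", ncx2.mean(nu, lam), ncx2.var(nu, lam))
print("pdf at x=4         :", ncx2.pdf(4.0, df=nu, nc=lam))
```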

16
Making Optimal Decisions
Remember the example:

H0 : x[0] < γ

H1 : x[0] > γ

Using detection theory, rules can be derived on how to choose γ.

• Neyman-Pearson Theorem: Maximize detection probability for a given false alarm probability.

• Minimum probability of error

• Bayesian detector

17
Neyman-Pearson Theorem - Introduction
Example: Assume that we observe a random variable whose pdf is either N (0, 1) or N (1, 1).
Our hypothesis problem is then:

H0 : µ=0

H1 : µ=1

Detection rule:

H0 : x[0] < 1/2

H1 : x[0] > 1/2

Hence, in this example for x[0] > 1/2, p(x[0]; H1) > p(x[0]; H0).


Notice that two different types of errors can be made.

S. Kay – detection theory Figs. 3.2 and 3.3.

18


Neyman-Pearson Theorem – Detection Performance

Detection performance of a system is measured mainly by two factors:

1. Probability of false alarm: PF A = P (H1 ; H0 )

2. Probability of detection: PD = P (H1 ; H1 )

Note that sometimes, instead of the probability of detection, the probability of missed detection, PM = 1 − PD, is used.

19
Neyman-Pearson Theorem – Detection Performance

• These two errors can be traded off against each other.

• It is not possible to reduce both error probabilities simultaneously.

• False alarm probability PF A = P (H1 ; H0 )

• Detection probability PD = P (H1 ; H1 )

• To design the optimal detector, the Neyman-Pearson approach is to maximise PD while keeping PFA fixed (small).

S. Kay – detection theory Figs. 3.2 and 3.3.

20
Neyman-Pearson Theorem
Problem statement

Assume a data set x = [x[0], x[1], ..., x[N − 1]]^T is available. The detection problem is defined as follows

H0 : T(x) < γ

H1 : T(x) > γ

where T is the decision function and γ is the detection threshold. Our goal is to design T so as to maximize PD subject to PFA ≤ α.

21
Neyman-Pearson Theorem

To maximize PD for a given PFA = α, decide H1 if

$$ L(x) = \frac{p(x; H_1)}{p(x; H_0)} > \gamma $$

where the threshold γ is found from

$$ P_{FA} = \int_{\{x : L(x) > \gamma\}} p(x; H_0)\, dx = \alpha $$

The function L(x) is called the likelihood ratio and the entire test is called the likelihood
ratio test (LRT).
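
A minimal sketch of the LRT in code (assuming NumPy/SciPy; the Gaussian likelihoods and parameter values are only an example, anticipating the DC-level problem on the next slides). Working with log-likelihoods is numerically safer and equivalent:

```python
import numpy as np
from scipy.stats import norm

def lrt_decide(x, loglik_H1, loglik_H0, gamma):
    """Return True (decide H1) if the likelihood ratio L(x) exceeds gamma."""
    log_L = loglik_H1(x) - loglik_H0(x)          # log L(x)
    return log_L > np.log(gamma)

# Example: N samples of a DC level A in WGN (illustrative values)
A, sigma = 1.0, 1.5
loglik_H0 = lambda x: norm.logpdf(x, 0.0, sigma).sum()
loglik_H1 = lambda x: norm.logpdf(x, A, sigma).sum()

x = np.array([0.3, 1.2, 0.9, -0.1, 1.5])
print("decide H1:", lrt_decide(x, loglik_H1, loglik_H0, gamma=1.0))
```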

22
Neyman-Pearson Theorem - Derivation
max PD subject to PFA = α

Constrained optimization, using a Lagrangian:

$$ F = P_D + \lambda \left(P_{FA} - \alpha\right) $$
$$ \;\; = \int_{R_1} p(x; H_1)\, dx + \lambda \left(\int_{R_1} p(x; H_0)\, dx - \alpha\right) $$
$$ \;\; = \int_{R_1} \left(p(x; H_1) + \lambda\, p(x; H_0)\right) dx - \lambda\alpha $$

The problem now is (see the figures) to select the right regions R1 and R0. As we want to maximise F, a value x should only be included in R1 if it increases the integrand. So, x should only be included in R1 if

$$ p(x; H_1) + \lambda\, p(x; H_0) > 0 $$

23
Neyman-Pearson Theorem - Derivation
$$ p(x; H_1) + \lambda\, p(x; H_0) > 0 \;\;\Rightarrow\;\; \frac{p(x; H_1)}{p(x; H_0)} > -\lambda $$

A likelihood ratio is always positive, so γ = −λ > 0 (if λ > 0 we would have PFA = 1):

$$ \frac{p(x; H_1)}{p(x; H_0)} > \gamma, $$

where γ is found from PFA = α.

24
Neyman-Pearson Theorem – Example DC in WGN
Consider the following signal detection problem

H0 : x[n] = w[n],            n = 0, 1, . . . , N − 1

H1 : x[n] = s[n] + w[n],     n = 0, 1, . . . , N − 1

where the signal is s[n] = A for A > 0 and w[n] is WGN with variance σ². Now the NP detector decides H1 if
detector decides H1 if
$$ \frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} (x[n] - A)^2\right]}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right]} > \gamma $$

Taking the logarithm of both sides and simplifying results in


$$ \frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{\sigma^2}{N A}\ln\gamma + \frac{A}{2} = \gamma' $$

25
Neyman-Pearson Theorem – Example DC in WGN
The NP detector compares the sample mean x̄ = (1/N) Σ_{n=0}^{N−1} x[n] to a threshold γ'. To determine the detection performance, we first note that the test statistic T(x) = x̄ is Gaussian under each hypothesis and its distribution is as follows

$$ T(x) \sim \begin{cases} \mathcal{N}(0, \sigma^2/N) & \text{under } H_0 \\ \mathcal{N}(A, \sigma^2/N) & \text{under } H_1 \end{cases} $$
We then have

$$ P_{FA} = \Pr(T(x) > \gamma'; H_0) = Q\!\left(\frac{\gamma'}{\sqrt{\sigma^2/N}}\right) \;\;\Rightarrow\;\; \gamma' = \sqrt{\frac{\sigma^2}{N}}\, Q^{-1}(P_{FA}) $$

and

$$ P_D = \Pr(T(x) > \gamma'; H_1) = Q\!\left(\frac{\gamma' - A}{\sqrt{\sigma^2/N}}\right) $$

PD and PFA are related to each other according to the following equation

$$ P_D = Q\!\left(Q^{-1}(P_{FA}) - \sqrt{\frac{N A^2}{\sigma^2}}\right) $$

where NA²/σ² is the signal energy-to-noise ratio.

26
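
A numerical sketch of this result (assuming NumPy/SciPy; A, σ², N and PFA are illustrative): pick PFA, compute the threshold γ' and the predicted PD, and verify both by Monte Carlo.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
A, sigma2, N, PFA = 0.5, 1.0, 20, 0.01          # illustrative parameters

gamma_p = np.sqrt(sigma2 / N) * norm.isf(PFA)   # gamma' = sqrt(sigma^2/N) * Q^{-1}(P_FA)
PD = norm.sf(norm.isf(PFA) - np.sqrt(N * A**2 / sigma2))

trials = 200_000
xbar_H0 = rng.normal(0.0, np.sqrt(sigma2), (trials, N)).mean(axis=1)
xbar_H1 = rng.normal(A, np.sqrt(sigma2), (trials, N)).mean(axis=1)

print("threshold gamma'   :", gamma_p)
print("PFA (theory / MC)  :", PFA, (xbar_H0 > gamma_p).mean())
print("PD  (theory / MC)  :", PD, (xbar_H1 > gamma_p).mean())
```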
Neyman-Pearson Theorem – Example DC in WGN
$$ P_D = Q\!\left(Q^{-1}(P_{FA}) - \sqrt{\frac{N A^2}{\sigma^2}}\right) $$

Remember the deflection coefficient

$$ d^2 = \frac{\left(E(T; H_1) - E(T; H_0)\right)^2}{\mathrm{var}(T; H_0)} $$

In this case d² = NA²/σ². Further notice that the detection performance (PD) increases monotonically with the deflection coefficient.

27
Neyman-Pearson Theorem – Example Change Var

Consider an IID process x[n].

H0 : x[n] ∼ N(0, σ0²)

H1 : x[n] ∼ N(0, σ1²),

with σ1² > σ0².

Neyman-Pearson test:
$$ \frac{\frac{1}{(2\pi\sigma_1^2)^{N/2}} \exp\left[-\frac{1}{2\sigma_1^2}\sum_{n=0}^{N-1} x^2[n]\right]}{\frac{1}{(2\pi\sigma_0^2)^{N/2}} \exp\left[-\frac{1}{2\sigma_0^2}\sum_{n=0}^{N-1} x^2[n]\right]} > \gamma $$

28
Neyman-Pearson Theorem – Example Change Var

Taking the logarithm of both sides and simplifying results in


$$ \frac{1}{2}\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right) \sum_{n=0}^{N-1} x^2[n] > \ln\gamma + \frac{N}{2}\ln\frac{\sigma_1^2}{\sigma_0^2} $$

We then have

$$ \frac{1}{N}\sum_{n=0}^{N-1} x^2[n] > \gamma' $$

with

$$ \gamma' = \frac{\frac{2}{N}\ln\gamma + \ln\frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}} $$

What about PD ?

29
Neyman-Pearson Theorem – Example Change Var

For N = 1 we decide for H1 if:


$$ |x[0]| > \sqrt{\gamma'}. $$

$$ P_{FA} = \Pr\left\{|x[0]| > \sqrt{\gamma'};\, H_0\right\} = 2\,\Pr\left\{x[0] > \sqrt{\gamma'};\, H_0\right\} = 2\,Q\!\left(\frac{\sqrt{\gamma'}}{\sqrt{\sigma_0^2}}\right) $$

$$ \Rightarrow\;\; \sqrt{\gamma'} = Q^{-1}\!\left(\frac{P_{FA}}{2}\right)\sqrt{\sigma_0^2} $$

$$ P_D = 2\,Q\!\left(\frac{Q^{-1}\!\left(\frac{P_{FA}}{2}\right)\sqrt{\sigma_0^2}}{\sqrt{\sigma_1^2}}\right) $$
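
A sketch of this N = 1 variance-change detector (assuming NumPy/SciPy; σ0² and σ1² are illustrative): compute √γ' from PFA and check PFA and PD by Monte Carlo.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
sigma0, sigma1, PFA = 1.0, 2.0, 0.05            # illustrative std. devs (sigma1 > sigma0)

thr = sigma0 * norm.isf(PFA / 2)                # sqrt(gamma') = sigma0 * Q^{-1}(P_FA / 2)
PD = 2 * norm.sf(norm.isf(PFA / 2) * sigma0 / sigma1)

trials = 500_000
x_H0 = rng.normal(0.0, sigma0, trials)
x_H1 = rng.normal(0.0, sigma1, trials)

print("PFA (theory / MC):", PFA, (np.abs(x_H0) > thr).mean())
print("PD  (theory / MC):", PD, (np.abs(x_H1) > thr).mean())
```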

30
Receiver Operating Characteristics

An alternative way of summarizing the detection performance of an NP detector is to plot PD versus PFA. This plot is called the Receiver Operating Characteristic (ROC). For the former DC level detection example, the ROC is shown here. Note that here NA²/σ² = 1.
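
A sketch that reproduces this ROC (assuming NumPy/SciPy and matplotlib), for NA²/σ² = 1 together with two other SNR values for comparison:

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

PFA = np.linspace(1e-4, 1.0, 500)
for enr in [0.5, 1.0, 4.0]:                      # signal energy-to-noise ratio N*A^2/sigma^2
    PD = norm.sf(norm.isf(PFA) - np.sqrt(enr))   # PD = Q(Q^{-1}(P_FA) - sqrt(N A^2 / sigma^2))
    plt.plot(PFA, PD, label=f"$NA^2/\\sigma^2$ = {enr}")

plt.xlabel("$P_{FA}$"); plt.ylabel("$P_D$"); plt.legend(); plt.grid(True)
plt.title("ROC of the NP detector for a DC level in WGN")
plt.show()
```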

31
Minimum Probability of Error
Assume the prior probabilities of H0 and H1 are known and represented by P (H0 ) and
P (H1 ), respectively. The probability of error, Pe , is then defined as

$$ P_e = P(H_1)\, P(H_0|H_1) + P(H_0)\, P(H_1|H_0) = P(H_1)\, P_M + P(H_0)\, P_{FA} $$

Our goal is to design a detector that minimizes Pe. It can be shown that the following detector is optimal in this case: decide H1 if

$$ \frac{p(x|H_1)}{p(x|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma $$
In case P (H0 ) = P (H1 ), the detector is called the maximum likelihood detector.

32
Minimum Probability of Error - Derivation

$$ P_e = P(H_1)\, P(H_0|H_1) + P(H_0)\, P(H_1|H_0) = P(H_1)\int_{R_0} p(x|H_1)\, dx + P(H_0)\int_{R_1} p(x|H_0)\, dx $$

We know that

$$ \int_{R_0} p(x|H_1)\, dx = 1 - \int_{R_1} p(x|H_1)\, dx, $$

such that

$$ P_e = P(H_1)\left(1 - \int_{R_1} p(x|H_1)\, dx\right) + P(H_0)\int_{R_1} p(x|H_0)\, dx = P(H_1) + \int_{R_1} \left[P(H_0)\, p(x|H_0) - P(H_1)\, p(x|H_1)\right] dx $$

33
Minimum Probability of Error - Derivation
$$ P_e = P(H_1) + \int_{R_1} \left[P(H_0)\, p(x|H_0) - P(H_1)\, p(x|H_1)\right] dx $$

We want to minimize Pe, so an x should only be included in the region R1 if the integrand

$$ P(H_0)\, p(x|H_0) - P(H_1)\, p(x|H_1) $$

is negative for that x:

$$ P(H_0)\, p(x|H_0) < P(H_1)\, p(x|H_1) \;\;\Rightarrow\;\; \frac{p(x|H_1)}{p(x|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma $$

34
Minimum Probability of Error– Example DC in WGN
Consider the following signal detection problem

H0 : x[n] = w[n],            n = 0, 1, . . . , N − 1

H1 : x[n] = s[n] + w[n],     n = 0, 1, . . . , N − 1

where the signal is s[n] = A for A > 0 and w[n] is WGN with variance σ². Now the minimum probability of error detector decides H1 if p(x|H1)/p(x|H0) > P(H0)/P(H1) = 1 (assuming P(H0) = P(H1) = 0.5), leading to
$$ \frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} (x[n] - A)^2\right]}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right]} > 1 $$

Taking the logarithm of both sides and simplifying results in


$$ \frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{A}{2} $$

35
Minimum Probability of Error– Example DC in WGN

Pe is then given by

$$ P_e = \frac{1}{2}\left[P(H_0|H_1) + P(H_1|H_0)\right] = \frac{1}{2}\left[\Pr\!\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n] < \frac{A}{2}\,\Big|\, H_1\right) + \Pr\!\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{A}{2}\,\Big|\, H_0\right)\right] $$

$$ \phantom{P_e} = \frac{1}{2}\left[\left(1 - Q\!\left(\frac{A/2 - A}{\sqrt{\sigma^2/N}}\right)\right) + Q\!\left(\frac{A/2}{\sqrt{\sigma^2/N}}\right)\right] = Q\!\left(\sqrt{\frac{N A^2}{4\sigma^2}}\right) $$
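
A Monte Carlo sketch (assuming NumPy/SciPy; parameter values are illustrative) that checks this closed-form Pe for the sample-mean detector with threshold A/2 and equal priors:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
A, sigma2, N, trials = 1.0, 4.0, 10, 200_000     # illustrative parameters

Pe_theory = norm.sf(np.sqrt(N * A**2 / (4 * sigma2)))   # Pe = Q(sqrt(N A^2 / (4 sigma^2)))

# Equal priors: average the two conditional error probabilities, threshold at A/2
xbar_H0 = rng.normal(0.0, np.sqrt(sigma2), (trials, N)).mean(axis=1)
xbar_H1 = rng.normal(A, np.sqrt(sigma2), (trials, N)).mean(axis=1)
Pe_mc = 0.5 * (xbar_H0 > A / 2).mean() + 0.5 * (xbar_H1 < A / 2).mean()

print("Pe theory :", Pe_theory)
print("Pe MC     :", Pe_mc)
```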

36
Minimum Probability of Error – MAP detector

Starting from

$$ \frac{p(x|H_1)}{p(x|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma $$

and using Bayes' rule,

$$ P(H_i|x) = \frac{p(x|H_i)\, P(H_i)}{p(x)}, $$

we arrive at

$$ P(H_1|x) > P(H_0|x). $$

This is called the maximum a posteriori (MAP) detector, which, if P(H1) = P(H0), again reduces to the ML detector.

37
Bayes Risk

A generalisation of the minimum Pe criterion is one where costs are assigned to each type
of error:
Let Cij be the cost if we decide Hi while Hj is true. Minimizing the expected costs we get
$$ R = E[C] = \sum_{i=0}^{1}\sum_{j=0}^{1} C_{ij}\, P(H_i|H_j)\, P(H_j) $$

If C10 > C00 and C01 > C11, the detector that minimises the Bayes risk is to decide H1 when

$$ \frac{p(x|H_1)}{p(x|H_0)} > \frac{(C_{10} - C_{00})\, P(H_0)}{(C_{01} - C_{11})\, P(H_1)} = \gamma. $$
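
A small sketch of the Bayes-risk threshold (assuming NumPy/SciPy; the costs and parameters are illustrative): compute γ from the costs and priors and, for the DC-in-WGN example, the corresponding sample-mean threshold σ²/(NA) ln γ + A/2 obtained with the same algebra as on slide 25.

```python
import numpy as np
from scipy.stats import norm

# Illustrative costs Cij = cost of deciding Hi when Hj is true, and priors
C10, C00, C01, C11 = 1.0, 0.0, 5.0, 0.0      # missing the signal costs 5x a false alarm
P_H0, P_H1 = 0.5, 0.5

gamma = (C10 - C00) / (C01 - C11) * P_H0 / P_H1   # LRT threshold minimizing the Bayes risk

# For the DC-in-WGN example the LRT reduces to a sample-mean test (cf. slide 25)
A, sigma2, N = 1.0, 1.0, 10
mean_threshold = sigma2 / (N * A) * np.log(gamma) + A / 2
print("LRT threshold gamma  :", gamma)
print("sample-mean threshold:", mean_threshold)

# Decision for one illustrative data record drawn under H1
x = norm.rvs(loc=A, scale=np.sqrt(sigma2), size=N, random_state=6)
print("decide H1:", x.mean() > mean_threshold)
```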

38
