
Estimation and Detection

Lecture 9: Introduction to Detection Theory


(Chs 1,2,3)

Dr. ir. Richard C. Hendriks – 9/12/2015


1
Example – Speech Processing
Voice activity detection (VAD):
In speech processing applications a VAD is commonly used, e.g.,

• In speech enhancement: to determine whether speech is present or not. If speech is not present, the remaining signal consists of noise only and can be used to estimate the noise statistics.

• Speech coding: Detect whether speech is present. If speech is not present, there is
no need for the device (phone) to transmit any information.

2
Example – Speech Processing
A VAD can be implemented using a Bayesian hypothesis test:

H0 : Y_k(l) = N_k(l)            (speech absent)

H1 : Y_k(l) = S_k(l) + N_k(l)   (speech present)

Based on statistical models for S and N and the right hypothesis testing criterion, we can automatically decide whether speech is absent or present.

(More details in the course Digital audio and speech processing, IN4182, 4th quarter.)

How to optimally make the decision? ⇒ Detection theory.

3
Example – Radio Pulsar Navigation

Pulsars (pulsating star):

• Highly magnetized rotating neutron star that emits a beam of electromagnetic radiation.

• Radiation can only be observed when the beam of emission is pointing toward the earth (lighthouse model).

• Wideband (100 MHz - 85 GHz).

• Extremely accurate pulse sources.

Kramer (University of Manchester)


4
Example – Radio Pulsar Navigation

For some millisecond pulsars, the regularity of pulsation is more precise than an atomic
clock.

• Pulsars are "ideal" sources for time-of-arrival measurements.

• Pulsar signals are weak (SNR = -90 dB).

How to optimally make the decision? ⇒ Detection theory.

5
Example – Radio Pulsar Navigation

6
What is Detection Theory?

Definition

Assume a set of data {x[0], x[1], . . . , x[N − 1]} is available. To arrive at a decision, we first form a function of the data, T(x[0], x[1], . . . , x[N − 1]), and then make a decision based on its value. Determining the function T and its mapping to a decision is the central problem addressed in detection theory.

7
The Simplest Detection Problem

Binary detection: Determine whether a certain signal that is embedded in noise is present
or not.
H0 : x[n] = w[n]
H1 : x[n] = s[n] + w[n]
Note that if the number of hypotheses is more than two, then the problem becomes a
multiple hypothesis testing problem. One example is detection of different digits in speech
processing.

8
Example (1)
Detection of a DC level of amplitude A = 1 embedded in white Gaussian noise w[n] with variance σ², using only one sample.

H0 : x[0] = w[0]

H1 : x[0] = 1 + w[0]

One possible detection rule:

H0 : x[0] < 1/2

H1 : x[0] > 1/2

For the case where x[0] = 1/2, we might arbitrarily choose one of the possibilities. However, the probability of such a case is zero!

9
Example (2)

The probability density function of x[0] under each hypothesis is as follows

$$ p(x[0]; H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\, x^2[0]\right) $$

$$ p(x[0]; H_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}\, (x[0] - 1)^2\right) $$

Deciding between H0 and H1, we are essentially asking whether x[0] has been generated according to the pdf p(x[0]; H0) or the pdf p(x[0]; H1).
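
As a quick numerical illustration, the sketch below (assuming NumPy/SciPy are available; the variance and observation values are chosen arbitrarily) evaluates both pdfs for a single observed sample and decides for the hypothesis with the larger likelihood, which for this equal-variance case coincides with the threshold rule x[0] > 1/2.

```python
import numpy as np
from scipy.stats import norm

sigma = 1.0          # assumed noise standard deviation (illustrative value)
x0 = 0.8             # a single observed sample

# Likelihoods of x[0] under the two hypotheses
p_H0 = norm.pdf(x0, loc=0.0, scale=sigma)   # p(x[0]; H0), noise only
p_H1 = norm.pdf(x0, loc=1.0, scale=sigma)   # p(x[0]; H1), DC level A = 1 plus noise

# Deciding for the larger likelihood is the same as the rule x[0] > 1/2
decide_H1 = p_H1 > p_H0
print(f"p(x[0];H0)={p_H0:.4f}, p(x[0];H1)={p_H1:.4f} -> decide {'H1' if decide_H1 else 'H0'}")
print("threshold rule agrees:", decide_H1 == (x0 > 0.5))
```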

10
Detection Performance
• Can we expect to always make a correct decision? Depending on the noise variance σ², it will be more or less likely to make a decision error.

• How to make an optimal decision?

• The data under both H0 and H1 can be modelled with two different pdfs. Using these
pdfs, a decision rule can be formulated. A typical example:
$$ T = \frac{1}{N}\sum_{n=0}^{N-1} x[n] > \gamma $$

• The detection performance will increase as the "distance" between the pdfs under
both H0 and H1 increases.

• Performance measure: deflection coefficient

$$ d^2 = \frac{\left(E(T; H_1) - E(T; H_0)\right)^2}{\mathrm{var}(T; H_0)} $$

11
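
A minimal Monte Carlo sketch (assuming NumPy; A, σ² and N are illustrative values) that estimates the deflection coefficient of the sample-mean statistic T for the DC-level setup used later in the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
A, sigma2, N, trials = 1.0, 2.0, 10, 100_000   # illustrative parameters

# Sample-mean statistic T = (1/N) sum x[n] under H0 (noise) and H1 (DC + noise)
noise = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
T_H0 = noise.mean(axis=1)
T_H1 = (A + noise).mean(axis=1)

# Empirical deflection coefficient d^2 = (E[T;H1] - E[T;H0])^2 / var(T;H0)
d2_hat = (T_H1.mean() - T_H0.mean()) ** 2 / T_H0.var()
print("estimated d^2      :", d2_hat)
print("theory N*A^2/sigma^2:", N * A**2 / sigma2)
```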
Today:

• Important pdfs

• Neyman-Pearson Theorem

• Minimum Probability of Error

12
Important pdfs – Gaussian pdf

$$ p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right), \qquad -\infty < x < +\infty $$

where µ is the mean and σ² is the variance of x.

Standard normal pdf: µ = 0 and σ² = 1.

The cumulative distribution function (cdf) of a standard normal pdf:

$$ \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} t^2\right) dt $$
A more convenient description is the right-tail probability, defined as Q(x) = 1 − Φ(x). This function, called the Q-function, is used frequently in detection problems where the signal and noise are normally distributed.
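
A short sketch (assuming SciPy) of how Φ(x), Q(x) and the inverse Q-function can be evaluated numerically; these functions are used repeatedly in the examples below.

```python
from scipy.stats import norm

x = 1.5
Phi = norm.cdf(x)      # standard normal cdf  Phi(x)
Q = norm.sf(x)         # right-tail probability Q(x) = 1 - Phi(x)
x_back = norm.isf(Q)   # inverse Q-function: Q^{-1}(Q(x)) = x

print(f"Phi({x}) = {Phi:.4f}, Q({x}) = {Q:.4f}, Q^-1({Q:.4f}) = {x_back:.4f}")
```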

13
Important pdfs – Gaussian pdf

[Figure: the standard normal cdf Φ(x) and the right-tail probability Q(x) (left panel), and the Gaussian pdf (right panel).]
14
Important pdfs – central Chi-squared
A chi-squared pdf arises as the pdf of x, where x = Σ_{i=1}^{ν} x_i², if the x_i are independent, standard normally distributed random variables. The chi-squared pdf with ν degrees of freedom is defined as

$$ p(x) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} \exp\left(-\dfrac{1}{2}x\right), & x > 0 \\ 0, & x < 0 \end{cases} $$

and is denoted by χ²_ν. Here ν is assumed to be an integer with ν ≥ 1. The function Γ(u) is the Gamma function, defined as

$$ \Gamma(u) = \int_0^{\infty} t^{u-1} \exp(-t)\, dt $$

[Figure: χ²_ν pdf for ν = 2 (exponential pdf) and ν = 20 (approaching a Gaussian).]

15
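
To make the definition concrete, the following sketch (assuming NumPy/SciPy; ν is an illustrative value) generates x as a sum of ν squared standard normal variables and compares its empirical moments and pdf with scipy.stats.chi2:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
nu, trials = 5, 200_000                      # illustrative degrees of freedom

# x = sum_{i=1}^{nu} x_i^2 with x_i standard normal  ->  chi-squared with nu dof
x = (rng.standard_normal((trials, nu)) ** 2).sum(axis=1)

print("empirical mean/var :", x.mean(), x.var())       # should approach nu and 2*nu
print("chi2(nu) mean/var  :", chi2.mean(nu), chi2.var(nu))
print("pdf at x=3         :", chi2.pdf(3.0, df=nu))    # closed-form pdf from the slide
```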
Important pdfs – non-central Chi-squared

If x = Σ_{i=1}^{ν} x_i², where the x_i are independent Gaussian random variables with mean µ_i and variance σ² = 1, then x has a noncentral chi-squared pdf with ν degrees of freedom and noncentrality parameter λ = Σ_{i=1}^{ν} µ_i². The pdf then becomes

$$ p(x) = \begin{cases} \dfrac{1}{2} \left(\dfrac{x}{\lambda}\right)^{(\nu-2)/4} \exp\left(-\dfrac{1}{2}(x + \lambda)\right) I_{\nu/2 - 1}\!\left(\sqrt{\lambda x}\right), & x > 0 \\ 0, & x < 0 \end{cases} $$

where I_r(·) denotes the modified Bessel function of the first kind of order r.
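
Analogously, a sketch (assuming NumPy/SciPy; the means are illustrative) that generates a noncentral chi-squared variable as a sum of squared unit-variance Gaussians with nonzero means and checks it against scipy.stats.ncx2:

```python
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(2)
mu = np.array([0.5, 1.0, 1.5])               # illustrative means, variance = 1
nu, lam = len(mu), float(np.sum(mu**2))      # degrees of freedom and noncentrality

x = ((rng.standard_normal((200_000, nu)) + mu) ** 2).sum(axis=1)

print("empirical mean/var :", x.mean(), x.var())        # should approach nu+lam and 2(nu+2*lam)
print("ncx2 mean/var      :", ncx2.mean(nu, lam), ncx2.var(nu, lam))
print("pdf at x=4         :", ncx2.pdf(4.0, df=nu, nc=lam))
```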

16
Making Optimal Decisions
Remember the example:

H0 : x[0] < γ

H1 : x[0] > γ

Using detection theory, rules can be derived on how to choose γ.

• Neyman-Pearson Theorem: Maximize detection probability for a given false alarm probability.

• Minimum probability of error

• Bayesian detector

17
Neyman-Pearson Theorem - Introduction
Example: Assume that we observe a random variable whose pdf is either N (0, 1) or N (1, 1).
Our hypothesis problem is then:

H0 : µ=0

H1 : µ=1

Detection rule:

H0 : x[0] < 1/2

H1 : x[0] > 1/2

Hence, in this example for x[0] > 1/2, p(x[0]; H1) > p(x[0]; H0).


Notice that two different types of errors can be made.

S. Kay – detection theory Figs. 3.2 and 3.3.

18


Neyman-Pearson Theorem – Detection Performance

Detection performance of a system is measured mainly by two factors:

1. Probability of false alarm: PF A = P (H1 ; H0 )

2. Probability of detection: PD = P (H1 ; H1 )

Note that sometimes, instead of the probability of detection, the probability of missed detection, PM = 1 − PD, is used.

19
Neyman-Pearson Theorem – Detection Performance

• These two errors can be traded off against each other.

• It is not possible to reduce both error probabilities simultaneously.

• False alarm probability PF A = P (H1 ; H0 )

• Detection probability PD = P (H1 ; H1 )

• To design the optimal detector, the Neyman-Pearson approach is to maximise PD while keeping PFA fixed (small).

S. Kay – detection theory Figs. 3.2 and 3.3.

20
Neyman-Pearson Theorem
Problem statement

Assume a data set x = [x[0], x[1], ..., x[N − 1]]^T is available. The detection problem is defined as follows

H0 : T(x) < γ

H1 : T(x) > γ

where T is the decision function and γ is the detection threshold. Our goal is to design T so as to maximize PD subject to PFA ≤ α.

21
Neyman-Pearson Theorem

To maximize PD for a given PFA = α, decide H1 if

$$ L(x) = \frac{p(x; H_1)}{p(x; H_0)} > \gamma $$

where the threshold γ is found from

$$ P_{FA} = \int_{\{x : L(x) > \gamma\}} p(x; H_0)\, dx = \alpha $$

The function L(x) is called the likelihood ratio and the entire test is called the likelihood
ratio test (LRT).
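
A minimal sketch of the LRT in code (assuming NumPy/SciPy; the Gaussian likelihoods and parameter values are only an example, anticipating the DC-level problem on the next slides). Working with log-likelihoods is numerically safer and equivalent:

```python
import numpy as np
from scipy.stats import norm

def lrt_decide(x, loglik_H1, loglik_H0, gamma):
    """Return True (decide H1) if the likelihood ratio L(x) exceeds gamma."""
    log_L = loglik_H1(x) - loglik_H0(x)          # log L(x)
    return log_L > np.log(gamma)

# Example: N samples of a DC level A in WGN (illustrative values)
A, sigma = 1.0, 1.5
loglik_H0 = lambda x: norm.logpdf(x, 0.0, sigma).sum()
loglik_H1 = lambda x: norm.logpdf(x, A, sigma).sum()

x = np.array([0.3, 1.2, 0.9, -0.1, 1.5])
print("decide H1:", lrt_decide(x, loglik_H1, loglik_H0, gamma=1.0))
```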

22
Neyman-Pearson Theorem - Derivation
max PD subject to PFA = α

Constrained optimization, using a Lagrangian:

$$ F = P_D + \lambda \left(P_{FA} - \alpha\right) $$
$$ \;\; = \int_{R_1} p(x; H_1)\, dx + \lambda \left(\int_{R_1} p(x; H_0)\, dx - \alpha\right) $$
$$ \;\; = \int_{R_1} \left(p(x; H_1) + \lambda\, p(x; H_0)\right) dx - \lambda\alpha $$

The problem now is (see the figures) to select the right regions R1 and R0. As we want to maximise F, a value x should only be included in R1 if it increases the integrand. So, x should only be included in R1 if

$$ p(x; H_1) + \lambda\, p(x; H_0) > 0 $$

23
Neyman-Pearson Theorem - Derivation
$$ p(x; H_1) + \lambda\, p(x; H_0) > 0 \;\;\Rightarrow\;\; \frac{p(x; H_1)}{p(x; H_0)} > -\lambda $$

A likelihood ratio is always positive, so γ = −λ > 0 (if λ > 0 we would have PFA = 1):

$$ \frac{p(x; H_1)}{p(x; H_0)} > \gamma, $$

where γ is found from PFA = α.

24
Neyman-Pearson Theorem – Example DC in WGN
Consider the following signal detection problem

H0 : x[n] = w[n],            n = 0, 1, . . . , N − 1

H1 : x[n] = s[n] + w[n],     n = 0, 1, . . . , N − 1

where the signal is s[n] = A for A > 0 and w[n] is WGN with variance σ². Now the NP detector decides H1 if
detector decides H1 if
$$ \frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} (x[n] - A)^2\right]}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right]} > \gamma $$

Taking the logarithm of both sides and simplifying results in


$$ \frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{\sigma^2}{N A}\ln\gamma + \frac{A}{2} = \gamma' $$

25
Neyman-Pearson Theorem – Example DC in WGN
The NP detector compares the sample mean x̄ = (1/N) Σ_{n=0}^{N−1} x[n] to a threshold γ'. To determine the detection performance, we first note that the test statistic T(x) = x̄ is Gaussian under each hypothesis and its distribution is as follows

$$ T(x) \sim \begin{cases} \mathcal{N}(0, \sigma^2/N) & \text{under } H_0 \\ \mathcal{N}(A, \sigma^2/N) & \text{under } H_1 \end{cases} $$
We then have

$$ P_{FA} = \Pr(T(x) > \gamma'; H_0) = Q\!\left(\frac{\gamma'}{\sqrt{\sigma^2/N}}\right) \;\;\Rightarrow\;\; \gamma' = \sqrt{\frac{\sigma^2}{N}}\, Q^{-1}(P_{FA}) $$

and

$$ P_D = \Pr(T(x) > \gamma'; H_1) = Q\!\left(\frac{\gamma' - A}{\sqrt{\sigma^2/N}}\right) $$

PD and PFA are related to each other according to the following equation

$$ P_D = Q\!\left(Q^{-1}(P_{FA}) - \sqrt{\frac{N A^2}{\sigma^2}}\right) $$

where NA²/σ² is the signal energy-to-noise ratio.

26
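
A numerical sketch of this result (assuming NumPy/SciPy; A, σ², N and PFA are illustrative): pick PFA, compute the threshold γ' and the predicted PD, and verify both by Monte Carlo.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
A, sigma2, N, PFA = 0.5, 1.0, 20, 0.01          # illustrative parameters

gamma_p = np.sqrt(sigma2 / N) * norm.isf(PFA)   # gamma' = sqrt(sigma^2/N) * Q^{-1}(P_FA)
PD = norm.sf(norm.isf(PFA) - np.sqrt(N * A**2 / sigma2))

trials = 200_000
xbar_H0 = rng.normal(0.0, np.sqrt(sigma2), (trials, N)).mean(axis=1)
xbar_H1 = rng.normal(A, np.sqrt(sigma2), (trials, N)).mean(axis=1)

print("threshold gamma'   :", gamma_p)
print("PFA (theory / MC)  :", PFA, (xbar_H0 > gamma_p).mean())
print("PD  (theory / MC)  :", PD, (xbar_H1 > gamma_p).mean())
```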
Neyman-Pearson Theorem – Example DC in WGN
$$ P_D = Q\!\left(Q^{-1}(P_{FA}) - \sqrt{\frac{N A^2}{\sigma^2}}\right) $$

Remember the deflection coefficient

$$ d^2 = \frac{\left(E(T; H_1) - E(T; H_0)\right)^2}{\mathrm{var}(T; H_0)} $$

In this case d² = NA²/σ². Further notice that the detection performance (PD) increases monotonically with the deflection coefficient.

27
Neyman-Pearson Theorem – Example Change Var

Consider an IID process x[n].

H0 : x[n] ∼ N(0, σ0²)

H1 : x[n] ∼ N(0, σ1²),

with σ1² > σ0².

Neyman-Pearson test:
$$ \frac{\frac{1}{(2\pi\sigma_1^2)^{N/2}} \exp\left[-\frac{1}{2\sigma_1^2}\sum_{n=0}^{N-1} x^2[n]\right]}{\frac{1}{(2\pi\sigma_0^2)^{N/2}} \exp\left[-\frac{1}{2\sigma_0^2}\sum_{n=0}^{N-1} x^2[n]\right]} > \gamma $$

28
Neyman-Pearson Theorem – Example Change Var

Taking the logarithm of both sides and simplifying results in


$$ \frac{1}{2}\left(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\right) \sum_{n=0}^{N-1} x^2[n] > \ln\gamma + \frac{N}{2}\ln\frac{\sigma_1^2}{\sigma_0^2} $$

We then have

$$ \frac{1}{N}\sum_{n=0}^{N-1} x^2[n] > \gamma' $$

with

$$ \gamma' = \frac{\frac{2}{N}\ln\gamma + \ln\frac{\sigma_1^2}{\sigma_0^2}}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}} $$

What about PD ?

29
Neyman-Pearson Theorem – Example Change Var

For N = 1 we decide for H1 if:


$$ |x[0]| > \sqrt{\gamma'}. $$

$$ P_{FA} = \Pr\left\{|x[0]| > \sqrt{\gamma'};\, H_0\right\} = 2\,\Pr\left\{x[0] > \sqrt{\gamma'};\, H_0\right\} = 2\,Q\!\left(\frac{\sqrt{\gamma'}}{\sqrt{\sigma_0^2}}\right) $$

$$ \Rightarrow\;\; \sqrt{\gamma'} = Q^{-1}\!\left(\frac{P_{FA}}{2}\right)\sqrt{\sigma_0^2} $$

$$ P_D = 2\,Q\!\left(\frac{Q^{-1}\!\left(\frac{P_{FA}}{2}\right)\sqrt{\sigma_0^2}}{\sqrt{\sigma_1^2}}\right) $$
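
A sketch of this N = 1 variance-change detector (assuming NumPy/SciPy; σ0² and σ1² are illustrative): compute √γ' from PFA and check PFA and PD by Monte Carlo.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
sigma0, sigma1, PFA = 1.0, 2.0, 0.05            # illustrative std. devs (sigma1 > sigma0)

thr = sigma0 * norm.isf(PFA / 2)                # sqrt(gamma') = sigma0 * Q^{-1}(P_FA / 2)
PD = 2 * norm.sf(norm.isf(PFA / 2) * sigma0 / sigma1)

trials = 500_000
x_H0 = rng.normal(0.0, sigma0, trials)
x_H1 = rng.normal(0.0, sigma1, trials)

print("PFA (theory / MC):", PFA, (np.abs(x_H0) > thr).mean())
print("PD  (theory / MC):", PD, (np.abs(x_H1) > thr).mean())
```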

30
Receiver Operating Characteristics

An alternative way of summarizing the detection performance of an NP detector is to plot PD versus PFA. This plot is called the Receiver Operating Characteristic (ROC). For the former DC level detection example, the ROC is shown here. Note that here NA²/σ² = 1.
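
A sketch that reproduces this ROC (assuming NumPy/SciPy and matplotlib), for NA²/σ² = 1 together with two other SNR values for comparison:

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

PFA = np.linspace(1e-4, 1.0, 500)
for enr in [0.5, 1.0, 4.0]:                      # signal energy-to-noise ratio N*A^2/sigma^2
    PD = norm.sf(norm.isf(PFA) - np.sqrt(enr))   # PD = Q(Q^{-1}(P_FA) - sqrt(N A^2 / sigma^2))
    plt.plot(PFA, PD, label=f"$NA^2/\\sigma^2$ = {enr}")

plt.xlabel("$P_{FA}$"); plt.ylabel("$P_D$"); plt.legend(); plt.grid(True)
plt.title("ROC of the NP detector for a DC level in WGN")
plt.show()
```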

31
Minimum Probability of Error
Assume the prior probabilities of H0 and H1 are known and represented by P (H0 ) and
P (H1 ), respectively. The probability of error, Pe , is then defined as

$$ P_e = P(H_1)\, P(H_0|H_1) + P(H_0)\, P(H_1|H_0) = P(H_1)\, P_M + P(H_0)\, P_{FA} $$

Our goal is to design a detector that minimizes Pe. It can be shown that the following detector is optimal in this case: decide H1 if

$$ \frac{p(x|H_1)}{p(x|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma $$
In case P (H0 ) = P (H1 ), the detector is called the maximum likelihood detector.

32
Minimum Probability of Error - Derivation

$$ P_e = P(H_1)\, P(H_0|H_1) + P(H_0)\, P(H_1|H_0) = P(H_1)\int_{R_0} p(x|H_1)\, dx + P(H_0)\int_{R_1} p(x|H_0)\, dx $$

We know that

$$ \int_{R_0} p(x|H_1)\, dx = 1 - \int_{R_1} p(x|H_1)\, dx, $$

such that

$$ P_e = P(H_1)\left(1 - \int_{R_1} p(x|H_1)\, dx\right) + P(H_0)\int_{R_1} p(x|H_0)\, dx = P(H_1) + \int_{R_1} \left[P(H_0)\, p(x|H_0) - P(H_1)\, p(x|H_1)\right] dx $$

33
Minimum Probability of Error - Derivation
$$ P_e = P(H_1) + \int_{R_1} \left[P(H_0)\, p(x|H_0) - P(H_1)\, p(x|H_1)\right] dx $$

We want to minimize Pe, so an x should only be included in the region R1 if the integrand

$$ P(H_0)\, p(x|H_0) - P(H_1)\, p(x|H_1) $$

is negative for that x:

$$ P(H_0)\, p(x|H_0) < P(H_1)\, p(x|H_1) \;\;\Rightarrow\;\; \frac{p(x|H_1)}{p(x|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma $$

34
Minimum Probability of Error– Example DC in WGN
Consider the following signal detection problem

H0 : x[n] = w[n],            n = 0, 1, . . . , N − 1

H1 : x[n] = s[n] + w[n],     n = 0, 1, . . . , N − 1

where the signal is s[n] = A for A > 0 and w[n] is WGN with variance σ². Now the minimum probability of error detector decides H1 if p(x|H1)/p(x|H0) > P(H0)/P(H1) = 1 (assuming P(H0) = P(H1) = 0.5), leading to
$$ \frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} (x[n] - A)^2\right]}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right]} > 1 $$

Taking the logarithm of both sides and simplifying results in


$$ \frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{A}{2} $$

35
Minimum Probability of Error– Example DC in WGN

Pe is then given by

$$ P_e = \frac{1}{2}\left[P(H_0|H_1) + P(H_1|H_0)\right] = \frac{1}{2}\left[\Pr\!\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n] < \frac{A}{2}\,\Big|\, H_1\right) + \Pr\!\left(\frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{A}{2}\,\Big|\, H_0\right)\right] $$

$$ \phantom{P_e} = \frac{1}{2}\left[\left(1 - Q\!\left(\frac{A/2 - A}{\sqrt{\sigma^2/N}}\right)\right) + Q\!\left(\frac{A/2}{\sqrt{\sigma^2/N}}\right)\right] = Q\!\left(\sqrt{\frac{N A^2}{4\sigma^2}}\right) $$
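
A Monte Carlo sketch (assuming NumPy/SciPy; parameter values are illustrative) that checks this closed-form Pe for the sample-mean detector with threshold A/2 and equal priors:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
A, sigma2, N, trials = 1.0, 4.0, 10, 200_000     # illustrative parameters

Pe_theory = norm.sf(np.sqrt(N * A**2 / (4 * sigma2)))   # Pe = Q(sqrt(N A^2 / (4 sigma^2)))

# Equal priors: average the two conditional error probabilities, threshold at A/2
xbar_H0 = rng.normal(0.0, np.sqrt(sigma2), (trials, N)).mean(axis=1)
xbar_H1 = rng.normal(A, np.sqrt(sigma2), (trials, N)).mean(axis=1)
Pe_mc = 0.5 * (xbar_H0 > A / 2).mean() + 0.5 * (xbar_H1 < A / 2).mean()

print("Pe theory :", Pe_theory)
print("Pe MC     :", Pe_mc)
```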

36
Minimum Probability of Error – MAP detector

Starting from

$$ \frac{p(x|H_1)}{p(x|H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma $$

and using Bayes' rule,

$$ P(H_i|x) = \frac{p(x|H_i)\, P(H_i)}{p(x)}, $$

we arrive at

$$ P(H_1|x) > P(H_0|x). $$

This is called the maximum a posteriori (MAP) detector, which, if P(H1) = P(H0), again reduces to the ML detector.

37
Bayes Risk

A generalisation of the minimum Pe criterion is one where costs are assigned to each type
of error:
Let Cij be the cost if we decide Hi while Hj is true. Minimizing the expected costs we get
$$ R = E[C] = \sum_{i=0}^{1}\sum_{j=0}^{1} C_{ij}\, P(H_i|H_j)\, P(H_j) $$

If C10 > C00 and C01 > C11, the detector that minimises the Bayes risk is to decide H1 when

$$ \frac{p(x|H_1)}{p(x|H_0)} > \frac{(C_{10} - C_{00})\, P(H_0)}{(C_{01} - C_{11})\, P(H_1)} = \gamma. $$
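
A small sketch of the Bayes-risk threshold (assuming NumPy/SciPy; the costs and parameters are illustrative): compute γ from the costs and priors and, for the DC-in-WGN example, the corresponding sample-mean threshold σ²/(NA) ln γ + A/2 obtained with the same algebra as on slide 25.

```python
import numpy as np
from scipy.stats import norm

# Illustrative costs Cij = cost of deciding Hi when Hj is true, and priors
C10, C00, C01, C11 = 1.0, 0.0, 5.0, 0.0      # missing the signal costs 5x a false alarm
P_H0, P_H1 = 0.5, 0.5

gamma = (C10 - C00) / (C01 - C11) * P_H0 / P_H1   # LRT threshold minimizing the Bayes risk

# For the DC-in-WGN example the LRT reduces to a sample-mean test (cf. slide 25)
A, sigma2, N = 1.0, 1.0, 10
mean_threshold = sigma2 / (N * A) * np.log(gamma) + A / 2
print("LRT threshold gamma  :", gamma)
print("sample-mean threshold:", mean_threshold)

# Decision for one illustrative data record drawn under H1
x = norm.rvs(loc=A, scale=np.sqrt(sigma2), size=N, random_state=6)
print("decide H1:", x.mean() > mean_threshold)
```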

38
