
Statistical Decision Theory

What is Detection Theory in Signal Processing?

- A radar sends pulse bursts and looks for a signal returning from a possible aircraft.
- Decide which hypothesis is true: is an aircraft present or not? Did we receive noise only, or signal + noise?
- The problem is made more difficult by the fact that the received pulse is a distorted version of the transmitted pulse.
- Once we know that an aircraft is present, we typically want to estimate range, bearing, etc. (estimation!)
- First detection, then possibly estimation.
- In detection, use the received samples as effectively as possible to decide the correct hypothesis.
Digital Communication System: Application Example

- The transmitter sends a signal with a different phase for bit one and bit zero.
- The task of detection is now to choose which of the two signals was sent, 1 or 0.
- Unlike the aircraft example, a signal is present under both hypotheses!
- Also, the prior probabilities are known (1/2 and 1/2). Not so for the radar problem!
Speech Recognition: Application Example

- We may need to recognize the spoken digits 0 to 9.
- For example, implement this by storing speech samples for the digits and finding the closest match.
- Problem: each time the person utters the words, the samples will be somewhat different.
- Decide which digit was spoken.
- This is a multiple hypothesis testing problem: more than two hypotheses (two hypotheses would be binary hypothesis testing).
DC Level in White Gaussian Noise
Let us consider detection of the presence of a DC level with amplitude A = 1, corrupted by WGN w[n] with variance \sigma^2. Assume that only one sample is available. The hypotheses are

H_0: x[0] = w[0]          (noise only)
H_1: x[0] = 1 + w[0]      (signal + noise)

Right: histograms of x[0] under H0 and H1, from 1000 Monte Carlo realizations.
Q: How would you decide whether the sample x[0] was generated under H0 or H1?
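A minimal MATLAB sketch of this experiment (assuming \sigma^2 = 1 and the MC = 1000 trials of the figure):

% Histograms of x[0] under H0 (noise only) and H1 (A = 1 plus noise).
% Assumes sigma^2 = 1 and 1000 Monte Carlo realizations.
MC    = 1000;
sigma = 1;
A     = 1;
x0_H0 = sigma*randn(MC,1);        % x[0] = w[0]
x0_H1 = A + sigma*randn(MC,1);    % x[0] = A + w[0]
histogram(x0_H0); hold on;        % use hist() in older MATLAB versions
histogram(x0_H1);
legend('H_0: noise only','H_1: signal + noise');
xlabel('x[0]'); ylabel('count');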
DC Level in White Gaussian Noise
p(x[0]; H_0) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} x^2[0]\right)

p(x[0]; H_1) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} (x[0]-1)^2\right)

or, parameterized by the amplitude A,

p(x[0]; A) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2} (x[0]-A)^2\right)

for which the detection problem becomes a parameter test of the PDF:

H_0: A = 0   (noise only)
H_1: A = 1   (signal + noise)
Hierarchy of Detection Problems

- Noise models: Gaussian with known PDF, Gaussian with unknown PDF, non-Gaussian with known PDF, non-Gaussian with unknown PDF
- Signal models: deterministic known, deterministic unknown, random with known PDF, random with unknown PDF
- All combinations of noise and signal models are possible => some combinations are very difficult!
Neyman-Pearson Theorem

- Deciding H1 when H0 is true is often called a false alarm. The probability Pfa = P(H1; H0) is the false alarm probability.
- In radar and in many other applications, Pfa is kept very small, e.g. 1E-8, to control the disastrous consequences of a false alarm. For example, we may launch a missile if we think an enemy aircraft is present.
- Neyman-Pearson (NP) approach: maximize the probability of detection PD = P(H1; H1) subject to Pfa = \alpha, where \alpha is an input parameter.
Neyman-Pearson Theorem
Assume a DC level A = 1 in WGN with variance 1, and one sample. For a threshold \gamma, the probability of false alarm is

Pfa = P(H_1; H_0) = \Pr(x[0] > \gamma; H_0) = \int_{\gamma}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} t^2\right) dt = Q(\gamma)

Now we can get the threshold for a given PFA by

\gamma = Q^{-1}(Pfa)

MATLAB example: threshold = qfuncinv(1E-3) gives threshold = 3.0902, and qfuncinv(1E-8) = 5.6120. Lower PFA => higher threshold!
Neyman-Pearson Theorem
Continue to assume a DC level A = 1 in WGN with variance 1. The probability of detection is

PD = P(H_1; H_1) = \Pr(x[0] > \gamma; H_1) = \int_{\gamma}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} (t-1)^2\right) dt = Q(\gamma - 1)

MATLAB:
PD = qfunc(qfuncinv(1E-3) - 1) = 0.0183
PD = qfunc(qfuncinv(1E-8) - 1) = 1.9941e-006 => very small PD, the price to pay for the low PFA of 1E-8!
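If the Communications Toolbox functions qfunc/qfuncinv are not available, the same numbers can be reproduced in base MATLAB using Q(x) = erfc(x/sqrt(2))/2 (a minimal sketch):

% Q-function and its inverse expressed with erfc/erfcinv (base MATLAB).
Q    = @(x) 0.5*erfc(x/sqrt(2));
Qinv = @(p) sqrt(2)*erfcinv(2*p);

gamma1 = Qinv(1e-3)         % 3.0902
gamma2 = Qinv(1e-8)         % 5.6120
PD1    = Q(Qinv(1e-3) - 1)  % 0.0183
PD2    = Q(Qinv(1e-8) - 1)  % approx. 2.0e-6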
Neyman-Pearson Theorem

The goal of the detector is to map the observed data {x[0], x[1], ..., x[N-1]} into either H0 or H1 (binary hypothesis testing). This corresponds to a mapping from each possible observed data vector to a hypothesis, which leads to decision regions for H0 and H1. The critical region is the region where H1 is decided:

R_1 = \{ \mathbf{x} : \text{decide } H_1 \text{ or reject } H_0 \}

and for H0

R_0 = \{ \mathbf{x} : \text{decide } H_0 \text{ or reject } H_1 \}

Clearly, the union of R_0 and R_1 is R^N, since every point must map to one of the hypotheses.
Neyman-Pearson Theorem

Using the decision regions, we can express the PFA requirement as

Pfa = \int_{R_1} p(\mathbf{x}; H_0)\, d\mathbf{x} = \alpha

where \alpha is the target probability of false alarm. In most detection problems there is a large (even infinite) number of possible decision regions that satisfy the PFA constraint. We want to choose the critical region that maximizes the probability of detection

PD = \int_{R_1} p(\mathbf{x}; H_1)\, d\mathbf{x}

Is there an easy way to do this?
Neyman-Pearson Theorem

Finally, the Neyman-Pearson theorem! To maximize PD for a given Pfa = \alpha, decide H1 if

L(\mathbf{x}) = \frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} > \gamma

where the threshold \gamma is found from

Pfa = \int_{\{\mathbf{x} : L(\mathbf{x}) > \gamma\}} p(\mathbf{x}; H_0)\, d\mathbf{x} = \alpha
Neyman-Pearson Theorem: Example
L(x) = \frac{\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(x[0]-1)^2\right)}{\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} x^2[0]\right)} > \gamma

\exp\left(-\frac{1}{2}\left(x^2[0] - 2x[0] + 1 - x^2[0]\right)\right) > \gamma

\exp\left(x[0] - \frac{1}{2}\right) > \gamma

x[0] > \log\gamma + \frac{1}{2} = \gamma'

Decide H1 if x[0] > \gamma'. Same form as in the previous "ad hoc" example! Now we can find \gamma' from the PFA constraint

Pfa = \Pr(x[0] > \gamma'; H_0) = \int_{\gamma'}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} t^2\right) dt = Q(\gamma') = \alpha

\gamma' = Q^{-1}(\alpha)

Also the same equation as before! The previous example was optimal in the NP sense!
Neyman-Pearson Theorem: More General Example
H1: DC level A (>0) in WGN, N samples
H0: WGN, N samples
L(\mathbf{x}) = \frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right)} > \gamma

-\frac{1}{2\sigma^2}\left(-2A\sum_{n=0}^{N-1} x[n] + NA^2\right) > \log\gamma

\frac{A}{\sigma^2}\sum_{n=0}^{N-1} x[n] > \log\gamma + \frac{NA^2}{2\sigma^2}

\frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{\sigma^2}{AN}\log\gamma + \frac{A}{2} = \gamma'

Compare the sample mean (an estimate of A) to a threshold!
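A Monte Carlo sketch of this sample-mean detector (the values A = 1, \sigma^2 = 1, N = 10 and the target PFA = 0.01 are illustrative assumptions, not from the slides):

% Monte Carlo check of the sample-mean NP detector for a DC level in WGN.
A = 1; sigma2 = 1; N = 10; Pfa_target = 0.01; MC = 1e5;
Qinv   = @(p) sqrt(2)*erfcinv(2*p);
gammaP = sqrt(sigma2/N)*Qinv(Pfa_target);      % threshold gamma'

w  = sqrt(sigma2)*randn(MC,N);                 % noise-only data (H0)
x  = A + sqrt(sigma2)*randn(MC,N);             % signal-plus-noise data (H1)
T0 = mean(w,2);                                % sample mean under H0
T1 = mean(x,2);                                % sample mean under H1

Pfa_sim = mean(T0 > gammaP)                    % should be close to 0.01
PD_sim  = mean(T1 > gammaP)                    % compare with Q(Qinv(Pfa) - sqrt(N*A^2/sigma2))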
Neyman-Pearson Theorem: More General Example

T(\mathbf{x}) = \frac{1}{N}\sum_{n=0}^{N-1} x[n]

H0: T(x) follows a Gaussian distribution with mean 0 and variance \sigma^2/N
H1: T(x) follows a Gaussian distribution with mean A and variance \sigma^2/N

Pfa = Q\left(\frac{\gamma'}{\sqrt{\sigma^2/N}}\right) \;\Rightarrow\; \gamma' = \sqrt{\frac{\sigma^2}{N}}\, Q^{-1}(Pfa)

PD = Q\left(\frac{\gamma' - A}{\sqrt{\sigma^2/N}}\right) = Q\left(Q^{-1}(Pfa) - \sqrt{\frac{A^2 N}{\sigma^2}}\right)

where the energy-to-noise ratio is ENR = \frac{A^2 N}{\sigma^2}.
Neyman-Pearson Theorem: More General Example
[Figure: PD versus ENR [dB] for PFA = 0.1, 0.01, 0.001, and 0.0001.]
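The curves in the figure follow directly from PD = Q(Q^{-1}(PFA) - sqrt(ENR)); a sketch that regenerates them:

% PD versus ENR (in dB) for several false alarm probabilities.
Q    = @(x) 0.5*erfc(x/sqrt(2));
Qinv = @(p) sqrt(2)*erfcinv(2*p);

ENRdB = 0:0.1:20;
ENR   = 10.^(ENRdB/10);
Pfa   = [0.1 0.01 0.001 0.0001];
figure; hold on;
for k = 1:numel(Pfa)
    PD = Q(Qinv(Pfa(k)) - sqrt(ENR));
    plot(ENRdB, PD);
end
xlabel('ENR [dB]'); ylabel('PD');
legend('PFA=0.1','PFA=0.01','PFA=0.001','PFA=0.0001','Location','southeast');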
Mean-shifted Gauss-Gauss problem
Assume that we have a test statistic T(x) that follows a Gaussian distribution under both H0 and H1:

T \sim \begin{cases} N(\mu_0, \sigma^2) & \text{under } H_0 \\ N(\mu_1, \sigma^2) & \text{under } H_1 \end{cases}

It can be shown that the probability of detection is

PD = Q\left(Q^{-1}(Pfa) - \sqrt{d^2}\right)

so the deflection coefficient

d^2 = \frac{(\mu_1 - \mu_0)^2}{\sigma^2}

completely characterizes performance for the mean-shifted Gauss-Gauss problem.
Change in Variance
H0: WGN with variance \sigma_0^2, N samples
H1: WGN with variance \sigma_1^2 (> \sigma_0^2), N samples

L(\mathbf{x}) = \frac{\frac{1}{(2\pi\sigma_1^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_1^2}\sum_{n=0}^{N-1} x^2[n]\right)}{\frac{1}{(2\pi\sigma_0^2)^{N/2}} \exp\left(-\frac{1}{2\sigma_0^2}\sum_{n=0}^{N-1} x^2[n]\right)} > \gamma

-\frac{N}{2}\log\sigma_1^2 + \frac{N}{2}\log\sigma_0^2 - \frac{1}{2}\left(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\right)\sum_{n=0}^{N-1} x^2[n] > \log\gamma

\frac{1}{N}\sum_{n=0}^{N-1} x^2[n] > \frac{\frac{2\log\gamma}{N} + \log\sigma_1^2 - \log\sigma_0^2}{\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}} = \gamma'

So we compute an estimate of the variance and decide H1 if it exceeds the threshold!
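A minimal Monte Carlo sketch of this energy detector (the parameter values are illustrative assumptions; the threshold is set empirically from simulated H0 data rather than from the exact chi-squared distribution):

% Energy detector for a change in variance: decide H1 if (1/N)*sum(x.^2) > gamma'.
sigma0 = 1; sigma1 = sqrt(2); N = 20; MC = 1e5; Pfa_target = 0.05;

T0 = mean((sigma0*randn(MC,N)).^2, 2);       % test statistic under H0
T1 = mean((sigma1*randn(MC,N)).^2, 2);       % test statistic under H1

T0s    = sort(T0);
gammaP = T0s(round((1 - Pfa_target)*MC));    % empirical threshold for the target PFA
Pfa_sim = mean(T0 > gammaP)                  % approx. 0.05 by construction
PD_sim  = mean(T1 > gammaP)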
Receiver Operating Characteristics (ROC)
- Assume that we have expressions for PFA and PD as functions of the threshold. Vary the threshold from -Inf to +Inf and record the resulting (PFA, PD) pairs. Plot all observed (PFA, PD) pairs with PFA on the x-axis and PD on the y-axis.
- Alternatively, if we have an expression for PD as a function of PFA and the SNR / ENR / deflection coefficient, vary PFA and plot the resulting PD for each PFA.
- Repeat the operation for SNR / ENR / deflection coefficient values of interest => a family of ROCs (a sketch that regenerates such a family follows the figure below).
Receiver Operating Characteristics (ROC)
[Figure: ROC curves, PD versus PFA, for ENR = 0 dB and ENR = 10 dB. The ROC is always above the 45-degree chance line!]
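Following the recipe above, a sketch that redraws the ROC family for the mean-shifted Gaussian problem at ENR = 0 dB and 10 dB:

% ROC curves PD(PFA) using PD = Q(Qinv(PFA) - sqrt(ENR)).
Q    = @(x) 0.5*erfc(x/sqrt(2));
Qinv = @(p) sqrt(2)*erfcinv(2*p);

Pfa   = linspace(1e-4, 1, 500);
ENRdB = [0 10];
figure; hold on;
for k = 1:numel(ENRdB)
    PD = Q(Qinv(Pfa) - sqrt(10^(ENRdB(k)/10)));
    plot(Pfa, PD);
end
plot([0 1], [0 1], '--');                 % 45-degree chance line
xlabel('PFA'); ylabel('PD');
legend('ENR = 0 dB','ENR = 10 dB','chance line','Location','southeast');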
3.5. Irrelevant Data
- Irrelevant data may be discarded; it does not affect the likelihood ratio test (LRT) of the NP theorem.
- But be careful about which data is really irrelevant!
- Consider DC level detection in WGN and assume that we also observe reference noise samples wR[n] for n = 0, 1, ..., N-1, so the observed data set is {x[0], ..., x[N-1], wR[0], ..., wR[N-1]}. If x[n] = w[n] under H0 and x[n] = A + w[n] under H1, and wR[n] = w[n] under both hypotheses, then wR[n] can actually be used to cancel out the noise:

T = x[0] - wR[0] = A under H1 and 0 under H0

so the detector T > A/2 gives perfect detection!
3.5. Irrelevant Data
As another example, let us consider the following signal model:

H_0: x[n] = w[n], \quad n = 0, 1, \ldots, 2N-1

H_1: x[n] = \begin{cases} A + w[n] & n = 0, 1, \ldots, N-1 \\ w[n] & n = N, N+1, \ldots, 2N-1 \end{cases}

The observed vector is \mathbf{x} = [\mathbf{x}_1^T\; \mathbf{x}_2^T]^T, where \mathbf{x}_1 denotes the first N samples and \mathbf{x}_2 the rest of the samples. The LRT is

\frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)\cdot \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=N}^{2N-1} x^2[n]\right)}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right)\cdot \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=N}^{2N-1} x^2[n]\right)} > \gamma

The factors involving \mathbf{x}_2 cancel, leaving

\frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right)} = \frac{p(\mathbf{x}_1; H_1)}{p(\mathbf{x}_1; H_0)} > \gamma

The data \mathbf{x}_2 is in fact irrelevant!
3.6. Minimum Probability of Error
In some applications we may naturally assign prior probabilities to the hypotheses. For example, in digital communication using BPSK or on-off keying, both bits / hypotheses are equally likely, so P(H0) = P(H1) = 0.5. Of course, in the radar application this is not possible. The Bayesian approach to hypothesis testing is analogous to Bayesian estimation. We define the probability of error as

Pe = \Pr\{\text{decide } H_0, H_1 \text{ true}\} + \Pr\{\text{decide } H_1, H_0 \text{ true}\} = P(H_0 | H_1) P(H_1) + P(H_1 | H_0) P(H_0)

where P(H_i | H_j) is the conditional probability of deciding Hi given that Hj is true, which has a slightly different meaning than P(H_i; H_j).
3.6. Minimum Probability of Error
It can be shown that to minimize Pe we should decide H1 if

\frac{p(\mathbf{x} | H_1)}{p(\mathbf{x} | H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma

Similar to the NP test! But now the probabilities are conditional, and the threshold is given directly without any need to search for it. If the prior probabilities are equal, we decide H1 if

p(\mathbf{x} | H_1) > p(\mathbf{x} | H_0)

This is called the maximum likelihood (ML) detector.
3.6. Minimum Probability of Error
H1: DC level A (>0) in WGN, N samples
H0: WGN, N samples
If these hypotheses correspond to the bits of a communication signal, we can assume P(H0) = P(H1) = 0.5, so the ML detector applies. Decide H1 if

\frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-A)^2\right)}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right)} > 1

Same form as before, but now the threshold is fixed!

-\frac{1}{2\sigma^2}\left(-2A\sum_{n=0}^{N-1} x[n] + NA^2\right) > 0

This means we decide H1 if the sample mean exceeds A/2. Very reasonable!
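For equal priors this detector has error probability Pe = Q(sqrt(N A^2 / (4\sigma^2))); a Monte Carlo sketch with assumed values A = 1, \sigma^2 = 1, N = 10 to check this:

% ML detector for a DC level in WGN with equal priors: decide H1 if mean(x) > A/2.
A = 1; sigma2 = 1; N = 10; MC = 1e5;
Q = @(x) 0.5*erfc(x/sqrt(2));

bits = rand(MC,1) > 0.5;                           % true hypothesis, P(H0) = P(H1) = 0.5
x    = repmat(bits*A,1,N) + sqrt(sigma2)*randn(MC,N);
dec  = mean(x,2) > A/2;                            % ML decision

Pe_sim    = mean(dec ~= bits)                      % simulated error probability
Pe_theory = Q(sqrt(N*A^2/(4*sigma2)))              % theoretical error probability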
3.6. Minimum Probability of Error
Let us consider this form again:

\frac{p(\mathbf{x} | H_1)}{p(\mathbf{x} | H_0)} > \frac{P(H_0)}{P(H_1)} = \gamma

Let us write it as

\frac{p(\mathbf{x} | H_1) P(H_1)}{p(\mathbf{x})} > \frac{p(\mathbf{x} | H_0) P(H_0)}{p(\mathbf{x})} \;\Rightarrow\; P(H_1 | \mathbf{x}) > P(H_0 | \mathbf{x})

This detector is called the maximum a posteriori (MAP) detector.
3.7. Bayes Risk
Suppose that we are inspecting parts for inclusion in a large machine:

H_0: the part is faulty
H_1: the part is acceptable

Let us assign costs to the errors: denote by C_{ij} the cost of deciding Hi when Hj is actually true. The cost C10 should be larger than C01, since if we accept a faulty part into the machine, the whole machine may become faulty; if instead we decide "faulty" when the part is actually acceptable, we only lose the part. The Bayes risk R is defined as

R = E[C] = \sum_{i=0}^{1}\sum_{j=0}^{1} C_{ij}\, P(H_i | H_j)\, P(H_j)

We can assume C00 = C11 = 0. The detector that minimizes the Bayes risk is: decide H1 if

\frac{p(\mathbf{x} | H_1)}{p(\mathbf{x} | H_0)} > \frac{(C_{10} - C_{00}) P(H_0)}{(C_{01} - C_{11}) P(H_1)} = \gamma

which is again an LRT, but now with a cost-dependent threshold.
3.8. Multiple Hypothesis Testing
For the case of more than two hypotheses, the NP criterion is rarely used in practice. Instead we use the Bayes risk, now defined as

R = E[C] = \sum_{i=0}^{M-1}\sum_{j=0}^{M-1} C_{ij}\, P(H_i | H_j)\, P(H_j)

where M is the number of hypotheses. To minimize this cost we should choose the hypothesis that minimizes

C_i(\mathbf{x}) = \sum_{j=0}^{M-1} C_{ij}\, P(H_j | \mathbf{x})

over i = 0, 1, ..., M-1, as sketched below.
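A small sketch of this rule with an illustrative cost matrix and assumed posterior values (both are made-up numbers, not from the slides):

% Minimum Bayes risk decision for M hypotheses.
% C(i+1,j+1) is the cost of deciding Hi when Hj is true (illustrative values).
C = [0 1 4;           % costs when deciding H0
     1 0 1;           % costs when deciding H1
     4 1 0];          % costs when deciding H2
posteriors = [0.2; 0.5; 0.3];    % assumed P(Hj | x) for the observed x

expectedCost = C*posteriors;     % Ci(x) = sum_j Cij * P(Hj|x)
[~, iHat] = min(expectedCost);
fprintf('Decide H%d\n', iHat-1);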
4.3. Matched Filters
Let us consider the case of a known deterministic signal in WGN:

H_0: x[n] = w[n], \quad n = 0, 1, \ldots, N-1
H_1: x[n] = s[n] + w[n], \quad n = 0, 1, \ldots, N-1

where w[n] is WGN with variance \sigma^2. The LRT is

L(\mathbf{x}) = \frac{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n]-s[n])^2\right)}{\frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left(-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1} x^2[n]\right)} > \gamma

\Rightarrow \sum_{n=0}^{N-1} x[n] s[n] > \sigma^2 \log\gamma + \frac{1}{2}\sum_{n=0}^{N-1} s^2[n]

T(\mathbf{x}) = \sum_{n=0}^{N-1} x[n] s[n] > \gamma'
4.3. Matched Filters
In our previous case of a DC level A in WGN, we get

T(\mathbf{x}) = A \sum_{n=0}^{N-1} x[n] > \gamma'

Assume A > 0 and divide both sides by NA:

\frac{1}{N}\sum_{n=0}^{N-1} x[n] > \frac{\gamma'}{NA} = \gamma''

If A < 0, the inequality reverses. If s[n] = r^n with 0 < r < 1, we get

T(\mathbf{x}) = \sum_{n=0}^{N-1} x[n]\, r^n > \gamma'
4.3. Matched Filters
The matched filter can be viewed as a correlator or replica-correlator, since we correlate the data with a replica of the signal.

An alternative implementation processes the input with a finite impulse response (FIR) filter with impulse response

h[n] = \begin{cases} s[N-1-n] & n = 0, 1, \ldots, N-1 \\ 0 & \text{otherwise} \end{cases}

and samples the output at time n = N-1. The filter output at time n = N-1 is

y[N-1] = \sum_{n=0}^{N-1} x[n] s[n]

which is exactly the same as before! Proof:

y[n] = \sum_{i=0}^{N-1} h[i]\, x[n-i]

y[N-1] = \sum_{i=0}^{N-1} s[N-1-i]\, x[N-1-i] = \sum_{n=0}^{N-1} x[n] s[n]
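A sketch checking numerically that the FIR implementation and the replica correlator agree at n = N-1 (the signal and data here are arbitrary assumptions):

% Replica correlator versus FIR matched-filter implementation.
N = 8;
s = 0.9.^(0:N-1);                 % assumed signal s[n] = r^n with r = 0.9
x = s + randn(1,N);               % one data record under H1

T_corr = sum(x.*s);               % correlator: sum_n x[n]*s[n]
h      = fliplr(s);               % h[n] = s[N-1-n]
y      = filter(h, 1, x);         % FIR filtering of the data
T_fir  = y(N);                    % output sampled at n = N-1 (index N in MATLAB)

disp([T_corr T_fir]);             % the two values coincide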
4.3. Matched Filters
The matched filter output is sampled at n = N-1, but let us also look at the output at other times, for the case of DC level detection in WGN with signal [1 1 1 1 1] (N = 5).

[Figure: matched filter output y[n]; the peak value 5 occurs at n = 4 = N-1.]

Best performance is obtained when sampling at n = N-1! Noise may, at times, shift the location of the maximum, but n = N-1 is still the best sampling instant.
4.3. Matched Filters: Performance (under WGN)
Let us compute the mean and variance of the matched filter output T(\mathbf{x}) = \sum_{n=0}^{N-1} x[n] s[n], with signal energy \varepsilon = \sum_{n=0}^{N-1} s^2[n].

Under H0:

E[T; H_0] = E\left[\sum_{n=0}^{N-1} w[n] s[n]\right] = 0

Var[T; H_0] = Var\left[\sum_{n=0}^{N-1} w[n] s[n]\right] = \sum_{n=0}^{N-1} Var(w[n])\, s^2[n] = \sigma^2 \varepsilon

Under H1:

E[T; H_1] = E\left[\sum_{n=0}^{N-1} (w[n] + s[n]) s[n]\right] = \varepsilon

Var[T; H_1] = Var\left[\sum_{n=0}^{N-1} (w[n] + s[n]) s[n]\right] = \sigma^2 \varepsilon
4.3. Matched Filters: Performance
We now know the distributions of T under H0 and H1:

H_0: T \sim N(0, \sigma^2 \varepsilon)
H_1: T \sim N(\varepsilon, \sigma^2 \varepsilon)

The probability of false alarm is

Pfa = Q\left(\frac{\gamma'}{\sqrt{\sigma^2 \varepsilon}}\right) \;\Rightarrow\; \gamma' = Q^{-1}(Pfa)\sqrt{\sigma^2 \varepsilon}

and the probability of detection is

PD = Q\left(\frac{\gamma' - \varepsilon}{\sqrt{\sigma^2 \varepsilon}}\right) = Q\left(\frac{Q^{-1}(Pfa)\sqrt{\sigma^2 \varepsilon} - \varepsilon}{\sqrt{\sigma^2 \varepsilon}}\right) = Q\left(Q^{-1}(Pfa) - \sqrt{\frac{\varepsilon}{\sigma^2}}\right)

The shape of the signal does not affect detection performance! It depends only on the selected PFA and the energy-to-noise ratio \varepsilon / \sigma^2.
4.4. Generalized Matched Filter
The generalized matched filter handles the case where the noise is not WGN but colored Gaussian noise, \mathbf{w} \sim N(\mathbf{0}, \mathbf{C}). To determine the NP detector we again use the LRT:

p(\mathbf{x}; H_1) = \frac{1}{(2\pi)^{N/2} \det^{1/2}(\mathbf{C})} \exp\left(-\frac{1}{2}(\mathbf{x}-\mathbf{s})^T \mathbf{C}^{-1} (\mathbf{x}-\mathbf{s})\right)

p(\mathbf{x}; H_0) = \frac{1}{(2\pi)^{N/2} \det^{1/2}(\mathbf{C})} \exp\left(-\frac{1}{2}\mathbf{x}^T \mathbf{C}^{-1} \mathbf{x}\right)

l(\mathbf{x}) = \log\frac{p(\mathbf{x}; H_1)}{p(\mathbf{x}; H_0)} = -\frac{1}{2}\left[(\mathbf{x}-\mathbf{s})^T \mathbf{C}^{-1} (\mathbf{x}-\mathbf{s}) - \mathbf{x}^T \mathbf{C}^{-1} \mathbf{x}\right] = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{s} - \frac{1}{2}\mathbf{s}^T \mathbf{C}^{-1} \mathbf{s}

Since the second term does not depend on the data, we get the equivalent test

T(\mathbf{x}) = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{s} > \gamma'
4.4. Generalized Matched Filter
Let us check that the general expression reduces to the previous one for WGN. For WGN, \mathbf{C} = \sigma^2 \mathbf{I}, so

T(\mathbf{x}) = \mathbf{x}^T \mathbf{s} / \sigma^2 > \gamma'

T(\mathbf{x}) = \sum_{n=0}^{N-1} x[n] s[n] > \sigma^2 \gamma' = \gamma''

Same as before!
4.4. Generalized Matched Filter
Let us assume that \mathbf{C} = \mathrm{diag}(\sigma_0^2, \sigma_1^2, \ldots, \sigma_{N-1}^2). Then

T(\mathbf{x}) = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{s} = \sum_{n=0}^{N-1} \frac{x[n] s[n]}{\sigma_n^2} > \gamma'

= \sum_{n=0}^{N-1} \frac{x[n]}{\sigma_n} \frac{s[n]}{\sigma_n}

Under H1,

T(\mathbf{x}) = \sum_{n=0}^{N-1} \left(\frac{w[n]}{\sigma_n} + \frac{s[n]}{\sigma_n}\right)\frac{s[n]}{\sigma_n}

The generalized matched filter prewhitens the noise samples and also distorts the signal; after prewhitening, it correlates with the distorted signal.
4.4. Generalized Matched Filter
Let us write \mathbf{C}^{-1} = \mathbf{D}^T \mathbf{D}. The test statistic becomes

T(\mathbf{x}) = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{s} = \mathbf{x}^T \mathbf{D}^T \mathbf{D} \mathbf{s} = \mathbf{x}'^T \mathbf{s}'

where \mathbf{s}' = \mathbf{D}\mathbf{s} and \mathbf{x}' = \mathbf{D}\mathbf{x}. To show that WGN is indeed produced, let \mathbf{w}' = \mathbf{D}\mathbf{w}. Then

\mathbf{C}_{\mathbf{w}'} = E[\mathbf{w}' \mathbf{w}'^T] = E[\mathbf{D}\mathbf{w}\mathbf{w}^T \mathbf{D}^T] = \mathbf{D}\, E[\mathbf{w}\mathbf{w}^T]\, \mathbf{D}^T = \mathbf{D}\mathbf{C}\mathbf{D}^T = \mathbf{D}(\mathbf{D}^T \mathbf{D})^{-1} \mathbf{D}^T = \mathbf{I}
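A sketch of the prewhitening factorization using a Cholesky factor of an assumed covariance matrix; D = inv(L) with C = L*L' satisfies D'*D = inv(C):

% Prewhitening for the generalized matched filter: find D with D'*D = inv(C).
N   = 4;
rho = 0.8;
C   = toeplitz(rho.^(0:N-1));     % assumed covariance with exponential correlation
L   = chol(C, 'lower');           % C = L*L'
D   = L \ eye(N);                 % D = inv(L), so D'*D = inv(C) and D*C*D' = I

s  = ones(N,1);                   % assumed signal
x  = s + L*randn(N,1);            % colored-noise data under H1 (noise = L*white)
T1 = x' * (C\s);                  % generalized matched filter x'*inv(C)*s
T2 = (D*x)' * (D*s);              % prewhitened correlator x''*s'
disp([T1 T2]);                    % identical up to numerical precision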
4.4. Generalized Matched Filter

T(\mathbf{x}) = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{s} > \gamma'

Let us determine the performance of the generalized matched filter.

Under H0:  E[T; H_0] = E[\mathbf{w}^T \mathbf{C}^{-1} \mathbf{s}] = 0

Under H1:  E[T; H_1] = E[(\mathbf{s}+\mathbf{w})^T \mathbf{C}^{-1} \mathbf{s}] = \mathbf{s}^T \mathbf{C}^{-1} \mathbf{s}

Under H0:  Var[T; H_0] = E[\mathbf{s}^T \mathbf{C}^{-1} \mathbf{w}\mathbf{w}^T \mathbf{C}^{-1} \mathbf{s}] = \mathbf{s}^T \mathbf{C}^{-1} E[\mathbf{w}\mathbf{w}^T] \mathbf{C}^{-1} \mathbf{s} = \mathbf{s}^T \mathbf{C}^{-1} \mathbf{s}

Under H1 it can be shown that also  Var[T; H_1] = \mathbf{s}^T \mathbf{C}^{-1} \mathbf{s}
4.4. Generalized Matched Filter
Now, with threshold \lambda,

PFA = Q\left(\frac{\lambda}{\sqrt{\mathbf{s}^T \mathbf{C}^{-1} \mathbf{s}}}\right) \;\Rightarrow\; \lambda = Q^{-1}(PFA)\sqrt{\mathbf{s}^T \mathbf{C}^{-1} \mathbf{s}}

PD = Q\left(\frac{\lambda - \mathbf{s}^T \mathbf{C}^{-1} \mathbf{s}}{\sqrt{\mathbf{s}^T \mathbf{C}^{-1} \mathbf{s}}}\right) = Q\left(Q^{-1}(PFA) - \sqrt{\mathbf{s}^T \mathbf{C}^{-1} \mathbf{s}}\right)

Before, only the signal energy mattered. Now the signal shape also matters!
=> Design the signal shape (for a given energy) to maximize PD!
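One way to see this (not stated on the slide, but a standard Rayleigh-quotient argument): for a fixed energy, s^T C^{-1} s is maximized by pointing s along the eigenvector of C with the smallest eigenvalue, i.e. placing the signal energy where the noise is weakest. A sketch under an assumed covariance:

% Signal design in colored Gaussian noise: maximize s'*inv(C)*s for fixed energy.
N    = 10;
rho  = 0.9;
C    = toeplitz(rho.^(0:N-1));          % assumed noise covariance
eps0 = 1;                               % energy constraint s'*s = eps0

[V, Lam] = eig(C);
[~, k]   = min(diag(Lam));
s_best   = sqrt(eps0) * V(:,k);         % eigenvector of C with smallest eigenvalue
s_flat   = sqrt(eps0/N) * ones(N,1);    % flat signal with the same energy

d2_best = s_best' * (C\s_best)          % larger s'*inv(C)*s for the designed signal
d2_flat = s_flat' * (C\s_flat)          % smaller value for the flat signal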
4.5. Multiple Signals: Binary case
Now, instead of detecting whether a known signal is present or not, the problem is to detect which signal was sent. For example, in a communication system we must decide which of M signals was sent. For the binary case,

H_0: x[n] = s_0[n] + w[n], \quad n = 0, 1, \ldots, N-1
H_1: x[n] = s_1[n] + w[n], \quad n = 0, 1, \ldots, N-1

Let us use the minimum probability of error criterion. We decide H1 if

\frac{p(\mathbf{x} | H_1)}{p(\mathbf{x} | H_0)} > \gamma = \frac{P(H_0)}{P(H_1)} = 1

This is the ML rule. Using the definition of the multivariate Gaussian, we select the hypothesis i for which

D_i^2 = \sum_{n=0}^{N-1} (x[n] - s_i[n])^2

is minimum. We can write D_i^2 = \|\mathbf{x} - \mathbf{s}_i\|^2, so we choose the hypothesis whose signal vector is closest to \mathbf{x}.
4.5. Multiple Signals: Binary case
We select the hypothesis i for which

D_i^2 = \sum_{n=0}^{N-1} (x[n] - s_i[n])^2

is minimum. We can write this as

D_i^2 = \sum_{n=0}^{N-1} \left(x^2[n] - 2 x[n] s_i[n] + s_i^2[n]\right)

But the first term is the same for all i! So we minimize

\sum_{n=0}^{N-1} \left(-x[n] s_i[n] + \frac{1}{2} s_i^2[n]\right)

or, equivalently, maximize

\sum_{n=0}^{N-1} x[n] s_i[n] - \frac{1}{2}\sum_{n=0}^{N-1} s_i^2[n] = \sum_{n=0}^{N-1} x[n] s_i[n] - \frac{1}{2}\varepsilon_i

This is almost the same as the matched filter, but with a bias term to account for possibly different signal energies. If all signals have the same energy, the bias is not needed.
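A sketch of this minimum-distance / bias-corrected correlation receiver for two assumed signals of unequal energy:

% Minimum-distance receiver for two known signals in WGN (equal priors).
N  = 16; sigma = 1; MC = 1e4;
n  = 0:N-1;
s0 = cos(2*pi*0.1*n);            % assumed signal for H0
s1 = 0.5*ones(1,N);              % assumed signal for H1 (different energy)
e0 = sum(s0.^2); e1 = sum(s1.^2);

errors = 0;
for m = 1:MC
    if rand > 0.5
        truth = 1; x = s1 + sigma*randn(1,N);
    else
        truth = 0; x = s0 + sigma*randn(1,N);
    end
    T0 = sum(x.*s0) - e0/2;      % correlation minus half the signal energy
    T1 = sum(x.*s1) - e1/2;
    decide1 = (T1 > T0);
    errors = errors + (decide1 ~= truth);
end
Pe_sim = errors/MC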
4.6. Linear Model
The linear model is applicable to many real-world situations:

\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}

where \mathbf{w} \sim N(\mathbf{0}, \mathbf{C}). Here, our detection problem is

H_0: \mathbf{x} = \mathbf{w}
H_1: \mathbf{x} = \mathbf{H}\boldsymbol{\theta}_1 + \mathbf{w}

Using \mathbf{s} = \mathbf{H}\boldsymbol{\theta}_1, the generalized matched filter is

T(\mathbf{x}) = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{s} = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{H}\boldsymbol{\theta}_1

Recall that the MVU estimate of the parameter vector in the linear model is

\hat{\boldsymbol{\theta}} = (\mathbf{H}^T \mathbf{C}^{-1} \mathbf{H})^{-1} \mathbf{H}^T \mathbf{C}^{-1} \mathbf{x}

Now we can write

T(\mathbf{x}) = \mathbf{x}^T \mathbf{C}^{-1} \mathbf{H}\boldsymbol{\theta}_1 = \left[(\mathbf{H}^T \mathbf{C}^{-1} \mathbf{H})^{-1} \mathbf{H}^T \mathbf{C}^{-1} \mathbf{x}\right]^T (\mathbf{H}^T \mathbf{C}^{-1} \mathbf{H}) \boldsymbol{\theta}_1 = \hat{\boldsymbol{\theta}}^T (\mathbf{H}^T \mathbf{C}^{-1} \mathbf{H}) \boldsymbol{\theta}_1 = \hat{\boldsymbol{\theta}}^T \mathbf{C}_{\hat{\boldsymbol{\theta}}}^{-1} \boldsymbol{\theta}_1

where \mathbf{C}_{\hat{\boldsymbol{\theta}}} = (\mathbf{H}^T \mathbf{C}^{-1} \mathbf{H})^{-1} is the covariance of the MVU estimator.
4.6. Linear Model
We can state the equivalent parameter test

H_0: \boldsymbol{\theta} = \mathbf{0}
H_1: \boldsymbol{\theta} = \boldsymbol{\theta}_1

The performance is given by

PD = Q\left(Q^{-1}(PFA) - \sqrt{\boldsymbol{\theta}_1^T \mathbf{C}_{\hat{\boldsymbol{\theta}}}^{-1} \boldsymbol{\theta}_1}\right)
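A sketch verifying numerically that x'*inv(C)*H*theta1 and thetaHat'*(H'*inv(C)*H)*theta1 coincide (H, theta1, and C are illustrative assumptions):

% Linear-model detector: generalized matched filter versus MVU-estimate form.
N = 20; p = 2;
n = (0:N-1)';
H = [ones(N,1) n];                       % assumed observation matrix (DC + ramp)
theta1 = [1; 0.1];                       % assumed parameter vector under H1
rho = 0.7;
C = toeplitz(rho.^(0:N-1));              % assumed noise covariance

L = chol(C, 'lower');
x = H*theta1 + L*randn(N,1);             % one realization under H1

Ci       = C \ eye(N);
thetaHat = (H'*Ci*H) \ (H'*Ci*x);        % MVU estimate in the linear model

T1 = x' * Ci * H * theta1
T2 = thetaHat' * (H'*Ci*H) * theta1      % identical to T1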
