
Bayesian Hypothesis testing

Statistical Decision Theory I.


Simple Hypothesis testing.
Binary Hypothesis testing.
Bayesian Hypothesis testing.
Minimax Hypothesis testing.
Neyman-Pearson criterion.
M-Hypotheses.
Receiver Operating Characteristics.
Composite Hypothesis testing.
Composite Hypothesis testing approaches.
Performance of GLRT for large data records.
Nuisance parameters.

Classical detection and estimation theory.
What is detection?
Signal detection and estimation is the area of study that deals with
the processing of information-bearing signals for the purpose of
extracting information from them.





A simple digital communication system.
[Figure: a transmitter maps the source's digital sequence into a signal sequence $s(t)$; the channel adds noise, so the receiver observes $r(t) = s(t) + n(t)$. In the sketch, bits 1 and 0 are transmitted during intervals of length $T$ as sinusoids $\sin \omega_1 t$ and $\sin \omega_0 t$.]

Components of a decision theory problem.
1. Source: generates an output.
2. Probabilistic transition mechanism: a device that knows which hypothesis is true. It generates a point in the observation space according to some probability law.
3. Observation space: describes all the outcomes of the transition mechanism.
4. Decision rule: to each point in the observation space one of the hypotheses is assigned.
[Figure: source → probabilistic transition mechanism → observation space → decision rule → decision ($H_1$ or $H_0$).]

Components of a decision theory problem.
Example:
When $H_1$ is true the source generates $+1$; when $H_0$ is true the source generates $-1$. An independent discrete random variable $n$, whose probability density is given below, is added to the source output:
$$p_n(N):\quad \Pr(n = -1) = \tfrac{1}{4},\quad \Pr(n = 0) = \tfrac{1}{2},\quad \Pr(n = +1) = \tfrac{1}{4}.$$
Shifting this mass by the source output gives the conditional densities of the observation:
$$p_n(N \mid H_1):\quad \tfrac{1}{4} \text{ at } 0,\quad \tfrac{1}{2} \text{ at } +1,\quad \tfrac{1}{4} \text{ at } +2,$$
$$p_n(N \mid H_0):\quad \tfrac{1}{4} \text{ at } -2,\quad \tfrac{1}{2} \text{ at } -1,\quad \tfrac{1}{4} \text{ at } 0.$$
[Figure: source ($+1$ under $H_1$, $-1$ under $H_0$) → transition mechanism (adds $n$) → observation space $r$.]
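A minimal numeric sketch of this discrete example (Python; the variable names are illustrative and not part of the original notes):

```python
from fractions import Fraction

# Noise pmf: P(n = 0) = 1/2, P(n = -1) = P(n = +1) = 1/4.
p_n = {-1: Fraction(1, 4), 0: Fraction(1, 2), +1: Fraction(1, 4)}

# Observation r = source output + n: source is +1 under H1, -1 under H0.
p_r_H1 = {v + 1: p for v, p in p_n.items()}   # mass 1/4, 1/2, 1/4 at 0, +1, +2
p_r_H0 = {v - 1: p for v, p in p_n.items()}   # mass 1/4, 1/2, 1/4 at -2, -1, 0

for R in sorted(set(p_r_H0) | set(p_r_H1)):
    print(f"R={R:+d}  p(R|H0)={p_r_H0.get(R, 0)}  p(R|H1)={p_r_H1.get(R, 0)}")
```

Dividing the two columns at each observable value of $r$ gives the likelihood ratio used later in the notes.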

The sum of the source output and $n$ is the observed variable $r$. The observation space has finite dimension, i.e. the observation consists of a set of $N$ numbers and can be represented as a point in an $N$-dimensional space.
Under the two hypotheses we have
$$H_1:\ r = +1 + n, \qquad H_0:\ r = -1 + n.$$

After observing the outcome in the observation space we shall guess
which hypothesis is true.
We use a decision rule that assigns each point to one of the
hypotheses.
Detection and estimation applications involve making inferences from
observations that are distorted or corrupted in some unknown
manner.
Simple binary hypothesis testing.
The decision problem in which each of two source outputs
corresponds to a hypothesis.
Each hypothesis maps into a point in the observation space.
We assume that the observation consists of a set of $N$ observations $r_1, r_2, \ldots, r_N$. Each set can be represented as a vector $\mathbf{r}$:
$$\mathbf{r} = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_N \end{bmatrix}.$$

The probabilistic transition mechanism generates points in accord with the two known conditional densities
$$p_{\mathbf{r}|H_1}(\mathbf{R}|H_1), \qquad p_{\mathbf{r}|H_0}(\mathbf{R}|H_0).$$
The objective is to use this information to develop a decision rule.
Decision criteria.
In the binary hypothesis problem either $H_0$ or $H_1$ is true. We are seeking decision rules for making a choice. Each time the experiment is conducted one of four things can happen:
1. $H_0$ true; choose $H_0$ (correct).
2. $H_0$ true; choose $H_1$ (error).
3. $H_1$ true; choose $H_1$ (correct).
4. $H_1$ true; choose $H_0$ (error).
The purpose of a decision criterion is to attach some relative importance to the four possible courses of action. The method for processing the received data depends on the decision criterion we select.
Bayesian criterion.
The source generates two outputs with given (a priori) probabilities $P_1, P_0$. These represent the observer's information before the experiment is conducted. A cost is assigned to each course of action: $C_{00}, C_{10}, C_{01}, C_{11}$, where $C_{ij}$ is the cost of choosing $H_i$ when $H_j$ is true. Each time the experiment is conducted a certain cost will be incurred. The decision rule is designed so that on the average the cost is as small as possible. Two probabilities are averaged over: the a priori probability and the probability that a particular course of action will be taken.
[Figure: the source selects a hypothesis; the conditional densities $p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)$ and $p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)$ generate a point $\mathbf{R}$ in the observation space $Z$, which is divided into $Z_0$ (say $H_0$) and $Z_1$ (say $H_1$).]

The expected value of the cost is
$$
\mathcal{R} = C_{00} P_0 \Pr(\text{say } H_0 \mid H_0 \text{ is true})
+ C_{10} P_0 \Pr(\text{say } H_1 \mid H_0 \text{ is true})
+ C_{11} P_1 \Pr(\text{say } H_1 \mid H_1 \text{ is true})
+ C_{01} P_1 \Pr(\text{say } H_0 \mid H_1 \text{ is true}).
$$

The binary decision rule divides the total observation space $Z$ into two parts, $Z_0$ and $Z_1$; each point in the observation space is assigned to one of these sets. The expression of the risk in terms of the transition probabilities and the decision regions is
$$
\mathcal{R} = C_{00} P_0 \int_{Z_0} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R}
+ C_{10} P_0 \int_{Z_1} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R}
+ C_{11} P_1 \int_{Z_1} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}
+ C_{01} P_1 \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}.
$$

$Z_0$ and $Z_1$ together cover the observation space, and each conditional density integrates to one over $Z$. We assume that the cost of a wrong decision is higher than the cost of a correct decision:
$$C_{10} > C_{00}, \qquad C_{01} > C_{11}.$$
For the Bayesian test the regions $Z_0$ and $Z_1$ are chosen so that the risk is minimized. We assume that a decision is made for each point in the observation space:
$$Z = Z_0 + Z_1.$$
Rewriting the risk using $Z_1 = Z - Z_0$:
$$
\mathcal{R} = C_{00} P_0 \int_{Z_0} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R}
+ C_{10} P_0 \int_{Z - Z_0} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R}
+ C_{11} P_1 \int_{Z - Z_0} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}
+ C_{01} P_1 \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}.
$$

Observing that
$$\int_{Z} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R} = \int_{Z} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R} = 1,$$

we obtain
$$
\mathcal{R} = P_0 C_{10} + P_1 C_{11}
+ \int_{Z_0} \Big[ P_1 (C_{01} - C_{11})\, p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)
- P_0 (C_{10} - C_{00})\, p_{\mathbf{r}|H_0}(\mathbf{R}|H_0) \Big]\,d\mathbf{R}.
$$
The first two terms are fixed costs; the integral represents the cost controlled by those points $\mathbf{R}$ that we assign to $Z_0$. Values of $\mathbf{R}$ where the second term is larger than the first contribute a negative amount to the integral and should be included in $Z_0$. Values of $\mathbf{R}$ where the two terms are equal have no effect on the risk.
The decision regions are therefore defined by the following statement: if
$$P_1 (C_{01} - C_{11})\, p_{\mathbf{r}|H_1}(\mathbf{R}|H_1) \ge P_0 (C_{10} - C_{00})\, p_{\mathbf{r}|H_0}(\mathbf{R}|H_0),$$
assign $\mathbf{R}$ to $Z_1$ and say that $H_1$ is true; otherwise assign $\mathbf{R}$ to $Z_0$ and say that $H_0$ is true.
This may be expressed as
$$
\frac{p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)}{p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)}
\;\underset{H_0}{\overset{H_1}{\gtrless}}\;
\frac{P_0 (C_{10} - C_{00})}{P_1 (C_{01} - C_{11})}.
$$

The quantity
$$\Lambda(\mathbf{R}) = \frac{p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)}{p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)}$$
is called the likelihood ratio. Regardless of the dimension of $\mathbf{R}$, $\Lambda(\mathbf{R})$ is a one-dimensional variable. All the data processing is involved in computing $\Lambda(\mathbf{R})$ and is not affected by the a priori probabilities or cost assignments.
The quantity
$$\eta = \frac{P_0 (C_{10} - C_{00})}{P_1 (C_{01} - C_{11})}$$
is the threshold of the test. The threshold $\eta$ can be left as a variable and changed if our a priori knowledge or the costs change.
The Bayes criterion has led us to a likelihood ratio test (LRT):
$$\Lambda(\mathbf{R}) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \eta.$$
An equivalent test is
$$\ln \Lambda(\mathbf{R}) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln \eta.$$
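As a sketch of how such a test might be implemented (the function `lrt` and its interface are illustrative, not part of the original notes); working in the log domain avoids numerical underflow when $\mathbf{R}$ has many components:

```python
import numpy as np

def lrt(R, p1, p0, eta):
    """Likelihood ratio test: return 1 (say H1) if Lambda(R) >= eta, else 0 (say H0).

    p1, p0 -- callables giving the conditional densities p(R_i|H1), p(R_i|H0)
              for each component of the observation vector R;
    eta    -- threshold P0*(C10 - C00) / (P1*(C01 - C11)).
    """
    log_lambda = np.sum(np.log(p1(R))) - np.sum(np.log(p0(R)))  # ln Lambda(R)
    return 1 if log_lambda >= np.log(eta) else 0
```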

Summary of the Bayesian test:
The Bayesian test can be conducted simply by calculating the likelihood ratio $\Lambda(\mathbf{R})$ and comparing it to the threshold.
Test design:
Assign a priori probabilities to the source outputs.
Assign costs to each action.
Assume distributions for $p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)$ and $p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)$.
Calculate and simplify $\Lambda(\mathbf{R})$.
[Figure: likelihood ratio processor. A data processor computes $\Lambda(\mathbf{R})$ from $\mathbf{R}$; a threshold device compares $\Lambda(\mathbf{R})$ with $\eta$ and outputs the decision.]
Special case.
$$C_{00} = C_{11} = 0, \qquad C_{01} = C_{10} = 1.$$
The risk reduces to the total probability of error:
$$\mathcal{R} = P_0 \int_{Z - Z_0} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R} + P_1 \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R},$$
and the log-likelihood ratio test becomes
$$\ln \Lambda(\mathbf{R}) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln \frac{P_0}{P_1} = \ln P_0 - \ln P_1.$$
When the two hypotheses are equally likely, the threshold is zero.


Sufficient statistics.
A sufficient statistic is a function $T$ that transforms the initial data set into a new data set $T(\mathbf{R})$ that still contains all the information in $\mathbf{R}$ relevant to the problem under investigation. The sufficient statistic containing a minimal number of elements is called a minimal sufficient statistic. When making a decision, knowing the value of the sufficient statistic is just as good as knowing $\mathbf{R}$.
The integrals in the Bayes test.
Probability of false alarm (we say that the target is present when it is not):
$$P_F = \int_{Z_1} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R}.$$
Probability of detection:
$$P_D = \int_{Z_1} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}.$$
Probability of a miss (we say that the target is absent when it is present):
$$P_M = \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}.$$
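These probabilities can be checked by Monte Carlo simulation for any fixed decision rule; a sketch, assuming the Gaussian-mean problem treated in the example later in these notes (all parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma, N = 1.0, 1.0, 10
gamma = 7.0                      # illustrative threshold on the statistic sum(R_i)

R0 = rng.normal(0.0, sigma, size=(200_000, N))   # observations under H0
R1 = rng.normal(m,   sigma, size=(200_000, N))   # observations under H1

P_F = np.mean(R0.sum(axis=1) >= gamma)           # say H1 although H0 is true
P_D = np.mean(R1.sum(axis=1) >= gamma)           # say H1 when H1 is true
print(f"P_F ~ {P_F:.4f}  P_D ~ {P_D:.4f}  P_M ~ {1 - P_D:.4f}")
```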
Special case: the prior probabilities unknown.
Minimax test.
$$
\mathcal{R} = C_{00} P_0 \int_{Z_0} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R}
+ C_{10} P_0 \int_{Z - Z_0} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R}
+ C_{11} P_1 \int_{Z - Z_0} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}
+ C_{01} P_1 \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}.
$$
If the regions $Z_0$ and $Z_1$ are fixed, the integrals are determined:
$$\mathcal{R} = P_0 C_{10} + P_1 C_{11} + P_1 (C_{01} - C_{11}) P_M - P_0 (C_{10} - C_{00}) (1 - P_F).$$
With $P_0 = 1 - P_1$, the Bayesian risk becomes a function of $P_1$:
$$\mathcal{R}(P_1) = C_{00} (1 - P_F) + C_{10} P_F + P_1 \big[ (C_{11} - C_{00}) + (C_{01} - C_{11}) P_M - (C_{10} - C_{00}) P_F \big].$$


Bayesian test can be found if all the costs and a priori probabilities
are known.
If we know all the probabilities we can calculate the Bayesian cost.
Assume that we do not know
1
P and just assume a certain one
*
1
P
and design a corresponding test.
If
1
P changes the regions
0 1
, Z Z changes and with these also
F
P and
D
P .

The test is designed for
*
1
P but the actual a priori probability is
1
P .
By assuming
*
1
P we fix
F
P and
D
P .
Cost for different
1
P is given by a
function
( )
*
1 1
, P P R .
Because the threshold is fixed the cost
( )
*
1 1
, P P R is a linear function of
*
1
P .
Bayesian test minimizes the risk for
*
1
P .
For other values of
1
P

( ) ( )
*
1 1 1
, P P P R R

$\mathcal{R}(P_1)$ is strictly concave. (If $\Lambda(\mathbf{r})$ is a continuous random variable with a strictly monotonic probability distribution function, a change of $\eta$ always changes the risk.)
[Figure: the Bayes risk $\mathcal{R}(P_1)$, with end values $C_{00}$ at $P_1 = 0$ and $C_{11}$ at $P_1 = 1$, together with the straight line $\mathcal{R}(P_1^*, P_1)$ as a function of $P_1$ when $P_1^*$ is fixed, tangent to the curve at $P_1 = P_1^*$.]
Minimax test.
The Bayesian test designed to minimize the maximum possible risk is called a minimax test. The worst case is that $P_1$ takes the value that maximizes our risk $\mathcal{R}(P_1^*, P_1)$; to minimize the maximum risk we therefore select the value $P_1^*$ at which $\mathcal{R}(P_1)$ is maximum. If the maximum occurs inside the interval $[0, 1]$, the line $\mathcal{R}(P_1^*, P_1)$ becomes horizontal, so the coefficient of $P_1$ must be zero:
$$(C_{11} - C_{00}) + (C_{01} - C_{11}) P_M - (C_{10} - C_{00}) P_F = 0.$$
This equation is the minimax equation.
[Figure: risk curves with the maximum value of $\mathcal{R}$ at a) $P_1 = 1$, b) $P_1 = 0$, c) $0 < P_1 < 1$.]
Special case.
The cost function is $C_{00} = C_{11} = 0$, $C_{01} = C_M$, $C_{10} = C_F$. The risk is
$$\mathcal{R} = P_F C_F (1 - P_1) + P_M C_M P_1 = C_F P_F + P_1 \left( C_M P_M - C_F P_F \right),$$
and the minimax equation is
$$C_M P_M = C_F P_F.$$
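For this special case the minimax threshold can be found numerically as the root of $C_M P_M - C_F P_F$; a sketch, assuming the Gaussian statistic $l$ of the later example ($l \sim N(0,1)$ under $H_0$ and $N(d,1)$ under $H_1$; parameter values are illustrative):

```python
from scipy.optimize import brentq
from scipy.stats import norm

d, C_M, C_F = 2.0, 1.0, 1.0

def minimax_gap(g):            # g is the threshold on the normalized statistic l
    P_F = norm.sf(g)           # P(l >= g | H0),  l ~ N(0, 1)
    P_M = norm.cdf(g - d)      # P(l <  g | H1),  l ~ N(d, 1)
    return C_M * P_M - C_F * P_F

g_star = brentq(minimax_gap, -10.0, 10.0)
print(g_star)                  # d/2 = 1.0 here, by symmetry when C_M = C_F
```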
Neyman-Pearson test.
Often it is difficult to assign realistic costs or a priori probabilities. This can be bypassed by working with the conditional probabilities $P_F$ and $P_D$. We then have two conflicting objectives: to make $P_F$ as small as possible and $P_D$ as large as possible.

Neyman-Pearson criterion.
Constrain $P_F = \alpha'$ and design a test to maximize $P_D$ (or, equivalently, minimize $P_M$) under this constraint. The solution can be obtained by using a Lagrange multiplier $\lambda$:
$$F = P_M + \lambda \left( P_F - \alpha' \right)$$
$$F = \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R} + \lambda \left[ \int_{Z - Z_0} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R} - \alpha' \right].$$

If $P_F = \alpha'$, minimizing $F$ minimizes $P_M$:
$$F = \lambda (1 - \alpha') + \int_{Z_0} \big[ p_{\mathbf{r}|H_1}(\mathbf{R}|H_1) - \lambda\, p_{\mathbf{r}|H_0}(\mathbf{R}|H_0) \big]\,d\mathbf{R}.$$

For any positive value of $\lambda$, an LRT will minimize $F$: $F$ is minimized if we assign a point $\mathbf{R}$ to $Z_0$ only when the term in the bracket is negative, i.e. if
$$\frac{p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)}{p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)} < \lambda,$$
assign the point to $Z_0$ (say $H_0$). Thus $F$ is minimized by the likelihood ratio test
$$\Lambda(\mathbf{R}) \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \lambda.$$
To satisfy the constraint, $\lambda$ is selected so that $P_F = \alpha'$.
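For a test statistic with a known $H_0$ distribution, selecting the threshold is a single inverse-tail-probability evaluation; a sketch, again assuming the Gaussian statistic $l \sim N(0,1)$ under $H_0$:

```python
from scipy.stats import norm

alpha = 0.01                # required false-alarm probability P_F = alpha'
gamma = norm.isf(alpha)     # threshold: P(l >= gamma | H0) = alpha
print(gamma)                # ~2.326
```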

$$P_F = \int_{\lambda}^{\infty} p_{\Lambda|H_0}(\Lambda|H_0)\,d\Lambda = \alpha'.$$
The value of $\lambda$ will be nonnegative because $p_{\Lambda|H_0}(\Lambda|H_0)$ is zero for negative values of $\Lambda$.

[Figure: the conditional densities $p_{l|H_0}(L|H_0)$ and $p_{l|H_1}(L|H_1)$ of the test statistic under the two hypotheses.]

Example.
We assume that under $H_1$ the source output is a constant voltage $m$, and under $H_0$ the source output is zero. The voltage is corrupted by additive noise. The output is sampled, giving $N$ samples per second. Each noise sample is an i.i.d. zero-mean Gaussian random variable with variance $\sigma^2$:
$$H_1:\ r_i = m + n_i, \quad i = 1, 2, \ldots, N,$$
$$H_0:\ r_i = n_i, \quad i = 1, 2, \ldots, N.$$

$$p_{n_i}(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{X^2}{2\sigma^2} \right).$$
The probability density of $r_i$ under each hypothesis is:
$$p_{r_i|H_1}(R_i|H_1) = p_{n_i}(R_i - m) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(R_i - m)^2}{2\sigma^2} \right),$$
$$p_{r_i|H_0}(R_i|H_0) = p_{n_i}(R_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{R_i^2}{2\sigma^2} \right).$$
Because the samples are independent, the joint probability density of the $N$ samples is:
$$p_{\mathbf{r}|H_1}(\mathbf{R}|H_1) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(R_i - m)^2}{2\sigma^2} \right),$$
$$p_{\mathbf{r}|H_0}(\mathbf{R}|H_0) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{R_i^2}{2\sigma^2} \right).$$
The likelihood ratio is
$$\Lambda(\mathbf{R}) = \frac{\displaystyle\prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(R_i - m)^2}{2\sigma^2} \right)}{\displaystyle\prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{R_i^2}{2\sigma^2} \right)}.$$
After cancelling common terms and taking the logarithm:
$$\ln \Lambda(\mathbf{R}) = \frac{m}{\sigma^2} \sum_{i=1}^{N} R_i - \frac{N m^2}{2\sigma^2}.$$
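A quick numerical check of this simplification (a sketch; the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m, sigma, N = 0.7, 1.3, 8
R = rng.normal(m, sigma, N)   # one realization (drawn under H1, but any R works)

direct = norm.logpdf(R, m, sigma).sum() - norm.logpdf(R, 0.0, sigma).sum()
closed = (m / sigma**2) * R.sum() - N * m**2 / (2 * sigma**2)
print(np.allclose(direct, closed))   # True: both equal ln Lambda(R)
```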
The likelihood ratio test is
$$\frac{m}{\sigma^2} \sum_{i=1}^{N} R_i - \frac{N m^2}{2\sigma^2} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \ln \eta,$$
or equivalently
$$\sum_{i=1}^{N} R_i \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{\sigma^2}{m} \ln \eta + \frac{N m}{2}.$$
We use $d \triangleq \sqrt{N}\, m / \sigma$ for normalisation:
$$l \triangleq \frac{1}{\sqrt{N}\,\sigma} \sum_{i=1}^{N} R_i \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{\ln \eta}{d} + \frac{d}{2}.$$

Under $H_0$, $l$ is obtained by adding $N$ independent zero-mean Gaussian variables with variance $\sigma^2$ and then dividing by $\sqrt{N}\,\sigma$; therefore $l$ is $N(0, 1)$. Under $H_1$, $l$ is $N\!\left( \sqrt{N}\, m / \sigma,\ 1 \right) = N(d, 1)$.
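A simulation check of these two distributions (a sketch with illustrative values):

```python
import numpy as np

rng = np.random.default_rng(2)
m, sigma, N = 0.5, 1.0, 16
d = np.sqrt(N) * m / sigma            # = 2.0 here

l0 = rng.normal(0.0, sigma, (200_000, N)).sum(axis=1) / (np.sqrt(N) * sigma)
l1 = rng.normal(m,   sigma, (200_000, N)).sum(axis=1) / (np.sqrt(N) * sigma)
print(l0.mean(), l0.std())            # ~0, ~1: l ~ N(0, 1) under H0
print(l1.mean(), l1.std())            # ~d, ~1: l ~ N(d, 1) under H1
```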
$$P_F = \int_{\ln\eta/d + d/2}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) dx = \operatorname{erfc}_*\!\left( \frac{\ln\eta}{d} + \frac{d}{2} \right),$$
where $\operatorname{erfc}_*(x) \triangleq \int_x^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt$ is the Gaussian tail integral and $d = \sqrt{N}\, m / \sigma$ is the distance between the means of the two densities.
$$P_D = \int_{\ln\eta/d + d/2}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{(x - d)^2}{2} \right) dx
= \int_{\ln\eta/d - d/2}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y^2}{2} \right) dy
= \operatorname{erfc}_*\!\left( \frac{\ln\eta}{d} - \frac{d}{2} \right).$$
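Both tail integrals evaluate directly with the standard normal survival function, which plays the role of $\operatorname{erfc}_*$ here; a sketch:

```python
import numpy as np
from scipy.stats import norm

def pf_pd(eta, d):
    """P_F and P_D of the Gaussian-mean LRT with threshold eta and distance d."""
    x = np.log(eta) / d + d / 2.0
    return norm.sf(x), norm.sf(x - d)   # tail integrals under H0 and under H1

print(pf_pd(eta=1.0, d=2.0))            # equal priors: P_F ~ 0.159, P_D ~ 0.841
```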


In communication systems a special case is important:
$$\Pr(\epsilon) = P_0 P_F + P_1 P_M.$$
If $P_0 = P_1$, the threshold $\eta$ is one and
$$\Pr(\epsilon) = \tfrac{1}{2} \left( P_F + P_M \right).$$
Receiver Operating Characteristic (ROC).
For a Neyman-Pearson test the values of $P_F$ and $P_D$ completely specify the test performance. $P_D$ depends on $P_F$; the function $P_D(P_F)$ is defined as the Receiver Operating Characteristic (ROC). The ROC completely describes the performance of the test as a function of the parameters of interest.
Example.
[Figure: ROC of the Gaussian example, $P_D$ versus $P_F$.]
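The ROC of this example can be traced by sweeping the threshold, since $P_F = \operatorname{erfc}_*(x)$ and $P_D = \operatorname{erfc}_*(x - d)$ for $x = \ln\eta/d + d/2$; a sketch (matplotlib assumed available):

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

x = np.linspace(-4.0, 8.0, 400)          # swept normalized threshold
for d in (0.5, 1.0, 2.0, 3.0):
    plt.plot(norm.sf(x), norm.sf(x - d), label=f"d = {d}")
plt.plot([0, 1], [0, 1], "k--", label="P_D = P_F")
plt.xlabel("P_F"); plt.ylabel("P_D"); plt.legend(); plt.title("ROC")
plt.show()
```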

Properties of the ROC.
All continuous likelihood ratio tests have ROCs that are concave downward. All continuous likelihood ratio tests have ROCs that lie above the line $P_D = P_F$. The slope of the ROC at a particular point is equal to the value of the threshold $\eta$ required to achieve the $P_F$ and $P_D$ at that point. Whenever the maximum value of the Bayes risk is interior to the interval $(0, 1)$ of the $P_1$ axis, the minimax operating point is the intersection of the line
$$(C_{11} - C_{00}) + (C_{01} - C_{11})(1 - P_D) - (C_{10} - C_{00}) P_F = 0$$
and the appropriate curve on the ROC.
[Figure: determination of the minimax operating point on the ROC.]
Conclusions.
Using either the Bayes criterion or the Neyman-Pearson criterion, we find that the optimum test is a likelihood ratio test. Regardless of the dimension of the observation space, the optimum test consists of comparing the scalar variable $\Lambda(\mathbf{R})$ with a threshold. For the binary hypothesis test the decision space is one-dimensional. The test can be simplified by calculating a sufficient statistic. A complete description of the LRT performance is obtained by plotting the conditional probabilities $P_D$ and $P_F$ as the threshold is varied.
M Hypotheses.
We choose one of $M$ hypotheses. There are $M$ source outputs, each of which corresponds to one of the $M$ hypotheses. We are forced to make a decision. There are $M^2$ alternatives that may occur each time the experiment is conducted.
[Figure: the source selects one of the hypotheses $H_0, H_1, \ldots$; the probabilistic transition mechanism generates a point $\mathbf{R}$ in the observation space $Z$ according to the conditional densities $p_{\mathbf{r}|H_0}(\mathbf{R}|H_0), p_{\mathbf{r}|H_1}(\mathbf{R}|H_1), \ldots$; $Z$ is partitioned into decision regions $Z_0$ (say $H_0$), $Z_1$ (say $H_1$), \ldots, $Z_i$ (say $H_i$).]
Bayes Criterion.
$C_{ij}$: cost of each course of action (choosing $H_i$ when $H_j$ is true). $Z_i$: region in the observation space where we choose $H_i$. $P_i$: a priori probabilities.
$$\mathcal{R} = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} P_j C_{ij} \int_{Z_i} p_{\mathbf{r}|H_j}(\mathbf{R}|H_j)\,d\mathbf{R}.$$
$\mathcal{R}$ is minimized through the selection of the regions $Z_i$.

Example: $M = 3$.
$$Z = Z_0 + Z_1 + Z_2$$
$$
\begin{aligned}
\mathcal{R} ={}& P_0 C_{00} \int_{Z_0} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R}
+ P_0 C_{10} \int_{Z_1} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R}
+ P_0 C_{20} \int_{Z_2} p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)\,d\mathbf{R} \\
&+ P_1 C_{11} \int_{Z_1} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}
+ P_1 C_{01} \int_{Z_0} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R}
+ P_1 C_{21} \int_{Z_2} p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)\,d\mathbf{R} \\
&+ P_2 C_{22} \int_{Z_2} p_{\mathbf{r}|H_2}(\mathbf{R}|H_2)\,d\mathbf{R}
+ P_2 C_{02} \int_{Z_0} p_{\mathbf{r}|H_2}(\mathbf{R}|H_2)\,d\mathbf{R}
+ P_2 C_{12} \int_{Z_1} p_{\mathbf{r}|H_2}(\mathbf{R}|H_2)\,d\mathbf{R}.
\end{aligned}
$$
Using $Z_0 = Z - Z_1 - Z_2$ (and similarly for the other regions) and the fact that each density integrates to one over $Z$:
$$
\begin{aligned}
\mathcal{R} ={}& P_0 C_{00} + P_1 C_{11} + P_2 C_{22} \\
&+ \int_{Z_0} \big[ P_2 (C_{02} - C_{22})\, p_{\mathbf{r}|H_2}(\mathbf{R}|H_2) + P_1 (C_{01} - C_{11})\, p_{\mathbf{r}|H_1}(\mathbf{R}|H_1) \big]\,d\mathbf{R} \\
&+ \int_{Z_1} \big[ P_0 (C_{10} - C_{00})\, p_{\mathbf{r}|H_0}(\mathbf{R}|H_0) + P_2 (C_{12} - C_{22})\, p_{\mathbf{r}|H_2}(\mathbf{R}|H_2) \big]\,d\mathbf{R} \\
&+ \int_{Z_2} \big[ P_0 (C_{20} - C_{00})\, p_{\mathbf{r}|H_0}(\mathbf{R}|H_0) + P_1 (C_{21} - C_{11})\, p_{\mathbf{r}|H_1}(\mathbf{R}|H_1) \big]\,d\mathbf{R}.
\end{aligned}
$$

$\mathcal{R}$ is minimized if we assign each $\mathbf{R}$ to the region in which the value of the integrand is the smallest. Label the integrands $I_0(\mathbf{R}), I_1(\mathbf{R}), I_2(\mathbf{R})$:
if $I_0(\mathbf{R}) < I_1(\mathbf{R})$ and $I_0(\mathbf{R}) < I_2(\mathbf{R})$, choose $H_0$;
if $I_1(\mathbf{R}) < I_0(\mathbf{R})$ and $I_1(\mathbf{R}) < I_2(\mathbf{R})$, choose $H_1$;
if $I_2(\mathbf{R}) < I_0(\mathbf{R})$ and $I_2(\mathbf{R}) < I_1(\mathbf{R})$, choose $H_2$.
If we use the likelihood ratios
$$\Lambda_1(\mathbf{R}) = \frac{p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)}{p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)}, \qquad
\Lambda_2(\mathbf{R}) = \frac{p_{\mathbf{r}|H_2}(\mathbf{R}|H_2)}{p_{\mathbf{r}|H_0}(\mathbf{R}|H_0)},$$
the set of decision equations is:
$$P_1 (C_{01} - C_{11})\, \Lambda_1(\mathbf{R}) \;\underset{H_0 \text{ or } H_2}{\overset{H_1 \text{ or } H_2}{\gtrless}}\; P_0 (C_{10} - C_{00}) + P_2 (C_{12} - C_{02})\, \Lambda_2(\mathbf{R}),$$
$$P_2 (C_{02} - C_{22})\, \Lambda_2(\mathbf{R}) \;\underset{H_0 \text{ or } H_1}{\overset{H_2 \text{ or } H_1}{\gtrless}}\; P_0 (C_{20} - C_{00}) + P_1 (C_{21} - C_{01})\, \Lambda_1(\mathbf{R}),$$
$$P_2 (C_{12} - C_{22})\, \Lambda_2(\mathbf{R}) \;\underset{H_1 \text{ or } H_0}{\overset{H_2 \text{ or } H_0}{\gtrless}}\; P_0 (C_{20} - C_{10}) + P_1 (C_{21} - C_{11})\, \Lambda_1(\mathbf{R}).$$

$M$ hypotheses always lead to a decision space that has, at most, $M - 1$ dimensions.
[Figure: decision space in the $(\Lambda_1(\mathbf{R}), \Lambda_2(\mathbf{R}))$ plane, partitioned into regions for $H_0$, $H_1$, and $H_2$.]

Special case (often in communication).
$$C_{00} = C_{11} = C_{22} = 0, \qquad C_{ij} = 1,\ i \ne j.$$
The decision equations reduce to
$$P_1\, p_{\mathbf{r}|H_1}(\mathbf{R}|H_1) \;\underset{H_0 \text{ or } H_2}{\overset{H_1 \text{ or } H_2}{\gtrless}}\; P_0\, p_{\mathbf{r}|H_0}(\mathbf{R}|H_0),$$
$$P_2\, p_{\mathbf{r}|H_2}(\mathbf{R}|H_2) \;\underset{H_0 \text{ or } H_1}{\overset{H_2 \text{ or } H_1}{\gtrless}}\; P_0\, p_{\mathbf{r}|H_0}(\mathbf{R}|H_0),$$
$$P_2\, p_{\mathbf{r}|H_2}(\mathbf{R}|H_2) \;\underset{H_1 \text{ or } H_0}{\overset{H_2 \text{ or } H_0}{\gtrless}}\; P_1\, p_{\mathbf{r}|H_1}(\mathbf{R}|H_1).$$
Compute the a posteriori probabilities and choose the largest: a maximum a posteriori (MAP) probability computer.
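A sketch of such a MAP computer for $M$ hypotheses (log domain; the interface is illustrative):

```python
import numpy as np

def map_decide(R, priors, log_densities):
    """Return the index j of the hypothesis H_j with the largest posterior.

    priors        -- sequence [P_0, ..., P_{M-1}] of a priori probabilities;
    log_densities -- sequence of callables, log_densities[j](R) = ln p(R|H_j).
    Maximizing P_j * p(R|H_j) over j is equivalent to maximizing the posterior.
    """
    scores = [np.log(P) + logp(R) for P, logp in zip(priors, log_densities)]
    return int(np.argmax(scores))
```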
Special case: degeneration of hypotheses.
What happens if we combine $H_1$ and $H_2$: $C_{12} = C_{21} = 0$. For simplicity, $C_{01} = C_{10} = C_{20} = C_{02}$ and $C_{00} = C_{11} = C_{22} = 0$. The first two equations of the test reduce to
$$P_1\, \Lambda_1(\mathbf{R}) + P_2\, \Lambda_2(\mathbf{R}) \;\underset{H_0}{\overset{H_1 \text{ or } H_2}{\gtrless}}\; P_0.$$
[Figure: decision space in the $(\Lambda_1(\mathbf{R}), \Lambda_2(\mathbf{R}))$ plane; a straight line separates the region "$H_1$ or $H_2$" from the region $H_0$.]

Dummy hypothesis.
The actual problem has two hypotheses, $H_1$ and $H_2$. We introduce a new one, $H_0$, with a priori probability $P_0 = 0$. Let $P_1 + P_2 = 1$ and $C_{12} = C_{02}$, $C_{21} = C_{01}$. We always choose $H_1$ or $H_2$, and the test reduces to
$$P_2 (C_{12} - C_{22})\, \Lambda_2(\mathbf{R}) \;\underset{H_1}{\overset{H_2}{\gtrless}}\; P_1 (C_{21} - C_{11})\, \Lambda_1(\mathbf{R}).$$
This is useful if
$$\Lambda(\mathbf{R}) = \frac{p_{\mathbf{r}|H_2}(\mathbf{R}|H_2)}{p_{\mathbf{r}|H_1}(\mathbf{R}|H_1)}$$
is difficult to work with but $\Lambda_1(\mathbf{R})$ and $\Lambda_2(\mathbf{R})$ are simple.
Conclusions.
1. The minimum dimension of the decision space is no more than $M - 1$. The boundaries of the decision regions are hyperplanes in the $(\Lambda_1, \ldots, \Lambda_{M-1})$ plane.
2. The test is simple to find, but the error probabilities are often difficult to compute.
3. An important test is the minimum total probability of error test. Here we compute the a posteriori probability of each hypothesis and choose the largest.
