SYDE 372 - Winter 2011
Introduction to Pattern Recognition
Probability Measures for Classification: Part II

Alexander Wong
Department of Systems Design Engineering
University of Waterloo
Outline

1. MAP Classifier for Normal Distributions
2. Performance of the Bayes Classifier
3. Error bounds
MAP Classifier for Normal Distributions

By far the most popular conditional class distribution model is the Gaussian distribution:

$p(x|A) = N(\mu_A, \sigma_A^2) = \frac{1}{\sqrt{2\pi}\,\sigma_A} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_A}{\sigma_A}\right)^2\right]$   (1)

and $p(x|B) = N(\mu_B, \sigma_B^2)$.
For the two-class case where both distributions are Gaussian, the following MAP classifier can be defined:

$\frac{N(\mu_A, \sigma_A^2)}{N(\mu_B, \sigma_B^2)} \;\gtrless^{A}_{B}\; \frac{P(B)}{P(A)}$   (2)

$\frac{\exp\left[-\frac{1}{2}\left(\frac{x-\mu_A}{\sigma_A}\right)^2\right]}{\exp\left[-\frac{1}{2}\left(\frac{x-\mu_B}{\sigma_B}\right)^2\right]} \;\gtrless^{A}_{B}\; \frac{\sigma_A P(B)}{\sigma_B P(A)}$   (3)
In log-likelihood form:

$\frac{\exp\left[-\frac{1}{2}\left(\frac{x-\mu_A}{\sigma_A}\right)^2\right]}{\exp\left[-\frac{1}{2}\left(\frac{x-\mu_B}{\sigma_B}\right)^2\right]} \;\gtrless^{A}_{B}\; \frac{\sigma_A P(B)}{\sigma_B P(A)}$   (4)

$-\frac{1}{2}\left(\frac{x-\mu_A}{\sigma_A}\right)^2 + \frac{1}{2}\left(\frac{x-\mu_B}{\sigma_B}\right)^2 \;\gtrless^{A}_{B}\; \ln[\sigma_A P(B)] - \ln[\sigma_B P(A)]$   (5)

$\left(\frac{x-\mu_B}{\sigma_B}\right)^2 - \left(\frac{x-\mu_A}{\sigma_A}\right)^2 \;\gtrless^{A}_{B}\; 2\left[\ln[\sigma_A P(B)] - \ln[\sigma_B P(A)]\right]$   (6)
Giving us the final form:

$\left(\frac{x-\mu_B}{\sigma_B}\right)^2 - \left(\frac{x-\mu_A}{\sigma_A}\right)^2 \;\gtrless^{A}_{B}\; 2\left[\ln[\sigma_A P(B)] - \ln[\sigma_B P(A)]\right]$   (7)

Does this look familiar?
The decision boundary (threshold) for the MAP classifier where $p(x|A)$ and $p(x|B)$ are Gaussian distributions can be found by solving the following expression for x:

$\left(\frac{x-\mu_B}{\sigma_B}\right)^2 - \left(\frac{x-\mu_A}{\sigma_A}\right)^2 = 2\left[\ln[\sigma_A P(B)] - \ln[\sigma_B P(A)]\right]$   (8)

$x^2\left(\frac{1}{\sigma_B^2} - \frac{1}{\sigma_A^2}\right) - 2x\left(\frac{\mu_B}{\sigma_B^2} - \frac{\mu_A}{\sigma_A^2}\right) + \frac{\mu_B^2}{\sigma_B^2} - \frac{\mu_A^2}{\sigma_A^2} = 2\ln\left[\frac{\sigma_A P(B)}{\sigma_B P(A)}\right]$   (9)
For the case where $\sigma_A = \sigma_B = \sigma$ and $P(A) = P(B) = \frac{1}{2}$:

$x^2\left(\frac{1}{\sigma_B^2} - \frac{1}{\sigma_A^2}\right) - 2x\left(\frac{\mu_B}{\sigma_B^2} - \frac{\mu_A}{\sigma_A^2}\right) + \frac{\mu_B^2}{\sigma_B^2} - \frac{\mu_A^2}{\sigma_A^2} = 2\ln\left[\frac{\sigma_A P(B)}{\sigma_B P(A)}\right]$   (10)

$x^2(\sigma_A^2 - \sigma_B^2) - 2x(\mu_B \sigma_A^2 - \mu_A \sigma_B^2) + (\mu_B^2 \sigma_A^2 - \mu_A^2 \sigma_B^2) = 2\ln[1]$   (11)

Since $\ln(1) = 0$ and $\sigma_A = \sigma_B = \sigma$, the quadratic term vanishes and

$x = \frac{\mu_B^2 \sigma^2 - \mu_A^2 \sigma^2}{2(\mu_B \sigma^2 - \mu_A \sigma^2)}$   (12)

$x = \frac{\mu_B^2 - \mu_A^2}{2(\mu_B - \mu_A)}$   (13)
Since $a^2 - b^2 = (a - b)(a + b)$:

$x = \frac{(\mu_B - \mu_A)(\mu_B + \mu_A)}{2(\mu_B - \mu_A)}$   (14)

$x = \frac{\mu_B + \mu_A}{2}$   (15)

Therefore, for the case of equally likely, equi-variance classes, the MAP rule reduces to a threshold midway between the means.
For the case where $P(A) \neq P(B)$ and $\sigma_A \neq \sigma_B$, the threshold shifts and a second threshold appears as the second solution to the quadratic expression.
Example of a 1-D case:

Suppose that, given a pattern x, we wish to classify it as one of two classes: class A or class B. Suppose the two classes have patterns x which are normally distributed as follows:

$p(x|A) = N(\mu_A, \sigma_A^2) = \frac{1}{\sqrt{2\pi}\,\sigma_A} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_A}{\sigma_A}\right)^2\right]$   (16)

$p(x|B) = N(\mu_B, \sigma_B^2) = \frac{1}{\sqrt{2\pi}\,\sigma_B} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_B}{\sigma_B}\right)^2\right]$   (17)

$\mu_A = 130$, $\mu_B = 150$.
Question: If we know from previous cases that 4 patterns belong to class A and 6 patterns belong to class B, and both classes have the same standard deviation of 20, what is the MAP classifier?

For the two-class case where both distributions are Gaussian, the following MAP classifier can be defined:

$\frac{N(\mu_A, \sigma_A^2)}{N(\mu_B, \sigma_B^2)} \;\gtrless^{A}_{B}\; \frac{P(B)}{P(A)}$   (18)

$\frac{\exp\left[-\frac{1}{2}\left(\frac{x-\mu_A}{\sigma_A}\right)^2\right]}{\exp\left[-\frac{1}{2}\left(\frac{x-\mu_B}{\sigma_B}\right)^2\right]} \;\gtrless^{A}_{B}\; \frac{\sigma_A P(B)}{\sigma_B P(A)}$   (19)
Plugging in $\mu_A$, $\mu_B$, and $\sigma_A = \sigma_B = \sigma = 20$ (the equal $\sigma$'s cancel on the right):

$\frac{\exp\left[-\frac{1}{2}\left(\frac{x-130}{20}\right)^2\right]}{\exp\left[-\frac{1}{2}\left(\frac{x-150}{20}\right)^2\right]} \;\gtrless^{A}_{B}\; \frac{P(B)}{P(A)}$   (20)

Taking the log:

$-\frac{1}{2}\left(\frac{x-130}{20}\right)^2 + \frac{1}{2}\left(\frac{x-150}{20}\right)^2 \;\gtrless^{A}_{B}\; \ln[P(B)] - \ln[P(A)]$   (21)

Multiplying both sides by $2(20^2) = 800$:

$(x-150)^2 - (x-130)^2 \;\gtrless^{A}_{B}\; 800\left[\ln[P(B)] - \ln[P(A)]\right]$   (22)
The prior probabilities P(A) and P(B) can be determined as:

$P(A) = \frac{4}{6+4} = 0.4, \quad P(B) = \frac{6}{6+4} = 0.6$   (23)

Plugging in P(A) and P(B):

$(x-150)^2 - (x-130)^2 \;\gtrless^{A}_{B}\; 800\ln[0.6/0.4]$   (24)

$(x-150)^2 - (x-130)^2 \;\gtrless^{A}_{B}\; 800\ln[1.5]$   (25)
Expanding and simplifying:

$(x-150)^2 - (x-130)^2 \;\gtrless^{A}_{B}\; 800\ln[1.5]$   (26)

$(x^2 - 300x + 150^2) - (x^2 - 260x + 130^2) \;\gtrless^{A}_{B}\; 800\ln[1.5]$   (27)

$-40x \;\gtrless^{A}_{B}\; 800\ln[1.5] - 150^2 + 130^2$   (28)
Simplifying further:

$-40x \;\gtrless^{A}_{B}\; 800\ln[1.5] - 150^2 + 130^2$   (29)

Dividing both sides by $-40$ (which flips the direction of the inequality):

$x \;\gtrless^{B}_{A}\; \frac{5600 - 800\ln[1.5]}{40}$   (30)

$x \;\gtrless^{B}_{A}\; 131.9$   (31)
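To sanity-check this result, here is a minimal Python sketch (NumPy only) that evaluates the closed-form threshold for the equal-variance case; the parameter values are the ones from this example:

```python
import numpy as np

# Parameters from the example above
mu_A, mu_B, sigma = 130.0, 150.0, 20.0
P_A, P_B = 0.4, 0.6

# Equal-variance threshold from solving
# (x - mu_B)^2 - (x - mu_A)^2 = 2 sigma^2 ln(P(B)/P(A)) for x
x_star = (mu_B**2 - mu_A**2 - 2.0 * sigma**2 * np.log(P_B / P_A)) / (2.0 * (mu_B - mu_A))
print(f"threshold = {x_star:.1f}")  # -> 131.9; x above this is classified as B
```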
For the n-dimensional case, where $p(x|A) = N(\mu_A, \Sigma_A)$ and $p(x|B) = N(\mu_B, \Sigma_B)$:

$\frac{P(A)\exp\left[-\frac{1}{2}(x-\mu_A)^T \Sigma_A^{-1} (x-\mu_A)\right]}{(2\pi)^{n/2}\,|\Sigma_A|^{1/2}} \;\gtrless^{A}_{B}\; \frac{P(B)\exp\left[-\frac{1}{2}(x-\mu_B)^T \Sigma_B^{-1} (x-\mu_B)\right]}{(2\pi)^{n/2}\,|\Sigma_B|^{1/2}}$   (32)

$\frac{\exp\left[-\frac{1}{2}(x-\mu_A)^T \Sigma_A^{-1} (x-\mu_A)\right]}{\exp\left[-\frac{1}{2}(x-\mu_B)^T \Sigma_B^{-1} (x-\mu_B)\right]} \;\gtrless^{A}_{B}\; \frac{|\Sigma_A|^{1/2}\,P(B)}{|\Sigma_B|^{1/2}\,P(A)}$   (33)
Taking the log and simplifying:

$(x-\mu_B)^T \Sigma_B^{-1} (x-\mu_B) - (x-\mu_A)^T \Sigma_A^{-1} (x-\mu_A) \;\gtrless^{A}_{B}\; 2\ln\left[\frac{|\Sigma_A|^{1/2}\,P(B)}{|\Sigma_B|^{1/2}\,P(A)}\right]$   (34)

$(x-\mu_B)^T \Sigma_B^{-1} (x-\mu_B) - (x-\mu_A)^T \Sigma_A^{-1} (x-\mu_A) \;\gtrless^{A}_{B}\; 2\ln\left[\frac{P(B)}{P(A)}\right] + \ln\left[\frac{|\Sigma_A|}{|\Sigma_B|}\right]$   (35)

Looks familiar?
MAP Decision Boundaries for Normal Distributions

What are the MAP decision boundaries if our classes can be characterized by normal distributions?

$x^T Q_0 x + Q_1 x + Q_2 + 2Q_3 + Q_4 = 0$   (36)

where

$Q_0 = S_A^{-1} - S_B^{-1}$   (37)

$Q_1 = 2\left[m_B^T S_B^{-1} - m_A^T S_A^{-1}\right]$   (38)

$Q_2 = m_A^T S_A^{-1} m_A - m_B^T S_B^{-1} m_B$   (39)

$Q_3 = \ln\left[\frac{P(B)}{P(A)}\right]$   (40)

$Q_4 = \ln\left[\frac{|S_A|}{|S_B|}\right]$   (41)
MAP Classifier: Example

Suppose we are given the following statistical information about the classes:

Class A: $m_A = [0\;\; 0]^T$, $S_A = \begin{bmatrix} 4 & 0 \\ 0 & 4 \end{bmatrix}$, $P(A) = 0.6$.

Class B: $m_B = [0\;\; 0]^T$, $S_B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $P(B) = 0.4$.

Suppose we wish to build a MAP classifier. Compute the decision boundary.
Step 1: Compute $S_A^{-1}$ and $S_B^{-1}$:

$S_A^{-1} = \begin{bmatrix} 1/4 & 0 \\ 0 & 1/4 \end{bmatrix}, \quad S_B^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$   (42)

Step 2: Compute $Q_0$, $Q_1$, $Q_2$, $Q_3$, $Q_4$:

$Q_0 = S_A^{-1} - S_B^{-1} = \begin{bmatrix} 1/4 & 0 \\ 0 & 1/4 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} -3/4 & 0 \\ 0 & -3/4 \end{bmatrix}$   (43)

$Q_1 = 2\left[m_B^T S_B^{-1} - m_A^T S_A^{-1}\right] = 0$   (44)

$Q_2 = m_A^T S_A^{-1} m_A - m_B^T S_B^{-1} m_B = 0$   (45)
$Q_3 = \ln\left[\frac{P(B)}{P(A)}\right] = \ln\left[\frac{0.4}{0.6}\right] = \ln(2/3)$   (46)

$Q_4 = \ln\left[\frac{|S_A|}{|S_B|}\right] = \ln\left[\frac{(4)(4) - (0)(0)}{(1)(1) - (0)(0)}\right] = \ln(16)$   (47)

Step 3: Plugging in $Q_0$, $Q_1$, $Q_2$, $Q_3$, $Q_4$ gives us:

$x^T Q_0 x + Q_1 x + Q_2 + 2Q_3 + Q_4 = 0$   (48)

$x^T \begin{bmatrix} -3/4 & 0 \\ 0 & -3/4 \end{bmatrix} x + 2\ln(2/3) + \ln(16) = 0$   (49)
Simplifying gives us:

$\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} -3/4 & 0 \\ 0 & -3/4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + 2\ln(2/3) + \ln(16) = 0$   (50)

$-\frac{3}{4}x_1^2 - \frac{3}{4}x_2^2 - 0.811 + 2.773 = 0$   (51)

$-\frac{3}{4}\left(x_1^2 + x_2^2\right) + 1.962 = 0$   (52)

The final MAP decision boundary is:

$x_1^2 + x_2^2 = 2.62$   (53)

This is just a circle centered at $(x_1, x_2) = (0, 0)$ with a radius of $\sqrt{2.62} \approx 1.62$.
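A quick numerical check of this example (a minimal Python/NumPy sketch that just re-evaluates the Q terms defined above):

```python
import numpy as np

# Class statistics from the example (both class means are zero)
S_A, S_B = 4.0 * np.eye(2), 1.0 * np.eye(2)
P_A, P_B = 0.6, 0.4

Q0 = np.linalg.inv(S_A) - np.linalg.inv(S_B)            # diag(-3/4)
Q3 = np.log(P_B / P_A)                                  # ln(2/3)
Q4 = np.log(np.linalg.det(S_A) / np.linalg.det(S_B))    # ln(16)

# With Q0 = -(3/4) I and Q1 = Q2 = 0, the boundary x^T Q0 x + 2 Q3 + Q4 = 0
# reduces to (3/4) r^2 = 2 Q3 + Q4, a circle of radius r about the origin.
r = np.sqrt((2.0 * Q3 + Q4) / 0.75)
print(f"radius = {r:.2f}")  # -> 1.62
```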
Relationship between MICD and MAP Classifiers for Normal Distributions

You will notice that the terms on the left-hand side have the same form as the MICD distance metric!

$(x-\mu_B)^T \Sigma_B^{-1} (x-\mu_B) - (x-\mu_A)^T \Sigma_A^{-1} (x-\mu_A) \;\gtrless^{A}_{B}\; 2\ln\left[\frac{P(B)}{P(A)}\right] + \ln\left[\frac{|\Sigma_A|}{|\Sigma_B|}\right]$   (54)

$d^2_{MICD}(x, \mu_B, \Sigma_B) - d^2_{MICD}(x, \mu_A, \Sigma_A) \;\gtrless^{A}_{B}\; 2\ln\left[\frac{P(B)}{P(A)}\right] + \ln\left[\frac{|\Sigma_A|}{|\Sigma_B|}\right]$   (55)

If $2\ln\left[\frac{P(B)}{P(A)}\right] + \ln\left[\frac{|\Sigma_A|}{|\Sigma_B|}\right] = 0$, then the MAP classifier becomes just the MICD classifier!
Therefore, the MICD classifier is optimal in terms of probability of error only if we have multivariate Normal distributions $N(\mu, \Sigma)$ that have:

Equal a priori probabilities ($P(A) = P(B)$)
Equal volumes ($|\Sigma_A| = |\Sigma_B|$)

If that is not the case, what's so special about $2\ln\left[\frac{P(B)}{P(A)}\right] + \ln\left[\frac{|\Sigma_A|}{|\Sigma_B|}\right]$?

The first term, $2\ln\left[\frac{P(B)}{P(A)}\right]$, biases the decision in favor of the more likely class according to the a priori probabilities.
The second term, $\ln\left[\frac{|\Sigma_A|}{|\Sigma_B|}\right]$, biases the decision in favor of the class with smaller volume ($|\Sigma|$).
So under what circumstances does the MAP classifier perform better than MICD?

Recall the case where we have only one feature (n = 1), equal means (m = 0), and $s_A \neq s_B$.

The MICD classification rule for this case is:

$\left(1/s_B^2 - 1/s_A^2\right)x^2 \;\gtrless^{A}_{B}\; 0$   (56)

$\left(1/s_A^2\right)x^2 \;\gtrless^{B}_{A}\; \left(1/s_B^2\right)x^2$   (57)

$s_A^2 \;\gtrless^{A}_{B}\; s_B^2$   (58)

The MICD classification rule decides in favor of the class with the largest variance, regardless of x.
The MAP classification rule for this case is:

$\left(1/s_B^2 - 1/s_A^2\right)x^2 \;\gtrless^{A}_{B}\; 2\ln\left[\frac{P(B)}{P(A)}\right] + \ln\left[\frac{s_A^2}{s_B^2}\right]$   (59)

If P(A) = P(B):

$\left(1/s_B^2 - 1/s_A^2\right)x^2 \;\gtrless^{A}_{B}\; \ln\left[\frac{s_A^2}{s_B^2}\right]$   (60)
Looking at the MAP classification rule:

$\left(1/s_B^2 - 1/s_A^2\right)x^2 \;\gtrless^{A}_{B}\; \ln\left[\frac{s_A^2}{s_B^2}\right]$   (61)

At the mean m = 0:

$0 \;\gtrless^{A}_{B}\; \ln\left[\frac{s_A^2}{s_B^2}\right]$   (62)

If $s_A^2 < s_B^2$, the log term is negative and the rule favors class A.
If $s_B^2 < s_A^2$, the log term is positive and the rule favors class B.

Therefore, the MAP classification rule decides in favor of the class with the lowest variance close to the mean, and favors the class with the highest variance beyond a certain point in both directions.
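This behavior is easy to see numerically. A minimal sketch of rule (60) (the variances below are arbitrary illustrative values with $s_A < s_B$):

```python
import numpy as np

# Equal means (m = 0), unequal variances, equal priors: only the
# ln(s_A^2 / s_B^2) volume term separates MAP from MICD (eq. (60)).
s_A, s_B = 1.0, 2.0

def map_choice(x):
    # Decide A when (1/s_B^2 - 1/s_A^2) x^2 > ln(s_A^2 / s_B^2)
    lhs = (1.0 / s_B**2 - 1.0 / s_A**2) * x**2
    return "A" if lhs > np.log(s_A**2 / s_B**2) else "B"

print(map_choice(0.0))  # 'A': near the mean, the lower-variance class wins
print(map_choice(5.0))  # 'B': far from the mean, the higher-variance class wins
```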
Performance of the Bayes Classifier

How do we quantify how well the Bayes classifier works?

Since the Bayes classifier minimizes the probability of error, one way to analyze how well it does is to compute the probability of error $P(\varepsilon)$ itself. This allows us to see the theoretical limit on the expected performance, under the assumption of known probability density functions.
Probability of error given pattern

For any pattern x such that $P(A|x) > P(B|x)$:

x is classified as part of class A.
The probability of error in classifying x as A is $P(B|x)$.

Therefore, naturally, for any given x the probability of error $P(\varepsilon|x)$ is:

$P(\varepsilon|x) = \min\left[P(A|x),\; P(B|x)\right]$   (63)

Rationale: Since we always choose the class with the maximum posterior probability, the minimum posterior probability is the probability of choosing incorrectly.
Recall our previous example of a 1-D case:

$p(x|A) = N(\mu_A, \sigma_A^2) = \frac{1}{\sqrt{2\pi}\,\sigma_A} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_A}{\sigma_A}\right)^2\right]$   (64)

$p(x|B) = N(\mu_B, \sigma_B^2) = \frac{1}{\sqrt{2\pi}\,\sigma_B} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_B}{\sigma_B}\right)^2\right]$   (65)

$\mu_A = 130$, $\mu_B = 150$, $P(A) = 0.4$, $P(B) = 0.6$, $\sigma_A = \sigma_B = 20$.

For x = 140, what is the probability of error $P(\varepsilon|x)$?
Recall the MAP classifier for this scenario:

$x \;\gtrless^{B}_{A}\; 131.9$   (66)

Based on this MAP classifier, the pattern x = 140 belongs to class B.

The probability of error $P(\varepsilon|x)$ is:

$P(\varepsilon|x) = \min\left[P(A|x),\; P(B|x)\right]$   (67)

Since B gives the maximum posterior probability, the minimum is $P(A|x)$.
Therefore, $P(\varepsilon|x)$ for x = 140 is:

$P(\varepsilon|x)|_{x=140} = P(A|x)|_{x=140} = \left.\frac{p(x|A)P(A)}{p(x|A)P(A) + p(x|B)P(B)}\right|_{x=140}$   (68)

Since x = 140 is equidistant from the two means and the variances are equal, $p(140|A) = p(140|B) \approx 0.0176$ and the likelihoods cancel:

$P(\varepsilon|x)|_{x=140} = \frac{(0.0176)(0.4)}{(0.0176)(0.4) + (0.0176)(0.6)}$   (69)

$P(\varepsilon|x)|_{x=140} = 0.4$   (70)
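The same computation in a short Python/SciPy sketch (parameters taken from the example):

```python
import numpy as np
from scipy.stats import norm

mu_A, mu_B, sigma = 130.0, 150.0, 20.0
P_A, P_B = 0.4, 0.6
x = 140.0

# Posteriors via Bayes' rule; the probability of error is the smaller posterior.
like_A, like_B = norm.pdf(x, mu_A, sigma), norm.pdf(x, mu_B, sigma)
post_A = like_A * P_A / (like_A * P_A + like_B * P_B)
print(min(post_A, 1.0 - post_A))  # -> 0.4
```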
Expected probability of error

Now that we know the probability of error for a given x, denoted $P(\varepsilon|x)$, the expected probability of error $P(\varepsilon)$ can be found as:

$P(\varepsilon) = \int P(\varepsilon|x)\,p(x)\,dx$   (71)

$P(\varepsilon) = \int \min\left[P(A|x),\; P(B|x)\right] p(x)\,dx$   (72)

In terms of class PDFs:

$P(\varepsilon) = \int \min\left[p(x|A)P(A),\; p(x|B)P(B)\right] dx$   (73)
Now if we were to define decision regions $R_A$ and $R_B$:

$R_A = \{x : P(A|x) > P(B|x)\}$
$R_B = \{x : P(B|x) > P(A|x)\}$

the expected probability of error can be written as:

$P(\varepsilon) = \int_{R_A} p(x|B)P(B)\,dx + \int_{R_B} p(x|A)P(A)\,dx$   (74)

Rationale: For all patterns in $R_A$, the posterior of A is the maximum of the two, so the probability of error for patterns in $R_A$ is just the minimum term (in this case, the one involving B), and vice versa.
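Equations (73)-(74) are straightforward to evaluate numerically when the densities are known. A sketch using the running 1-D example's parameters:

```python
import numpy as np
from scipy.stats import norm

mu_A, mu_B, sigma = 130.0, 150.0, 20.0
P_A, P_B = 0.4, 0.6

# Integrate min[p(x|A)P(A), p(x|B)P(B)] on a fine grid (eq. (73))
x = np.linspace(0.0, 300.0, 100001)
integrand = np.minimum(norm.pdf(x, mu_A, sigma) * P_A,
                       norm.pdf(x, mu_B, sigma) * P_B)
print(np.trapz(integrand, x))  # expected probability of error, ~0.29
```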
Example 1: univariate Normal, equal-variance, equally likely two-class problem:

$n = 1$, $P(A) = P(B) = 0.5$, $\sigma_A = \sigma_B = \sigma$, $\mu_A < \mu_B$

Likelihoods:

$p(x|A) = N(\mu_A, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_A}{\sigma}\right)^2\right]$   (75)

$p(x|B) = N(\mu_B, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_B}{\sigma}\right)^2\right]$   (76)

Find $P(\varepsilon)$.
Recall that for the case of equally likely, equi-variance classes, the MAP decision boundary reduces to a threshold midway between the means:

$x = \frac{\mu_B + \mu_A}{2}$   (77)

Since $\mu_A < \mu_B$, this gives us the following decision regions $R_A$ and $R_B$:

$R_A = \left\{x : x < \frac{\mu_A + \mu_B}{2}\right\}$
$R_B = \left\{x : x > \frac{\mu_A + \mu_B}{2}\right\}$
Based on the decision regions $R_A$ and $R_B$, the priors, and the class likelihoods, the expected probability of error $P(\varepsilon)$ becomes:

$P(\varepsilon) = \int_{R_A} P(B)\,p(x|B)\,dx + \int_{R_B} P(A)\,p(x|A)\,dx$   (78)

$P(\varepsilon) = \frac{1}{2}\int_{-\infty}^{\frac{\mu_A+\mu_B}{2}} p(x|B)\,dx + \frac{1}{2}\int_{\frac{\mu_A+\mu_B}{2}}^{\infty} p(x|A)\,dx$   (79)

$P(\varepsilon) = \frac{1}{2}\int_{-\infty}^{\frac{\mu_A+\mu_B}{2}} N(\mu_B, \sigma^2)\,dx + \frac{1}{2}\int_{\frac{\mu_A+\mu_B}{2}}^{\infty} N(\mu_A, \sigma^2)\,dx$   (80)
Since the two classes are symmetric ($P(\varepsilon|A) = P(\varepsilon|B)$):

$P(\varepsilon) = \frac{1}{2}\int_{-\infty}^{\frac{\mu_A+\mu_B}{2}} N(\mu_B, \sigma^2)\,dx + \frac{1}{2}\int_{\frac{\mu_A+\mu_B}{2}}^{\infty} N(\mu_A, \sigma^2)\,dx$   (81)

$P(\varepsilon) = \int_{\frac{\mu_A+\mu_B}{2}}^{\infty} N(\mu_A, \sigma^2)\,dx$   (82)

$P(\varepsilon) = \int_{\frac{\mu_A+\mu_B}{2}}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_A}{\sigma}\right)^2\right] dx$   (83)
Doing a change of variables, where $y = \frac{x-\mu_A}{\sigma}$, $dx = \sigma\,dy$:

$P(\varepsilon) = \int_{\frac{\mu_B-\mu_A}{2\sigma}}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}y^2\right] dy$   (84)

This corresponds to a tail integral over a normalized ($N(0, 1)$) Normal random variable:

$Q(\theta) = \int_{\theta}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}y^2\right] dy$   (85)

Plugging Q in gives us the final expected probability of error $P(\varepsilon)$:

$P(\varepsilon) = Q\left(\frac{\mu_B - \mu_A}{2\sigma}\right)$   (86)
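In SciPy, $Q(\theta)$ is the standard normal survival function, so (86) is a one-liner (the means and variance below are illustrative values):

```python
from scipy.stats import norm

mu_A, mu_B, sigma = 0.0, 2.0, 1.0
P_err = norm.sf((mu_B - mu_A) / (2.0 * sigma))  # Q((mu_B - mu_A) / (2 sigma))
print(P_err)  # ~0.159: the means sit one sigma either side of the threshold
```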
Visualization of $P(\varepsilon)$: $P(\varepsilon)$ is essentially the shaded area.
Observations:

As the distance between the means increases, the shaded area becomes monotonically smaller and the expected probability of error $P(\varepsilon)$ monotonically decreases.
At $\mu_B - \mu_A = 0$, the distributions completely overlap and $P(\varepsilon) = Q(0) = 1/2$ (makes sense, since you have a 50/50 chance of either class).
$\lim_{(\mu_B - \mu_A) \to \infty} P(\varepsilon) = 0$.
For cases where $P(A) \neq P(B)$ or $\sigma_A \neq \sigma_B$, the decision boundary changes AND an additional boundary is introduced!

Luckily, $P(\varepsilon)$ can still be expressed using the $Q(\theta)$ function with an appropriate change of variables.
Example:

$P(\varepsilon)$ is essentially the shaded area.

$P(\varepsilon) = P(A)Q(\theta_1) + P(B)\left[Q(\theta_3) - Q(\theta_4)\right] + P(A)Q(\theta_2)$
Let's take a look at the multivariate case (n > 1).

For $p(x|A) = N(\mu_A, \Sigma)$, $p(x|B) = N(\mu_B, \Sigma)$, $P(A) = P(B)$, it can be shown that:

$P(\varepsilon) = Q\left(d_M(\mu_A, \mu_B)/2\right)$   (87)

where $d_M(\mu_A, \mu_B)$ is the Mahalanobis distance between the classes:

$d_M(\mu_A, \mu_B) = \left[(\mu_A - \mu_B)^T \Sigma^{-1} (\mu_A - \mu_B)\right]^{1/2}$   (88)
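A minimal sketch of (87)-(88) (the means and the shared covariance below are made-up illustrative values):

```python
import numpy as np
from scipy.stats import norm

mu_A = np.array([0.0, 0.0])
mu_B = np.array([3.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])  # shared covariance

diff = mu_A - mu_B
d_M = np.sqrt(diff @ np.linalg.inv(Sigma) @ diff)  # Mahalanobis distance (88)
print(norm.sf(d_M / 2.0))                          # P(error) = Q(d_M / 2)
```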
Why is $P(\varepsilon)$ like that for this case?

Remember that for all cases where the covariance matrices AND the prior probabilities are the same, the decision boundary between the classes is always a straight line in hyperspace that is:

sloped based on $\Sigma$ (since our orthonormal whitening transform is identical for both classes), and
intersects the midpoint of the line segment between $\mu_A$ and $\mu_B$.

The probability of error is just the area under $p(x|A)P(A)$ on the class-B side of this decision boundary PLUS the area under $p(x|B)P(B)$ on the class-A side of this decision boundary.
Example of non-Gaussian density functions:

Suppose two classes have density functions and a priori probabilities:

$p(x|C_1) = \begin{cases} c\,e^{-\theta x} & 0 \leq x \leq 1 \\ 0 & \text{else} \end{cases}$   (89)

$p(x|C_2) = \begin{cases} c\,e^{-\theta(1-x)} & 0 \leq x \leq 1 \\ 0 & \text{else} \end{cases}$   (90)

$P(C_1) = P(C_2) = \frac{1}{2}$   (91)

where $c = \frac{\theta}{1 - e^{-\theta}}$ is just the appropriate constant to normalize the PDF.
Therefore, the expected probability of error is:

$P(\varepsilon) = \int \min\left[p(x|C_1)P(C_1),\; p(x|C_2)P(C_2)\right] dx$   (92)

$P(\varepsilon) = \int_{R_{C_1}} p(x|C_2)P(C_2)\,dx + \int_{R_{C_2}} p(x|C_1)P(C_1)\,dx$   (93)

$P(\varepsilon) = \int_{0}^{0.5} 0.5\,p(x|C_2)\,dx + \int_{0.5}^{1.0} 0.5\,p(x|C_1)\,dx$   (94)

Because of the symmetry between the two classes ($P(\varepsilon|C_1) = P(\varepsilon|C_2)$):

$P(\varepsilon) = \int_{0.5}^{1.0} c\,e^{-\theta x}\,dx$   (95)

$P(\varepsilon) = \frac{c}{\theta}\left[e^{-\theta/2} - e^{-\theta}\right]$   (96)
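The closed form in (96) can be checked against direct numerical integration of (92); a Python/SciPy sketch, with θ = 2 as an arbitrary illustrative value:

```python
import numpy as np
from scipy.integrate import quad

theta = 2.0
c = theta / (1.0 - np.exp(-theta))  # normalizing constant

# Closed form, eq. (96)
closed = (c / theta) * (np.exp(-theta / 2.0) - np.exp(-theta))

# Direct numerical integration of min[0.5 p(x|C1), 0.5 p(x|C2)] over [0, 1];
# points=[0.5] warns quad about the kink at the decision boundary.
integrand = lambda x: 0.5 * min(c * np.exp(-theta * x),
                                c * np.exp(-theta * (1.0 - x)))
numeric, _ = quad(integrand, 0.0, 1.0, points=[0.5])
print(closed, numeric)  # both ~0.269
```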
(b) Find $P(\varepsilon|x)$:

From the decision boundary and decision regions we determined in (a):

$P(\varepsilon|x) = \begin{cases} P(C_2|x) & 0 \leq x \leq 1/2 \\ P(C_1|x) & 1/2 \leq x \leq 1 \end{cases}$   (97)

$P(\varepsilon|x) = \begin{cases} \frac{p(x|C_2)P(C_2)}{p(x)} & 0 \leq x \leq 1/2 \\ \frac{p(x|C_1)P(C_1)}{p(x)} & 1/2 \leq x \leq 1 \end{cases}$   (98)

$P(\varepsilon|x) = \begin{cases} \frac{e^{-\theta(1-x)}}{e^{-\theta x} + e^{-\theta(1-x)}} & 0 \leq x \leq 1/2 \\ \frac{e^{-\theta x}}{e^{-\theta x} + e^{-\theta(1-x)}} & 1/2 \leq x \leq 1 \end{cases}$   (99)
Error bounds

In practice, the exact $P(\varepsilon)$ is only easy to compute for simple cases like those shown before. So how can we quantify the probability of error in the remaining cases?

Instead of finding the exact $P(\varepsilon)$, we determine bounds on $P(\varepsilon)$, which are:

Easier to compute
Lead to estimates of classifier performance
Bhattacharyya bound

Using the following inequality:

$\min[a, b] \leq \sqrt{ab}$   (100)

the following holds true:

$P(\varepsilon) = \int \min\left[p(x|A)P(A),\; p(x|B)P(B)\right] dx$   (101)

$P(\varepsilon) \leq \sqrt{P(A)P(B)} \int \sqrt{p(x|A)\,p(x|B)}\,dx$   (102)

What's so special about this? Answer: You don't need the actual decision regions to compute it!
Since $P(A) + P(B) = 1$ implies $\sqrt{P(A)P(B)} \leq \frac{1}{2}$, and the Bhattacharyya coefficient is defined as:

$\rho = \int \sqrt{p(x|A)\,p(x|B)}\,dx$   (103)

the upper bound (Bhattacharyya bound) on $P(\varepsilon)$ can be written as:

$P(\varepsilon) \leq \frac{1}{2}\rho$   (104)
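The coefficient ρ is a single integral of known densities, so it is easy to evaluate numerically even though the decision regions never appear. A sketch for the earlier 1-D example's Gaussians:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu_A, mu_B, sigma = 130.0, 150.0, 20.0
P_A, P_B = 0.4, 0.6

# rho = integral of sqrt(p(x|A) p(x|B)), eq. (103)
rho, _ = quad(lambda x: np.sqrt(norm.pdf(x, mu_A, sigma) * norm.pdf(x, mu_B, sigma)),
              -np.inf, np.inf)
print(np.sqrt(P_A * P_B) * rho)  # bound ~0.43 (the true P(error) is ~0.29)
```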
Bhattacharyya bound: Example

Consider a classifier for a two-class problem. Both classes are multivariate normal. When both classes are a priori equally likely, the Bhattacharyya bound is $P(\varepsilon) \leq 0.3$.

New information is specified: we are told that the a priori probabilities of the two classes are 0.2 and 0.8, for A and B respectively.

What is the new upper bound on the probability of error?
Step 1: Based on the old bound, compute the Bhattacharyya coefficient:

$0.3 = \sqrt{P(A)P(B)} \int \sqrt{p(x|A)\,p(x|B)}\,dx$   (105)

$0.3 = \sqrt{P(A)P(B)}\;\rho$   (106)

$\rho = \int \sqrt{p(x|A)\,p(x|B)}\,dx = \frac{0.3}{\sqrt{0.5 \times 0.5}} = 0.6$   (107)
Step 2: Based on the Bhattacharyya coefficient and the new priors $P(A) = 0.2$ and $P(B) = 0.8$, the new upper bound can be computed as:

$P(\varepsilon) \leq \sqrt{P(A)P(B)} \int \sqrt{p(x|A)\,p(x|B)}\,dx$   (108)

$P(\varepsilon) \leq \sqrt{0.8 \times 0.2}\;\rho$   (109)

$P(\varepsilon) \leq \sqrt{0.8 \times 0.2} \times 0.6 = 0.24$   (110)
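The same two steps in Python, as a final arithmetic check:

```python
import numpy as np

rho = 0.3 / np.sqrt(0.5 * 0.5)        # Step 1: recover rho = 0.6
new_bound = np.sqrt(0.2 * 0.8) * rho  # Step 2: re-apply with the new priors
print(new_bound)                      # -> 0.24
```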