Topic 3: Bayesian Decision Theory

Bayesian Decision Theory
Team teaching

Lecture 2: Bayesian Decision Theory
1. Diagram and formulation
2. Bayes rule for inference
3. Bayesian decision
4. Discriminant functions and space partition
5. Advanced issues

Lecture note for Stat 231: Pattern Recognition and Machine Learning

Diagram of pattern classification

Procedure of pattern recognition and decision making

subjects → Observables X → Features x → Inner belief w → Action α

X: all the observables, obtained using existing sensors and instruments.
x: a set of features selected from the components of X, or linear/non-linear functions of X.
w: our inner belief/perception about the subject's class.
α: the action that we take for x.

We denote the three spaces by
$$x \in \Omega^d, \qquad w \in \Omega^C, \qquad \alpha \in \Omega^\alpha$$
where $x = (x_1, x_2, \ldots, x_d)$ is a vector and w is the index of the class, $\Omega^C = \{w_1, w_2, \ldots, w_k\}$.

Examples

Ex 1: Fish classification
X = I is the image of the fish,
x = (brightness, length, number of fins, ...)
w is our belief about what the fish type is, $\Omega^C$ = {sea bass, salmon, trout, ...}
α is a decision for the fish type; in this case $\Omega^\alpha = \Omega^C$ = {sea bass, salmon, trout, ...}

Ex 2: Medical diagnosis
X = all the available medical tests and imaging scans that a doctor can order for a patient
x = (blood pressure, glucose level, cough, x-ray, ...)
w is an illness type, $\Omega^C$ = {flu, cold, TB, pneumonia, lung cancer}
α is a decision for treatment, $\Omega^\alpha$ = {Tylenol, hospitalize, ...}

Tasks

subjects → Observables X → Features x → Inner belief w → Decision α

The arrows in the diagram are labeled, in order: control sensors; selecting informative features; statistical inference; risk/cost minimization.

In Bayesian decision theory, we are concerned with the last three steps in the big ellipse, assuming that the observables are given and the features are selected.
Bayes Decision

It is decision making when all the underlying probability distributions are known. It is optimal given that the distributions are known.

For two classes $w_1$ and $w_2$, the prior probabilities for an unknown new observation are:
$P(w_1)$: the new observation belongs to class 1
$P(w_2)$: the new observation belongs to class 2
$P(w_1) + P(w_2) = 1$

The prior reflects our prior knowledge. It is also our decision rule when no feature of the new object is available: classify as class 1 if $P(w_1) > P(w_2)$.
Bayesian Decision Theory

Features x → (statistical inference) → Inner belief p(w|x) → (risk/cost minimization) → Decision α(x)

Two probability tables: a) the prior p(w); b) the likelihood p(x|w).
A risk/cost function (a two-way table): λ(α | w).

The belief on the class w is computed by the Bayes rule
$$p(w \mid x) = \frac{p(x \mid w)\, p(w)}{p(x)}$$
The risk is computed by
$$R(\alpha_i \mid x) = \sum_{j=1}^{k} \lambda(\alpha_i \mid w_j)\, p(w_j \mid x)$$
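The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names `posterior` and `conditional_risk` and all the numbers are hypothetical and do not come from the lecture.

```python
def posterior(priors, likelihoods):
    """Bayes rule: p(w_j | x) = p(x | w_j) p(w_j) / p(x)."""
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x) = sum_j p(x | w_j) p(w_j)
    return [j / evidence for j in joint]


def conditional_risk(loss, post):
    """R(alpha_i | x) = sum_j loss[i][j] * p(w_j | x); one value per action."""
    return [sum(l_ij * p_j for l_ij, p_j in zip(row, post)) for row in loss]


# Hypothetical two-class numbers, chosen only for illustration.
priors = [0.5, 0.5]        # p(w_1), p(w_2)
likelihoods = [0.6, 0.2]   # p(x | w_1), p(x | w_2) at one observed x
loss = [[0.0, 1.0],        # loss[i][j] = lambda(alpha_i | w_j)
        [1.0, 0.0]]

post = posterior(priors, likelihoods)
risk = conditional_risk(loss, post)
print(post)  # posterior probability of each class
print(risk)  # conditional risk of each action
```

The loss table is indexed as `loss[action][true class]`, matching the two-way table λ(α | w).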
Bayes Decision

We observe features on each object. $P(x \mid w_1)$ and $P(x \mid w_2)$ are the class-specific densities. The Bayes rule:
$$P(w_j \mid x) = \frac{P(x \mid w_j)\, P(w_j)}{P(x)}$$
Decision Rule

A decision rule is a mapping function from the feature space to the set of actions,
$$\alpha(x): \Omega^d \to \Omega^\alpha$$
We will show that randomized decisions won't be optimal.

A decision is made to minimize the average cost / risk,
$$R = \int R(\alpha(x) \mid x)\, p(x)\, dx$$
It is minimized when our decision is made to minimize the cost / risk for each instance x:
$$\alpha(x) = \arg\min_{\alpha} R(\alpha \mid x) = \arg\min_{\alpha} \sum_{j=1}^{k} \lambda(\alpha \mid w_j)\, p(w_j \mid x)$$
Bayesian error

In a special case, like fish classification, the action is classification, and we assume a 0/1 error:
$$\lambda(\alpha_i \mid w_j) = \begin{cases} 0 & \text{if } \alpha_i = w_j \\ 1 & \text{if } \alpha_i \neq w_j \end{cases}$$
The risk for classifying x to class $\alpha_i$ is
$$R(\alpha_i \mid x) = \sum_{w_j \neq \alpha_i} p(w_j \mid x) = 1 - p(\alpha_i \mid x)$$
The optimal decision is to choose the class that has the maximum posterior probability:
$$\alpha(x) = \arg\min_{\alpha} \bigl(1 - p(\alpha \mid x)\bigr) = \arg\max_{\alpha} p(\alpha \mid x)$$
The total risk for a decision rule, in this case, is called the Bayesian error:
$$R = p(\text{error}) = \int p(\text{error} \mid x)\, p(x)\, dx = \int \bigl(1 - p(\alpha(x) \mid x)\bigr)\, p(x)\, dx$$
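As a rough illustration of the Bayesian error, the sketch below assumes a feature x that takes only three discrete values, so the integral becomes a sum. The priors and likelihood tables are hypothetical numbers chosen for the example, not values from the lecture.

```python
# Hypothetical tables for a feature x with three discrete values (0, 1, 2).
priors = {"w1": 0.6, "w2": 0.4}
likelihood = {               # likelihood[w][x] = p(x | w)
    "w1": [0.7, 0.2, 0.1],
    "w2": [0.1, 0.4, 0.5],
}

bayes_error = 0.0
decisions = []
for x in range(3):
    joint = {w: likelihood[w][x] * priors[w] for w in priors}
    p_x = sum(joint.values())                  # marginal p(x)
    post = {w: joint[w] / p_x for w in joint}  # posterior p(w | x)
    best = max(post, key=post.get)             # maximum-posterior class
    decisions.append(best)
    # Contribution of this x to the total risk: (1 - p(alpha(x) | x)) p(x)
    bayes_error += (1.0 - post[best]) * p_x

print(decisions, bayes_error)
```

Any other decision rule on the same tables would accumulate at least as much error, because each term of the sum is minimized separately.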

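Partition of feature space

$$\alpha(x): \Omega^d \to \Omega^\alpha$$
The decision is a partition/coloring of the feature space into k subspaces:
$$\Omega = \bigcup_{i=1}^{k} \Omega_i, \qquad \Omega_i \cap \Omega_j = \emptyset, \quad i \neq j$$
[Figure: a feature space partitioned into decision regions labeled 1 to 5]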
An example of fish classification
Decision/classification boundaries
Closed-form solutions
In two-class classification, k = 2, with p(x|w) being Normal densities, the decision boundaries can be computed in closed form.
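As a sketch of what a closed-form boundary looks like, consider the simplest case. The following assumptions are added here for illustration and are not stated in the slides: a one-dimensional feature, 0/1 loss, and two normal class densities with means $\mu_1 \neq \mu_2$ and a shared variance $\sigma^2$. Setting the two posteriors equal and taking logarithms gives the boundary point $x^{*}$:

```latex
\begin{aligned}
p(x \mid w_1)\,P(w_1) &= p(x \mid w_2)\,P(w_2) \\
-\frac{(x-\mu_1)^2}{2\sigma^2} + \ln P(w_1) &= -\frac{(x-\mu_2)^2}{2\sigma^2} + \ln P(w_2) \\
(\mu_1-\mu_2)\,(2x - \mu_1 - \mu_2) &= 2\sigma^2 \ln\frac{P(w_2)}{P(w_1)} \\
x^{*} &= \frac{\mu_1+\mu_2}{2} + \frac{\sigma^2}{\mu_1-\mu_2}\,\ln\frac{P(w_2)}{P(w_1)}
\end{aligned}
```

With equal priors the logarithm vanishes and the boundary sits midway between the two means. Unequal priors shift it toward the mean of the less probable class.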
Exercise

Consider the sea bass / salmon classifier with two possible actions:
A1: decide the input is sea bass;
A2: decide the input is salmon.
The priors for sea bass and salmon are 2/3 and 1/3, respectively. The cost of classifying a fish as salmon when it truly is sea bass is $2, and the cost of classifying a fish as sea bass when it truly is salmon is $1. Find the decision for the input X = 13, where the likelihoods are $P(X \mid w_1) = 0.28$ and $P(X \mid w_2) = 0.17$.
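One way to check an answer to this exercise is the short sketch below. It rests on two assumptions that the exercise does not state explicitly: class 1 is sea bass and class 2 is salmon, and correct decisions cost nothing.

```python
# Assumption: class 1 is sea bass and class 2 is salmon.
priors = {"sea bass": 2 / 3, "salmon": 1 / 3}
likelihood = {"sea bass": 0.28, "salmon": 0.17}  # P(X = 13 | class)

# cost[action][true class] in dollars; correct decisions are assumed free.
cost = {
    "A1: decide sea bass": {"sea bass": 0.0, "salmon": 1.0},
    "A2: decide salmon": {"sea bass": 2.0, "salmon": 0.0},
}

# Bayes rule for the posterior, then the conditional risk of each action.
evidence = sum(likelihood[c] * priors[c] for c in priors)
post = {c: likelihood[c] * priors[c] / evidence for c in priors}
risk = {a: sum(cost[a][c] * post[c] for c in priors) for a in cost}

decision = min(risk, key=risk.get)  # action with the smallest risk
print(post)
print(risk)
print(decision)
```

Under these assumptions the posterior of sea bass is 56/73, the risk of A1 is lower than the risk of A2, and the sketch selects A1.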
Nominal features

Training data: three input dimensions, rash (R), temperature (T), and dizzy (D); output class (C). All variables are binary.

Training phase: we can summarise the training data as follows, and estimate the likelihoods as relative frequencies.

Testing phase: classify the test data by computing the posterior probabilities. If P(C = 1 | X) > 0.5, classify as C = 1; otherwise classify as C = 0 (since it is a two-class problem).
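Univariate Normal Distribution (Continuous)

Training phase
Compute the mean and the standard deviation of the height for both trolls and smurfs.
Then compute the class-conditional probability of observing a height x given either a smurf or a troll, using the normal density with the mean $\mu_j$ and standard deviation $\sigma_j$ of class j:
$$p(x \mid w_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left(-\frac{(x-\mu_j)^2}{2\sigma_j^2}\right)$$
Finally, compute the prior probability of observing a troll or a smurf regardless of height, for instance from the class proportions in the training data, $P(w_j) = n_j / N$.

Testing phase
In general, we have the most important parts done. However, let's take a look at the Bayes rule in this context:
$$P(w_j \mid x) = \frac{p(x \mid w_j)\, P(w_j)}{p(x)}$$
We notice that in order to figure out the final probability, we need the marginal probability p(x), which is the base probability of the height. Basically, p(x) is there to make sure that the $P(w_j \mid x)$ sum to one. It is simply
$$p(x) = \sum_j p(x \mid w_j)\, P(w_j)$$
which for the two classes is
$$p(x) = p(x \mid \text{smurf})\,P(\text{smurf}) + p(x \mid \text{troll})\,P(\text{troll})$$
We now have everything we need to determine whether a creature is most likely a troll or a smurf based on its height. Thus we can ask: if we observe a creature of height 2, how likely is it to be a smurf?
$$P(\text{smurf} \mid x) = \frac{p(x \mid \text{smurf})\,P(\text{smurf})}{p(x)}$$
Next we ask: what is the probability that we have a troll, given the observation of 2?
$$P(\text{troll} \mid x) = \frac{p(x \mid \text{troll})\,P(\text{troll})}{p(x)}$$
The end result is that if
$$P(\text{smurf} \mid x) > P(\text{troll} \mid x)$$
then we decide that we are most likely observing a smurf.

Exercise
If we test a creature of height 2.90, which is it more likely to be: a smurf or a troll?

Training data:
Height  Creature
2.70    Smurf
2.52    Smurf
2.57    Smurf
2.22    Smurf
3.16    Troll
3.58    Troll
3.16    Troll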
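The training and testing phases can be sketched in Python on the height data from the exercise. Two choices here are assumptions rather than part of the slides: the standard deviation is the sample estimate (dividing by n - 1), and the priors are the class proportions. The helper names `normal_pdf` and `classify` are hypothetical.

```python
import math
import statistics

heights = {
    "Smurf": [2.70, 2.52, 2.57, 2.22],
    "Troll": [3.16, 3.58, 3.16],
}


def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def classify(x):
    n_total = sum(len(sample) for sample in heights.values())
    joint = {}
    for creature, sample in heights.items():
        mu = statistics.mean(sample)
        sigma = statistics.stdev(sample)  # assumption: sample std (n - 1)
        prior = len(sample) / n_total     # assumption: class proportion
        joint[creature] = normal_pdf(x, mu, sigma) * prior
    evidence = sum(joint.values())        # marginal p(x)
    posterior = {c: j / evidence for c, j in joint.items()}
    return max(posterior, key=posterior.get), posterior


label, posterior = classify(2.90)
print(label, posterior)
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population estimate instead. The posterior values change, so it is worth stating which estimate is used.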
The Multivariate Normal Distribution (Continuous)

In two dimensions, the so-called bivariate normal, we are concerned with two means and a variance-covariance matrix:
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}$$

Curvature  Diameter  Quality control result
2.95       6.63      Passed
2.53       7.79      Passed
3.57       5.65      Passed
3.16       5.47      Passed
2.58       4.46      Not passed
2.16       6.22      Not passed
3.27       3.52      Not passed

As a consultant to the factory, you are given the task of setting up the criteria for automatic quality control. The manager of the factory also wants to test your criteria on a new type of chip ring about which even the human experts disagree. The new chip rings have curvature 2.81 and diameter 5.46. Can you solve this problem by employing a Bayes classifier?

X = features (or independent variables) of all data. Each row (denoted by $x_k$) represents one object; each column stands for one feature.
Y = group of the object (or dependent variable) of all data. Each row represents one object, and it has only one column.
Training phase

$$x = \begin{pmatrix} 2.95 & 6.63 \\ 2.53 & 7.79 \\ 3.57 & 5.65 \\ 3.16 & 5.47 \\ 2.58 & 4.46 \\ 2.16 & 6.22 \\ 3.27 & 3.52 \end{pmatrix}, \qquad y = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 2 \\ 2 \\ 2 \end{pmatrix}$$

$x_k$ = data of row k; for example, $x_3 = (3.57 \;\; 5.65)$.
g = number of groups in y; in our example, g = 2.
$x_i$ = features data for group i. Each row represents one object; each column stands for one feature. We separate x into several groups based on the number of categories in y:
$$x_1 = \begin{pmatrix} 2.95 & 6.63 \\ 2.53 & 7.79 \\ 3.57 & 5.65 \\ 3.16 & 5.47 \end{pmatrix}, \qquad x_2 = \begin{pmatrix} 2.58 & 4.46 \\ 2.16 & 6.22 \\ 3.27 & 3.52 \end{pmatrix}$$
$\mu_i$ = mean of the features in group i, which is the average of $x_i$: $\mu_1 = (3.05 \;\; 6.38)$, $\mu_2 = (2.67 \;\; 4.73)$.
$\mu$ = global mean vector, that is, the mean of the whole data set. In this example, $\mu = (2.88 \;\; 5.676)$.
$x_i^0$ = mean-corrected data, that is, the features data for group i, $x_i$, minus the global mean vector:
$$x_1^0 = \begin{pmatrix} 0.060 & 0.951 \\ -0.357 & 2.109 \\ 0.679 & -0.025 \\ 0.269 & -0.209 \end{pmatrix}, \qquad x_2^0 = \begin{pmatrix} -0.305 & -1.218 \\ -0.732 & 0.547 \\ 0.386 & -2.155 \end{pmatrix}$$
Covariance matrix of group i:
$$C_i = \frac{(x_i^0)^T x_i^0}{n_i}$$
$$C_1 = \begin{pmatrix} 0.166 & -0.192 \\ -0.192 & 1.349 \end{pmatrix}, \qquad C_2 = \begin{pmatrix} 0.259 & -0.286 \\ -0.286 & 2.142 \end{pmatrix}$$
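Finally, we compute the class-conditional probability of observing curvature = 2.81 and diameter = 5.46 given either passed or not passed, using the bivariate normal density with the mean and covariance matrix of each group:

$$p(2.81, 5.46 \mid \text{Passed}) = (2 \times 3.14)^{-\frac{1}{2}(2)} \begin{vmatrix} 0.166 & -0.192 \\ -0.192 & 1.349 \end{vmatrix}^{-1/2} \exp\!\left( -\frac{1}{2} \left[ \begin{pmatrix} 2.81 \\ 5.46 \end{pmatrix} - \begin{pmatrix} 3.05 \\ 6.38 \end{pmatrix} \right]^{T} \begin{pmatrix} 0.166 & -0.192 \\ -0.192 & 1.349 \end{pmatrix}^{-1} \left[ \begin{pmatrix} 2.81 \\ 5.46 \end{pmatrix} - \begin{pmatrix} 3.05 \\ 6.38 \end{pmatrix} \right] \right)$$

$$p(2.81, 5.46 \mid \text{Not\_passed}) = (2 \times 3.14)^{-\frac{1}{2}(2)} \begin{vmatrix} 0.259 & -0.286 \\ -0.286 & 2.142 \end{vmatrix}^{-1/2} \exp\!\left( -\frac{1}{2} \left[ \begin{pmatrix} 2.81 \\ 5.46 \end{pmatrix} - \begin{pmatrix} 2.67 \\ 4.73 \end{pmatrix} \right]^{T} \begin{pmatrix} 0.259 & -0.286 \\ -0.286 & 2.142 \end{pmatrix}^{-1} \left[ \begin{pmatrix} 2.81 \\ 5.46 \end{pmatrix} - \begin{pmatrix} 2.67 \\ 4.73 \end{pmatrix} \right] \right)$$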
P = prior probability vector (each row represents the prior probability of group i). If we do not know the prior probabilities, we assume each one equals the number of samples in the group divided by the total number of samples, that is
p(Passed) = 4/7, p(Not_passed) = 3/7
Testing phase

In general, we have the most important parts done. However, let's take a look at the Bayes rule in this context:
$$P(w_j \mid x) = \frac{p(x \mid w_j)\, P(w_j)}{p(x)}$$
Basically, p(x) is there to make sure that the $P(w_j \mid x)$ sum to one. It is simply
$$p(x) = p(x \mid \text{Passed})\,p(\text{Passed}) + p(x \mid \text{Not\_passed})\,p(\text{Not\_passed})$$
We now have everything we need to determine whether a chip ring is most likely passed or not passed based on its curvature and diameter. Thus we can ask: if we observe curvature = 2.81 and diameter = 5.46, how likely is it to be passed?
$$p(\text{Passed} \mid 2.81, 5.46) = \frac{p(2.81, 5.46 \mid \text{Passed})\, p(\text{Passed})}{p(2.81, 5.46)}$$
Next we ask: what is the probability that it is not passed, given the observation of curvature = 2.81 and diameter = 5.46?
$$p(\text{Not\_passed} \mid 2.81, 5.46) = \frac{p(2.81, 5.46 \mid \text{Not\_passed})\, p(\text{Not\_passed})}{p(2.81, 5.46)}$$
If the end result is
$$p(\text{Not\_passed} \mid 2.81, 5.46) > p(\text{Passed} \mid 2.81, 5.46)$$
then we decide that the chip ring is most likely Not passed.
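The whole chip-ring procedure can be sketched in plain Python. The sketch follows the recipe in the text: group means, covariance matrices computed from data centred on the global mean, and priors taken as class proportions. The helper names `mean`, `covariance`, and `bivariate_pdf` are hypothetical. It uses the exact value of π and unrounded intermediate values, so its numbers may differ slightly from the rounded figures shown above.

```python
import math

passed = [(2.95, 6.63), (2.53, 7.79), (3.57, 5.65), (3.16, 5.47)]
not_passed = [(2.58, 4.46), (2.16, 6.22), (3.27, 3.52)]
groups = {"Passed": passed, "Not passed": not_passed}


def mean(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)


def covariance(rows, center):
    """C = (X0^T X0) / n with X0 = rows minus `center`; returns (c11, c12, c22)."""
    n = len(rows)
    d = [(r[0] - center[0], r[1] - center[1]) for r in rows]
    c11 = sum(a * a for a, _ in d) / n
    c12 = sum(a * b for a, b in d) / n
    c22 = sum(b * b for _, b in d) / n
    return c11, c12, c22


def bivariate_pdf(x, mu, cov):
    c11, c12, c22 = cov
    det = c11 * c22 - c12 * c12
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form d^T C^{-1} d, using the closed-form inverse of a 2x2 matrix.
    quad = (c22 * d1 * d1 - 2 * c12 * d1 * d2 + c11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


all_rows = passed + not_passed
global_mean = mean(all_rows)
x_new = (2.81, 5.46)

joint = {}
for name, rows in groups.items():
    cov = covariance(rows, global_mean)  # centred on the global mean, as in the text
    prior = len(rows) / len(all_rows)    # class proportion, as in the text
    joint[name] = bivariate_pdf(x_new, mean(rows), cov) * prior

evidence = sum(joint.values())           # marginal p(x)
posterior = {name: j / evidence for name, j in joint.items()}
label = max(posterior, key=posterior.get)
print(label, posterior)
```

A common alternative is to centre each group on its own mean when estimating its covariance. That changes the matrices and can change the posteriors, so the choice should be stated with any result.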
