Graphical Models
Xiaojin Zhu
Department of Computer Sciences
University of Wisconsin-Madison, USA
Outline
Life without Graphical Models
Representation
Directed Graphical Models (Bayesian Networks)
Undirected Graphical Models (Markov Random Fields)
Inference
Exact Inference
Markov Chain Monte Carlo
Variational Inference
Loopy Belief Propagation
Mean Field Algorithm
Exponential Family
Maximizing Problems
Parameter Learning
Structure Learning
Life without Graphical Models
With n binary variables, the full joint table p(x_1, ..., x_n) has 2^n − 1 free parameters: too many to store, let alone estimate. Conditioning does not help:
p(x_n | x_1, ..., x_{n−1}) = p(x_1, ..., x_n) / Σ_v p(x_1, ..., x_{n−1}, x_n = v)
Conclusion: can't do it either — every term still involves the full joint.
Graphical-Model-Nots
Some graph-structured objects that are not graphical models:
neural network
decision tree
network flow
HMM template
Bayesian Network
Example: Alarm. Binary variables B (Burglary), E (Earthquake), A (Alarm), J (JohnCalls), M (MaryCalls), with edges B → A ← E, A → J, A → M.
P(B) = 0.001, P(E) = 0.002
P(A | B, E) = 0.95, P(A | B, ~E) = 0.94, P(A | ~B, E) = 0.29, P(A | ~B, ~E) = 0.001
P(J | A) = 0.9, P(J | ~A) = 0.05
P(M | A) = 0.7, P(M | ~A) = 0.01
The joint probability of any assignment factors over the graph:
P(B, ~E, A, J, ~M) = P(B) P(~E) P(A | B, ~E) P(J | A) P(~M | A)
= 0.001 × (1 − 0.002) × 0.94 × 0.9 × (1 − 0.7)
≈ 0.000253
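A minimal sketch (my own illustration, not from the slides) of evaluating this joint by multiplying CPT entries:

# Alarm network CPTs; each value is the probability of the variable being 1.
CPT = {
    "B": 0.001,                       # P(B=1)
    "E": 0.002,                       # P(E=1)
    "A": {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001},  # P(A=1|B,E)
    "J": {1: 0.9, 0: 0.05},           # P(J=1|A)
    "M": {1: 0.7, 0: 0.01},           # P(M=1|A)
}

def bern(p, v):
    """Probability of a binary variable taking value v when P(v=1) = p."""
    return p if v == 1 else 1.0 - p

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the Bayes net factorization."""
    return (bern(CPT["B"], b) * bern(CPT["E"], e)
            * bern(CPT["A"][(b, e)], a)
            * bern(CPT["J"][a], j) * bern(CPT["M"][a], m))

print(joint(1, 0, 1, 1, 0))  # 0.000253..., matching the slide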
Naive Bayes: a Bayesian network in which the label y is the single parent of the features x_1, ..., x_d:
p(y, x_1, ..., x_d) = p(y) ∏_{i=1}^d p(x_i | y)
No Causality Whatsoever
The network A → B with P(A) = a, P(B | A) = b, P(B | ~A) = c defines the same joint distribution as the reversed network B → A with
P(B) = ab + (1 − a)c
P(A | B) = ab / (ab + (1 − a)c)
P(A | ~B) = a(1 − b) / (1 − ab − (1 − a)c)
The arrows encode a factorization, not causality.
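A small check (my own illustration): the forward and reversed networks define the same joint for any choice of a, b, c.

a, b, c = 0.3, 0.9, 0.2           # arbitrary example values

# Reversed-network parameters, as derived above.
pB   = a * b + (1 - a) * c
pAB  = a * b / pB                  # P(A | B)
pAnB = a * (1 - b) / (1 - pB)      # P(A | ~B); note 1 - pB = 1 - ab - (1-a)c

for A in (0, 1):
    for B in (0, 1):
        fwd = (a if A else 1 - a) * ((b if B else 1 - b) if A else (c if B else 1 - c))
        pA_given = pAB if B else pAnB
        rev = (pB if B else 1 - pB) * (pA_given if A else 1 - pA_given)
        assert abs(fwd - rev) < 1e-12
print("joints agree")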
[Figure: plate notation for a topic model; each of the N_d words w in a document is generated from p(word | topic), e.g., topics concentrated on "troops", "election", or "love".]
Conditional Independence
Three canonical three-variable networks:
A → C → B (chain): A, B in general dependent; independent given C
A ← C → B (common cause): A, B in general dependent; independent given C
A → C ← B (common effect): A, B in general independent; dependent given C
d-Separation
A set C d-separates A and B if every path between them is blocked by C; d-separation implies conditional independence.
d-Separation Example 1: A, B dependent given C.
d-Separation Example 2: [figure]
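The third case is "explaining away". A numeric illustration (my own, using the alarm numbers): B and E are independent a priori, but become dependent once their common effect A is observed.

pB, pE = 0.001, 0.002
pA = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}  # P(A=1|B,E)

def joint_BEA(b, e, a):
    p = (pB if b else 1 - pB) * (pE if e else 1 - pE)
    return p * (pA[(b, e)] if a else 1 - pA[(b, e)])

# P(B=1 | A=1) vs P(B=1 | A=1, E=1): once the earthquake is known,
# the burglary explanation becomes far less likely.
pB_A = sum(joint_BEA(1, e, 1) for e in (0, 1)) / sum(
    joint_BEA(b, e, 1) for b in (0, 1) for e in (0, 1))
pB_AE = joint_BEA(1, 1, 1) / sum(joint_BEA(b, 1, 1) for b in (0, 1))
print(pB_A, pB_AE)   # approx 0.373 vs 0.003: E "explains away" B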
Undirected Graphical Models (Markov Random Fields)
An undirected graphical model defines p(x) = (1/Z) ∏_C ψ_C(x_C) over cliques C, with partition function Z = Σ_x ∏_C ψ_C(x_C) (an integral for continuous x).
Example: two nodes x_1, x_2 ∈ {−1, 1} with
p(x_1, x_2) = (1/Z) e^{a x_1 x_2}
Z = e^a + e^{−a} + e^{−a} + e^a = 2(e^a + e^{−a})
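A quick brute-force check of this partition function (my own illustration):

import math

a = 1.0
Z = sum(math.exp(a * x1 * x2) for x1 in (-1, 1) for x2 in (-1, 1))
print(Z, 2 * (math.exp(a) + math.exp(-a)))  # identical: 2(e^a + e^-a)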
In general, with real-valued weights w_1, ..., w_k and features f_1, ..., f_k:
p(X) = (1/Z) exp( Σ_{i=1}^k w_i f_i(X) )
The Ising model is the special case with a node feature x_s for each s ∈ V and an edge feature x_s x_t for each edge (s, t) ∈ E:
p_θ(x) = (1/Z) exp( Σ_{s∈V} θ_s x_s + Σ_{(s,t)∈E} θ_st x_s x_t )
i.e., w_s = θ_s and w_st = θ_st.
Example: image denoising. Given a noisy image Y, recover the clean image as argmax_X P(X | Y).
Multivariate Gaussian
p(X) = N(μ, Σ) = 1 / ((2π)^{n/2} |Σ|^{1/2}) exp( −(1/2)(X − μ)^⊤ Σ^{−1} (X − μ) )
A multivariate Gaussian is a pairwise Markov random field: zeros in Σ^{−1} correspond to missing edges.
Factor Graph
A bipartite graph with variable nodes and factor nodes that makes the factorization explicit. For example, the Bayesian network fragment P(A)P(B)P(C | A, B) becomes a single factor f(A, B, C) = P(A)P(B)P(C | A, B) connected to A, B, C.
Inference
Exact Inference
Inference by Enumeration
Infer P(X_Q | X_E), where X_O denotes all remaining variables. By definition,
P(X_Q | X_E) = P(X_Q, X_E) / P(X_E) = Σ_{X_O} P(X_Q, X_E, X_O) / Σ_{X_Q, X_O} P(X_Q, X_E, X_O)
Both sums are exponential in the number of variables summed over. For a graphical model, the joint probability factors:
p(X) = ∏_{j=1}^m f_j(X_(j))
where X_(j) is the subset of variables in factor f_j. Variable elimination exploits this factorization.
Eliminating a Variable
Rearrange the factors by whether x_1 ∈ X_(j), say x_1 ∉ f_1 ... f_l and x_1 ∈ f_{l+1} ... f_m:
Σ_{x_1 ... x_k} f_1 ⋯ f_l f_{l+1} ⋯ f_m = Σ_{x_2 ... x_k} f_1 ⋯ f_l ( Σ_{x_1} f_{l+1} ⋯ f_m )
Introduce a new factor f_{m+1} = Σ_{x_1} f_{l+1} ⋯ f_m; it contains the union of the variables in f_{l+1}, ..., f_m, except x_1.
Obviously equivalent: x_1 disappears altogether in
Σ_{x_2 ... x_k} f_1 ⋯ f_l f_{m+1}
Repeat to eliminate the remaining nuisance variables one at a time.
Example: with binary variables A, B, C, enumeration costs 48 multiplications and 14 additions; variable elimination needs far fewer.
Handling Evidence
To compute P(X_Q | X_E = e), instantiate the evidence in every factor, then eliminate the variables X_O = X − X_E − X_Q:
P(X_Q | X_E = e) = P(X_Q, X_E = e) / Σ_{X_Q} P(X_Q, X_E = e)
Both enumeration and variable elimination are exponential in the worst case, but variable elimination is exponential only in the size of the largest intermediate factor.
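A direct transcription of enumeration (a sketch reusing the joint() function from the alarm example above), for the classic query P(B | J=1, M=1):

def query_B_given_JM():
    num = [0.0, 0.0]
    for b in (0, 1):
        for e in (0, 1):
            for a_ in (0, 1):
                num[b] += joint(b, e, a_, 1, 1)   # sum out X_O = {E, A}
    Z = num[0] + num[1]                           # P(J=1, M=1)
    return num[1] / Z

print(query_B_given_JM())   # approx 0.284

Variable elimination would instead push each sum inside the product, creating small intermediate factors rather than touching all 2^3 terms per query entry.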
Markov Chain Monte Carlo
To estimate p(x_Q | X_E), draw samples x^(1), ..., x^(m) ~ p(x | X_E) and use the empirical frequency:
p(X_Q = c_Q | X_E) ≈ (1/m) Σ_{i=1}^m 1(x_Q^(i) = c_Q)
Example (the alarm network above): estimate
p(B = 1 | E = 1, M = 1) ≈ (1/m) Σ_{i=1}^m 1(B^(i) = 1)
from samples B^(i) drawn with the evidence E = 1, M = 1 held fixed.
Gibbs Update
How do we sample from p(x | X_E)? Gibbs sampling:
Initialize the non-evidence variables arbitrarily; evidence variables stay clamped.
Pick a non-evidence variable x_s and resample it from its conditional given everything else, p(x_s | x_{−s}), which depends only on the Markov blanket of x_s.
Iterate; after burn-in, the visited states are (dependent) samples from p(x | X_E).
Gibbs example: the Ising model
p_θ(x) = (1/Z) exp( Σ_{s∈V} θ_s x_s + Σ_{(s,t)∈E} θ_st x_s x_t ),  x_s ∈ {−1, 1}
The Markov blanket of x_s is its graph neighborhood N(s), and the Gibbs conditional is
p(x_s = 1 | x_{N(s)}) = 1 / ( exp(−2(θ_s + Σ_{t∈N(s)} θ_st x_t)) + 1 )
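A minimal Gibbs sampler built on this conditional (a sketch; the grid size and parameter values are made up for illustration):

import math, random

n = 16                               # n x n grid, x_s in {-1, +1}
theta_s, theta_st = 0.0, 0.3         # shared node / edge parameters
x = [[random.choice((-1, 1)) for _ in range(n)] for _ in range(n)]

def neighbors(i, j):
    return [(i + di, j + dj) for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= i + di < n and 0 <= j + dj < n]

for sweep in range(100):
    for i in range(n):
        for j in range(n):
            field = theta_s + theta_st * sum(x[a][b] for a, b in neighbors(i, j))
            p1 = 1.0 / (1.0 + math.exp(-2.0 * field))   # p(x_s = +1 | neighbors)
            x[i][j] = 1 if random.random() < p1 else -1

print(sum(sum(row) for row in x) / n**2)  # average magnetization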
In general, E_p[f(X)] ≈ (1/m) Σ_{i=1}^m f(X^(i)) if X^(i) ~ p.
If part of X can be integrated out analytically, sample only the rest: E_p[f(X)] ≈ (1/m) Σ_{i=1}^m E[f(X) | Y^(i)] if Y^(i) ~ p(Y). Such collapsed (Rao-Blackwellized) estimators typically have lower variance.
Example: collapsed Gibbs sampling for latent Dirichlet allocation (LDA). Integrating out the topic and document multinomials, the topic assignment z_i of word w_i in document d_i is resampled from
P(z_i = j | z_{−i}, w) ∝ ( n^{(w_i)}_{−i,j} + β ) / ( n^{(·)}_{−i,j} + Wβ ) × ( n^{(d_i)}_{−i,j} + α ) / ( n^{(d_i)}_{−i,·} + Tα )
where W is the vocabulary size, T the number of topics, α and β the Dirichlet hyperparameters, and
n^{(w_i)}_{−i,j}: number of times word w_i has been assigned to topic j, excluding the current position
n^{(d_i)}_{−i,j}: number of times a word from document d_i has been assigned to topic j, excluding the current position
n^{(·)}_{−i,j}: number of times any word has been assigned to topic j, excluding the current position
n^{(d_i)}_{−i,·}: length of document d_i, excluding the current position
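A compact sampler implementing this update (a sketch: the toy corpus, T, and hyperparameters are made up):

import random

docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 2, 0]]   # word ids per document
W, T, alpha, beta = 4, 2, 0.1, 0.01

z = [[random.randrange(T) for _ in d] for d in docs]
n_wj = [[0] * T for _ in range(W)]     # word-topic counts
n_dj = [[0] * T for _ in docs]         # document-topic counts
n_j  = [0] * T                         # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        j = z[d][i]
        n_wj[w][j] += 1; n_dj[d][j] += 1; n_j[j] += 1

for sweep in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            j = z[d][i]                 # remove current assignment
            n_wj[w][j] -= 1; n_dj[d][j] -= 1; n_j[j] -= 1
            # P(z_i = j | z_-i, w) up to normalization;
            # the (n_d + T*alpha) denominator is constant in j and cancels.
            p = [(n_wj[w][j2] + beta) / (n_j[j2] + W * beta)
                 * (n_dj[d][j2] + alpha) for j2 in range(T)]
            r = random.uniform(0, sum(p))
            j, acc = 0, p[0]
            while r > acc:
                j += 1; acc += p[j]
            z[d][i] = j                 # add back the new assignment
            n_wj[w][j] += 1; n_dj[d][j] += 1; n_j[j] += 1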
Variational Inference
Loopy Belief Propagation
Example: a two-step HMM with hidden states z_1, z_2 ∈ {1, 2}, initial distribution π_1 = π_2 = 1/2, and observations x_1 = R, x_2 = G.
Factor graph: f_1 — z_1 — f_2 — z_2, with
f_1(z_1) = P(z_1) P(x_1 | z_1)
f_2(z_1, z_2) = P(z_2 | z_1) P(x_2 | z_2)
Messages
Leaf Messages
The leaf factor f_1 sends μ_{f_1→z_1}(z_1) = f_1(z_1) = P(z_1) P(x_1 = R | z_1). The variable z_1 forwards it, since f_1 is its only other neighbor:
μ_{z_1→f_2}(z_1 = 1) = 1/4
μ_{z_1→f_2}(z_1 = 2) = 1/8
In general, a factor f_s with neighboring variables x, x_1, ..., x_M sends
μ_{f_s→x}(x) = Σ_{x_1 ... x_M} f_s(x, x_1, ..., x_M) ∏_{m=1}^M μ_{x_m→f_s}(x_m)
and a variable sends the product of the messages from its other neighboring factors. A leaf variable sends μ_{x→f}(x) = 1; a leaf factor sends μ_{f→x}(x) = f(x).
Going backward, the leaf variable z_2 sends
μ_{z_2→f_2}(z_2 = 1) = 1
μ_{z_2→f_2}(z_2 = 2) = 1
Then
μ_{f_2→z_1}(s) = Σ_{s'=1}^2 f_2(z_1 = s, z_2 = s') μ_{z_2→f_2}(s')
= 1 · P(z_2 = 1 | z_1 = s) P(x_2 = G | z_2 = 1) + 1 · P(z_2 = 2 | z_1 = s) P(x_2 = G | z_2 = 2)
We get
μ_{f_2→z_1}(z_1 = 1) = 7/16
μ_{f_2→z_1}(z_1 = 2) = 3/8
Similarly, μ_{f_2→z_2}(z_2) = Σ_s f_2(z_1 = s, z_2) μ_{z_1→f_2}(s), which gives
μ_{f_2→z_2}(z_2 = 1) = 1/32
μ_{f_2→z_2}(z_2 = 2) = 1/8
In this example, each marginal is the normalized product of incoming messages:
P(z_1 | x_1, x_2) ∝ μ_{f_1→z_1} μ_{f_2→z_1} = (1/4 · 7/16, 1/8 · 3/8) = (7/64, 3/64) → (0.7, 0.3)
P(z_2 | x_1, x_2) ∝ μ_{f_2→z_2} = (1/32, 1/8) → (0.2, 0.8)
One can also compute the marginal of the set of variables x_s involved in a factor f_s:
p(x_s) ∝ f_s(x_s) ∏_{x∈ne(f_s)} μ_{x→f_s}(x)
Handling Evidence
Observing x = v, clamp the variable: its outgoing messages become the indicator 1(x = v) (equivalently, absorb the evidence into the neighboring factors).
Observing X_E, the same message passing computes p(x, X_E); normalize:
p(x | X_E) = p(x, X_E) / Σ_x p(x, X_E)
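Sum-product on the z_1 — f_2 — z_2 chain above, as a sketch. The slides give only π and the message values, so the transition/emission tables below are hypothetical placeholders; with the actual tables the marginals come out (0.7, 0.3) and (0.2, 0.8) as above.

pi = [0.5, 0.5]                      # P(z1)
trans = [[0.7, 0.3], [0.3, 0.7]]     # P(z2 | z1)        (assumed)
emit1 = [0.5, 0.25]                  # P(x1 = R | z1)    (assumed)
emit2 = [0.5, 0.75]                  # P(x2 = G | z2)    (assumed)

mu_f1_z1 = [pi[s] * emit1[s] for s in range(2)]     # leaf factor f1 -> z1
mu_z1_f2 = mu_f1_z1[:]                              # z1 forwards to f2
mu_z2_f2 = [1.0, 1.0]                               # leaf variable z2 -> f2
mu_f2_z2 = [sum(trans[s][t] * emit2[t] * mu_z1_f2[s] for s in range(2))
            for t in range(2)]
mu_f2_z1 = [sum(trans[s][t] * emit2[t] * mu_z2_f2[t] for t in range(2))
            for s in range(2)]

def normalize(v):
    Z = sum(v)
    return [x / Z for x in v]

p_z1 = normalize([mu_f1_z1[s] * mu_f2_z1[s] for s in range(2)])
p_z2 = normalize(mu_f2_z2)
print(p_z1, p_z2)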
Mean Field Algorithm
Recall the Ising model
p_θ(x) = (1/Z) exp( Σ_{s∈V} θ_s x_s + Σ_{(s,t)∈E} θ_st x_s x_t ),  x_s ∈ {−1, 1}
The Conditional
p(x_s | x_{N(s)}) depends only on the neighbors N(s). This reduces to
p(x_s = 1 | x_{N(s)}) = 1 / ( exp(−2(θ_s + Σ_{t∈N(s)} θ_st x_t)) + 1 )
The mean field algorithm replaces each neighbor's random value x_t by its current mean μ_t and iterates the resulting fixed-point update:
q(x_s = 1) ← 1 / ( exp(−2(θ_s + Σ_{t∈N(s)} θ_st μ_t)) + 1 )
setting μ_s = 2 q(x_s = 1) − 1 for each node in turn until convergence.
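A naive mean field sketch matching this update (grid size and parameters are made up; mu[s] approximates E[x_s]):

import math

n, theta_s, theta_st = 16, 0.0, 0.3
mu = [[0.01] * n for _ in range(n)]          # small symmetry-breaking init

def neighbors(i, j):
    return [(i + di, j + dj) for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= i + di < n and 0 <= j + dj < n]

for it in range(100):
    for i in range(n):
        for j in range(n):
            field = theta_s + theta_st * sum(mu[a][b] for a, b in neighbors(i, j))
            q1 = 1.0 / (1.0 + math.exp(-2.0 * field))    # q(x_s = +1)
            mu[i][j] = 2.0 * q1 - 1.0                    # new mean of x_s

Unlike Gibbs sampling, this is a deterministic iteration converging to a local optimum of the variational objective introduced below.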
Exponential Family
p_θ(x) = exp( θ^⊤ φ(x) − A(θ) )
where φ(x) is the vector of sufficient statistics and A(θ) = log Z(θ) is the log partition function.
Example: the Bernoulli distribution p(x) = μ^x (1 − μ)^{1−x} can be rewritten as
p(x) = exp( x log μ + (1 − x) log(1 − μ) )
so φ(x) = x, θ = log( μ / (1 − μ) ), A(θ) = log(1 + exp(θ)).
The Ising model is in the exponential family:
p_θ(x) = exp( Σ_{s∈V} θ_s x_s + Σ_{(s,t)∈E} θ_st x_s x_t − A(θ) )
For variables with r states, use indicator features f_sj(x) = 1(x_s = j) and f_stjk(x) = 1(x_s = j, x_t = k):
p_θ(x) = exp( Σ_{sj} θ_sj f_sj(x) + Σ_{stjk} θ_stjk f_stjk(x) − A(θ) )
with d = r|V| + r²|E| parameters.
Regular but overcomplete, because Σ_{j=0}^{r−1} f_sj(x) = 1 for any s ∈ V and all x.
The Potts model is the special case where the parameters are tied: θ_st,kk takes one shared value for all k, and θ_st,jk (j ≠ k) another.
Mean Parameters
The mean parameters are μ_i = E_p[φ_i(x)]. As a set of expectations, the set M of realizable mean parameter vectors is convex.
Important relation (nice property):
∂A(θ) / ∂θ_i = E_θ[φ_i(x)]
i.e., derivatives of the log partition function yield mean parameters.
Conjugate Duality
The conjugate dual function A* to A is defined as
A*(μ) = sup_θ θ^⊤ μ − A(θ)
By definition, for the Bernoulli family
A*(μ) = sup_{θ∈R} θμ − log(1 + exp(θ))
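Setting the derivative to zero makes the maximizer explicit (a standard computation, spelled out for completeness):
∂/∂θ [ θμ − log(1 + e^θ) ] = μ − e^θ / (1 + e^θ) = 0  ⇒  θ = log( μ / (1 − μ) )
Substituting back:
A*(μ) = μ log( μ / (1 − μ) ) + log(1 − μ) = μ log μ + (1 − μ) log(1 − μ)
the negative entropy.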
In general, A*(μ) is the negative entropy of the exponential family member with mean parameter μ (for μ in the interior of M). Dualizing back gives the variational principle:
A(θ) = sup_{μ∈M} θ^⊤ μ − A*(μ)
with the supremum attained at μ = E_θ[φ(x)]. Mean field restricts the optimization to M(F), the mean parameters of a tractable subfamily F:
sup_{μ∈M(F)} θ^⊤ μ − A*(μ)
The mean parameters for the Ising model are the node and edge marginals: μ_s = p(x_s = 1), μ_st = p(x_s = 1, x_t = 1).
For the Ising model with the fully factorized (naive mean field) subfamily, the variational problem becomes
max_{(μ_1 ... μ_m) ∈ [0,1]^m}  Σ_{s∈V} θ_s μ_s + Σ_{(s,t)∈E} θ_st μ_s μ_t + Σ_{s∈V} H(μ_s)
where H(μ_s) is the binary entropy. Coordinate ascent on this objective L(μ) yields the mean field update
μ_s ← 1 / ( 1 + exp( −(θ_s + Σ_{(s,t)∈E} θ_st μ_t) ) )
The objective is not concave, so the iteration converges only to a local maximum.
For loopy belief propagation, relax M to the local consistency polytope: pseudomarginals μ that are normalized and pairwise consistent,
Σ_{j=0}^{r−1} μ_sj = 1   for all s ∈ V
Σ_{j=0}^{r−1} μ_stjk = μ_tk   for all (s,t) ∈ E, k = 0 ... r−1
Σ_{k=0}^{r−1} μ_stjk = μ_sj   for all (s,t) ∈ E, j = 0 ... r−1
This set L is an outer approximation: M ⊆ L, with equality on trees.
Approximating A*
On trees, −A*(μ) is exactly the Bethe entropy; on loopy graphs it is used as an approximation:
H_Bethe(μ) = − Σ_{s∈V} Σ_{j=0}^{r−1} μ_sj log μ_sj − Σ_{(s,t)∈E} Σ_{j,k} μ_stjk log( μ_stjk / (μ_sj μ_tk) )
Putting the two approximations together:
A(θ) ≈ sup_{μ∈L} θ^⊤ μ + H_Bethe(μ)
Loopy belief propagation is a fixed-point algorithm for this problem: its fixed points are stationary points of the Bethe variational problem.
Maximizing Problems
Recall the HMM example, where the marginals were P(z_1 | x_1, x_2) = (0.7, 0.3) and P(z_2 | x_1, x_2) = (0.2, 0.8).
1. One can maximize each marginal individually; however, the resulting z_{1:N} as a whole may not be the best joint assignment.
2. An alternative is to find
z*_{1:N} = argmax_{z_{1:N}} p(z_{1:N} | x_{1:N})
Simply replace sums by max in the messages:
μ_{x_m→f_s}(x_m) = ∏_{f∈ne(x_m)\f_s} μ_{f→x_m}(x_m)
μ_{f_s→x}(x) = max_{x_1 ... x_M} f_s(x, x_1, ..., x_M) ∏_{m=1}^M μ_{x_m→f_s}(x_m)
with leaf messages μ_{x→f}(x) = 1 and μ_{f→x}(x) = f(x). At the root,
p_max = max_x ∏_{f∈ne(x)} μ_{f→x}(x)
Keep back-pointers to the maximizing values to recover the argmax.
On the HMM example, with the same leaf messages as before:
μ_{f_2→z_2}(z_2) = max_{z_1} f_2(z_1, z_2) μ_{z_1→f_2}(z_1)
giving
μ_{f_2→z_2}(z_2 = 1) = 1/64  (maximized by either z_1 = 1, 2)
μ_{f_2→z_2}(z_2 = 2) = 3/32  (maximized by z_1 = 1)
At the root z_2,
max_{s=1,2} μ_{f_2→z_2}(s) = 3/32, attained at z_2 = 2
Backtracking the maximizations, z_2 = 2 ⇒ z_1 = 1. Hence
z*_{1:2} = argmax_{z_{1:2}} p(z_{1:2} | x_{1:2}) = (1, 2)
In practice one works in log space to avoid underflow; products become sums:
μ_{x_m→f_s}(x_m) = Σ_{f∈ne(x_m)\f_s} μ_{f→x_m}(x_m)
μ_{f_s→x}(x) = max_{x_1 ... x_M} [ log f_s(x, x_1, ..., x_M) + Σ_{m=1}^M μ_{x_m→f_s}(x_m) ]
with leaf messages μ_{x→f}(x) = 0 and μ_{f→x}(x) = log f(x).
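On a chain, log-domain max-product is exactly the Viterbi algorithm. A sketch (the tables are the same hypothetical placeholders as in the sum-product example):

import math

pi = [0.5, 0.5]
trans = [[0.7, 0.3], [0.3, 0.7]]     # assumed P(z_{t+1} | z_t)
emits = [[0.5, 0.25], [0.5, 0.75]]   # assumed P(x_t | z_t) for t = 1, 2

def viterbi(pi, trans, emits):
    T, K = len(emits), len(pi)
    score = [math.log(pi[s]) + math.log(emits[0][s]) for s in range(K)]
    back = []
    for t in range(1, T):
        prev = score[:]
        back.append([])
        for s in range(K):
            cand = [prev[r] + math.log(trans[r][s]) for r in range(K)]
            best = max(range(K), key=lambda r: cand[r])
            back[-1].append(best)                       # back-pointer
            score[s] = cand[best] + math.log(emits[t][s])
    z = [max(range(K), key=lambda s: score[s])]         # best final state
    for bp in reversed(back):                           # follow back-pointers
        z.append(bp[z[-1]])
    return list(reversed(z))

print(viterbi(pi, trans, emits))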
Parameter Learning
Given fully observed iid data x^(1), ..., x^(n) from an exponential family p_θ, maximize the average log likelihood
ℓ(θ) = (1/n) Σ_{i=1}^n log p_θ(x^(i)) = θ^⊤ μ̂ − A(θ),  where μ̂ = (1/n) Σ_{i=1}^n φ(x^(i))
The solution is θ_ML = θ(μ̂), the exponential family density whose mean parameter matches the empirical mean μ̂.
When μ̂ ∈ M° and φ is minimal, there is a unique maximum likelihood solution θ_ML.
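Moment matching in the simplest case (my own illustration): for the Bernoulli family, θ_ML is the parameter whose mean equals the empirical mean.

import math

data = [1, 0, 1, 1, 0, 1]                     # made-up coin flips
mu_hat = sum(data) / len(data)                # empirical mean parameter
theta_ml = math.log(mu_hat / (1 - mu_hat))    # theta(mu_hat) = logit
# Check: the model mean E_theta[x] = sigmoid(theta) recovers mu_hat.
print(mu_hat, 1 / (1 + math.exp(-theta_ml)))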
With hidden variables, let A_{x_i}(θ) be the log partition function with the observed part clamped at x_i. The log likelihood
ℓ(θ) = (1/n) Σ_{i=1}^n A_{x_i}(θ) − A(θ)
is lower-bounded, via the duals A*_{x_i}, by
L(μ^1, ..., μ^n, θ) = (1/n) Σ_{i=1}^n [ θ^⊤ μ^i − A*_{x_i}(μ^i) ] − A(θ)
EM is coordinate ascent on L: the E-step maximizes over each μ^i ∈ M_{x_i}, the M-step over θ. The M-step solution θ(μ̄), with μ̄ = (1/n) Σ_i μ^i, satisfies E_{θ(μ̄)}[φ(x)] = μ̄.
Variational EM
For loopy graphs the E-step is often intractable: one can't maximize
max_{μ^i ∈ M_{x_i}} θ^⊤ μ^i − A*_{x_i}(μ^i)
exactly. Mean field E-step: restrict to μ^i ∈ M_{x_i}(F) and maximize
θ^⊤ μ^i − A*_{x_i}(μ^i)
up to a local maximum; recall M_{x_i}(F) is an inner approximation to M_{x_i}.
Structure Learning
Which edges should the graph have? For a log-linear model p(X) ∝ exp( Σ_{i∈M} w_i f_i(X) ), structure learning means selecting the feature set M (equivalently, driving weights to zero).
For a zero-mean multivariate Gaussian, missing edges correspond to zeros in the precision matrix Ω = Σ^{−1}. A sparse structure can be estimated by ℓ1-penalized maximum likelihood:
min_Ω  −log |Ω| + (1/n) Σ_{i=1}^n X^(i)⊤ Ω X^(i) + λ Σ_{i≠j} |Ω_ij|
where the middle term equals tr(SΩ) with sample covariance S. Known as glasso (the graphical lasso).
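scikit-learn ships an implementation of this estimator; a minimal sketch on synthetic data (the data and alpha are made up):

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))          # n = 200 samples, 5 variables
X[:, 1] += 0.8 * X[:, 0]                   # induce one dependency

model = GraphicalLasso(alpha=0.1).fit(X)
Omega = model.precision_
# Nonzero off-diagonal entries of Omega correspond to edges
# in the Gaussian graphical model.
print(np.round(Omega, 2))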
Recap
Representation: BN or MRF; the graph encodes conditional independence.
Inference: exact (variable elimination, message passing), MCMC (Gibbs sampling), variational (mean field, loopy BP), and maximization (max-product).
Learning: parameters (maximum likelihood, EM, variational EM) and structure (e.g., glasso).