
Notes on Probabilistic Latent Semantic Analysis

Notations

We first list related notations as follows.

$M$ is the number of documents. $K$ is the number of topics. $V$ is the number of distinct words.
$N_m$ is the length of the $m$-th document.
$w_{m,n}$ ($1 \le m \le M$; $1 \le n \le N_m$) is the $n$-th word in the $m$-th document.
$z_{m,n}$ ($1 \le m \le M$; $1 \le n \le N_m$) is the topic assigned to the $n$-th word in the $m$-th document, which is a latent (or hidden) variable.
$\theta_m$ ($1 \le m \le M$) is the topic proportion of the $m$-th document, which is a $K$-dimensional vector. $\theta_{m,k}$ is the $k$-th element of $\theta_m$, corresponding to the proportion of topic $k$ in the $m$-th document, and $\sum_{k=1}^{K} \theta_{m,k} = 1$.
$\phi_k$ ($1 \le k \le K$) is the word proportion of the $k$-th topic, which is a $V$-dimensional vector. $\phi_{k,v}$ is the $v$-th element of $\phi_k$, corresponding to the proportion of word $v$ in the $k$-th topic. Each $v$ is an element of the dictionary with $V$ distinct words, and $\sum_{v=1}^{V} \phi_{k,v} = 1$.

We can represent $\theta_1, \dots, \theta_m, \dots, \theta_M$ as an $M \times K$ matrix $\Theta$.

$$
\Theta =
\begin{bmatrix}
\theta_1 \\ \vdots \\ \theta_m \\ \vdots \\ \theta_M
\end{bmatrix}
=
\begin{bmatrix}
\theta_{1,1} & \dots & \theta_{1,k} & \dots & \theta_{1,K} \\
\vdots & \ddots & \vdots & & \vdots \\
\theta_{m,1} & \dots & \theta_{m,k} & \dots & \theta_{m,K} \\
\vdots & & \vdots & \ddots & \vdots \\
\theta_{M,1} & \dots & \theta_{M,k} & \dots & \theta_{M,K}
\end{bmatrix}
$$

Notes:
In the pLSA model, $\Theta$, $\theta_m$ and $\theta_{m,k}$ are parameters instead of random variables.
We can represent $\phi_1, \dots, \phi_k, \dots, \phi_K$ as a $K \times V$ matrix $\Phi$.

$$
\Phi =
\begin{bmatrix}
\phi_1 \\ \vdots \\ \phi_k \\ \vdots \\ \phi_K
\end{bmatrix}
=
\begin{bmatrix}
\phi_{1,1} & \dots & \phi_{1,v} & \dots & \phi_{1,V} \\
\vdots & \ddots & \vdots & & \vdots \\
\phi_{k,1} & \dots & \phi_{k,v} & \dots & \phi_{k,V} \\
\vdots & & \vdots & \ddots & \vdots \\
\phi_{K,1} & \dots & \phi_{K,v} & \dots & \phi_{K,V}
\end{bmatrix}
$$

Notes:
In the pLSA model, $\Phi$, $\phi_k$ and $\phi_{k,v}$ are parameters instead of random variables.
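
To make the shapes concrete, here is a minimal sketch (Python with NumPy; the sizes `M`, `K`, `V` and the variable names are illustrative, not from the original notes) of how $\Theta$ and $\Phi$ can be stored as row-stochastic matrices:

```python
import numpy as np

# Hypothetical sizes for illustration only.
M, K, V = 100, 10, 5000   # documents, topics, distinct words

rng = np.random.default_rng(0)

# Theta: M x K, row m is the topic proportion of document m (rows sum to 1).
Theta = rng.random((M, K))
Theta /= Theta.sum(axis=1, keepdims=True)

# Phi: K x V, row k is the word proportion of topic k (rows sum to 1).
Phi = rng.random((K, V))
Phi /= Phi.sum(axis=1, keepdims=True)
```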

Graphical model for pLSA

Based on the generative process of pLSA, we can represent the pLSA model using the "collapsed" plate notation. For ease of understanding, the corresponding "expanded" model is also shown.
EM for pLSA

If Maximum Likelihood Estimation (MLE) is used to estimate the parameters of the pLSA model, we need to find the parameters that maximize the probability of the observed data, i.e., all words $W$ in the documents, or equivalently the likelihood of $W$ with respect to the parameters $\Theta$ and $\Phi$.

$$
W = (w_{1,1} \cdots w_{1,n} \cdots w_{1,N_1} \;\cdots\; w_{m,1} \cdots w_{m,n} \cdots w_{m,N_m} \;\cdots\; w_{M,1} \cdots w_{M,n} \cdots w_{M,N_M})
$$

We can get the log-likelihood

$$
\ell(\Theta, \Phi) = \log p(W; \Theta, \Phi) = \log\left(\prod_{m=1}^{M} \prod_{n=1}^{N_m} p(w_{m,n}; \Theta, \Phi)\right)
$$

$w_{m,n}$ is the $n$-th word in the $m$-th document, and $(m, n)$ represents the position of the word in the documents. Words at different positions (corresponding to different values of $m$ and $n$) can be instances of the same word $v$ in the dictionary. In the $m$-th document, we denote the total number of words equal to $v$ by $c_{m,v}$, with $\sum_{v=1}^{V} c_{m,v} = N_m$. So we can store the values of $c_{m,v}$ in an $M \times V$ matrix $C$. The word $v$ is observed with probability $p(v; \Theta, \Phi)$, and the corresponding topic of $v$ is $z$.
$$
\begin{aligned}
\ell(\Theta, \Phi) &= \log\left(\prod_{m=1}^{M} \prod_{v=1}^{V} p(v; \Theta, \Phi)^{c_{m,v}}\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log p(v; \Theta, \Phi)\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{z} p(v, z; \Theta, \Phi)\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{k=1}^{K} p(v, z = k; \Theta, \Phi)\right)
\end{aligned}
$$
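
As a side note, the count matrix $C$ used above can be built directly from the tokenized corpus. A minimal sketch, assuming each document is given as a list of integer word ids in $[0, V)$ (the names `docs`, `build_count_matrix`, and `C` are illustrative):

```python
import numpy as np

def build_count_matrix(docs, V):
    """Build the M x V count matrix C, where C[m, v] is the number of
    occurrences of word v in document m (so row m sums to N_m)."""
    C = np.zeros((len(docs), V), dtype=np.int64)
    for m, doc in enumerate(docs):
        for v in doc:
            C[m, v] += 1
    return C

# Example: three tiny documents over a 4-word dictionary.
docs = [[0, 1, 1, 3], [2, 2, 0], [3, 3, 3, 1, 0]]
C = build_count_matrix(docs, V=4)
```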

Based on the generative process of pLSA, in the $m$-th document, we can get

$$
p(v, z; \Theta, \Phi) = p(z; \theta_m)\,p(v \mid z; \Phi) = p(z; \theta_m)\,p(v \mid z; \phi_z)
$$

Based on the notations, we can get

$$
\begin{aligned}
p(z = k; \theta_m) &= \theta_{m,k} \\
p(v \mid z = k; \phi_k) &= \phi_{k,v} \\
p(v, z = k; \Theta, \Phi) &= \theta_{m,k} \times \phi_{k,v}
\end{aligned}
$$
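
Since $p(v; \Theta, \Phi) = \sum_{k=1}^{K} \theta_{m,k}\,\phi_{k,v} = (\Theta\Phi)_{m,v}$ in the $m$-th document, the log-likelihood can be evaluated with one matrix product. A minimal sketch, reusing the illustrative `Theta`, `Phi`, and `C` arrays from the earlier sketches:

```python
import numpy as np

def log_likelihood(C, Theta, Phi, eps=1e-12):
    """Compute l(Theta, Phi) = sum_{m,v} C[m, v] * log(sum_k Theta[m, k] * Phi[k, v])."""
    P = Theta @ Phi                             # M x V matrix of p(v; Theta, Phi) per document
    return float(np.sum(C * np.log(P + eps)))   # eps guards against log(0)
```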

Based on Jensen’s inequality, we have

$$
\begin{aligned}
\ell(\Theta, \Phi) &= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{k=1}^{K} p(v, z = k; \Theta, \Phi)\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{k=1}^{K} \theta_{m,k} \times \phi_{k,v}\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{k=1}^{K} \left(Q_{m,v}(z = k) \times \frac{\theta_{m,k} \times \phi_{k,v}}{Q_{m,v}(z = k)}\right)\right) \\
&\ge \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \sum_{k=1}^{K} \left(Q_{m,v}(z = k) \times \log \frac{\theta_{m,k} \times \phi_{k,v}}{Q_{m,v}(z = k)}\right)\right) \\
&= L(\Theta, \Phi)
\end{aligned}
$$

Here $Q_{m,v}(z)$ is any distribution over the topics, with $Q_{m,v}(z = k) \ge 0$ and $\sum_{k=1}^{K} Q_{m,v}(z = k) = 1$.
E-step

In the E-step of the EM method, we calculate the value of $Q_{m,v}(z = k)$ for which the lower bound $L(\Theta, \Phi)$ is equal to $\ell(\Theta, \Phi)$, as follows:

$$
\begin{aligned}
Q_{m,v}(z = k) &= \frac{p(v, z = k; \Theta, \Phi)}{\sum_{k'=1}^{K} p(v, z = k'; \Theta, \Phi)} \\
&= \frac{\theta_{m,k} \times \phi_{k,v}}{\sum_{k'=1}^{K} \theta_{m,k'} \times \phi_{k',v}}
\end{aligned}
$$

Notes:

$$
\frac{p(v, z = k; \Theta, \Phi)}{\sum_{k'=1}^{K} p(v, z = k'; \Theta, \Phi)} = \frac{p(v, z = k; \Theta, \Phi)}{p(v; \Theta, \Phi)} = p(z = k \mid v; \Theta, \Phi)
$$

So, actually we have

$$
Q_{m,v}(z = k) = p(z = k \mid v; \Theta, \Phi)
$$
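
A minimal sketch of this E-step (reusing the illustrative `Theta` and `Phi` arrays from above; `Q` is stored as an $M \times V \times K$ array with `Q[m, v, k]` $= Q_{m,v}(z = k)$):

```python
import numpy as np

def e_step(Theta, Phi, eps=1e-12):
    """Compute Q[m, v, k] = Theta[m, k] * Phi[k, v] / sum_k' Theta[m, k'] * Phi[k', v]."""
    # Joint p(v, z = k) for every (m, v, k): shape M x V x K.
    joint = Theta[:, None, :] * Phi.T[None, :, :]
    Q = joint / (joint.sum(axis=2, keepdims=True) + eps)
    return Q
```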

M-step

In the M-step of the EM method, we plug in the value of $Q_{m,v}(z = k)$ calculated in the E-step, and try to maximize the lower bound $L(\Theta, \Phi)$ with respect to $\Theta$ and $\Phi$.

Notes:
Each $Q_{m,v}(z = k)$ in the M-step is a fixed value instead of a variable.

Because $\sum_{k=1}^{K} \theta_{m,k} = 1$ and $\sum_{v=1}^{V} \phi_{k,v} = 1$, we can use the method of Lagrange multipliers.
$$
\begin{aligned}
L(\Theta, \Phi) = &\sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \sum_{k=1}^{K} \left(Q_{m,v}(z = k) \times \log \frac{\theta_{m,k} \times \phi_{k,v}}{Q_{m,v}(z = k)}\right)\right) \\
&+ \sum_{k=1}^{K} \lambda_k \left(1 - \sum_{v=1}^{V} \phi_{k,v}\right) \\
&+ \sum_{m=1}^{M} \rho_m \left(1 - \sum_{k=1}^{K} \theta_{m,k}\right)
\end{aligned}
$$

First, we get the partial derivative with respect to $\phi_{k,v}$ and set it to zero.

$$
\frac{\partial L}{\partial \phi_{k,v}} = \sum_{m=1}^{M} \frac{c_{m,v} \times Q_{m,v}(z = k)}{\phi_{k,v}} - \lambda_k = 0
$$

Multiplying both sides by $\phi_{k,v}$ gives

$$
\sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right) - \lambda_k \times \phi_{k,v} = 0
$$

And

$$
\phi_{k,v} = \frac{1}{\lambda_k} \times \sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right)
$$

Because $\sum_{v=1}^{V} \phi_{k,v} = 1$, we can get

$$
\sum_{v=1}^{V} \phi_{k,v} = \frac{1}{\lambda_k} \times \sum_{v=1}^{V} \sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right) = 1
$$

So we can get

$$
\lambda_k = \sum_{v=1}^{V} \sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right)
$$

$$
\phi_{k,v} = \frac{\sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right)}{\sum_{v'=1}^{V} \sum_{m=1}^{M} \left(c_{m,v'} \times Q_{m,v'}(z = k)\right)}
$$
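
A minimal sketch of this $\phi_{k,v}$ update (continuing with the illustrative `C` and `Q` arrays from the sketches above):

```python
import numpy as np

def m_step_phi(C, Q, eps=1e-12):
    """phi[k, v] = sum_m C[m, v] * Q[m, v, k], normalized over v for each topic k."""
    unnormalized = np.einsum('mv,mvk->kv', C, Q)          # K x V
    return unnormalized / (unnormalized.sum(axis=1, keepdims=True) + eps)
```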

Similarly, we get the partial derivative with respect to $\theta_{m,k}$ and set it to zero.

$$
\frac{\partial L}{\partial \theta_{m,k}} = \sum_{v=1}^{V} \frac{c_{m,v} \times Q_{m,v}(z = k)}{\theta_{m,k}} - \rho_m = 0
$$

Multiplying both sides by $\theta_{m,k}$ gives

$$
\sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right) - \rho_m \times \theta_{m,k} = 0
$$

And
$$
\theta_{m,k} = \frac{1}{\rho_m} \times \sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right)
$$

Because $\sum_{k=1}^{K} \theta_{m,k} = 1$, we can get

$$
\sum_{k=1}^{K} \theta_{m,k} = \frac{1}{\rho_m} \times \sum_{k=1}^{K} \sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right) = 1
$$

Because $\sum_{k=1}^{K} Q_{m,v}(z = k) = 1$ and $\sum_{v=1}^{V} c_{m,v} = N_m$, we can get

$$
\begin{aligned}
\rho_m &= \sum_{k=1}^{K} \sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right) \\
&= \sum_{v=1}^{V} \sum_{k=1}^{K} \left(c_{m,v} \times Q_{m,v}(z = k)\right) \\
&= \sum_{v=1}^{V} \left(c_{m,v} \times \sum_{k=1}^{K} Q_{m,v}(z = k)\right) \\
&= \sum_{v=1}^{V} \left(c_{m,v} \times 1\right) \\
&= N_m
\end{aligned}
$$

We can further get

$$
\theta_{m,k} = \frac{\sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right)}{N_m}
$$
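
A minimal sketch of this $\theta_{m,k}$ update, together with a small EM loop that ties the pieces above together (all function and variable names, such as `e_step` and `m_step_phi`, come from the earlier illustrative sketches, not from any particular library):

```python
import numpy as np

def m_step_theta(C, Q):
    """theta[m, k] = sum_v C[m, v] * Q[m, v, k], divided by N_m."""
    unnormalized = np.einsum('mv,mvk->mk', C, Q)          # M x K
    return unnormalized / C.sum(axis=1, keepdims=True)    # each row divided by N_m

def plsa_em(C, K, n_iter=50, seed=0):
    """Run EM for pLSA on an M x V count matrix C; returns (Theta, Phi)."""
    M, V = C.shape
    rng = np.random.default_rng(seed)
    Theta = rng.random((M, K)); Theta /= Theta.sum(axis=1, keepdims=True)
    Phi = rng.random((K, V));   Phi /= Phi.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Q = e_step(Theta, Phi)          # E-step: posterior of topics per (m, v)
        Phi = m_step_phi(C, Q)          # M-step: update word proportions
        Theta = m_step_theta(C, Q)      # M-step: update topic proportions
    return Theta, Phi
```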
