
Notes on Probabilistic Latent Semantic Analysis

Notations

We first list related notations as follows.

$M$ is the number of documents. $K$ is the number of topics. $V$ is the number of distinct words.
$N_m$ is the length of the $m$-th document.
$w_{m,n}$ ($1 \le m \le M$; $1 \le n \le N_m$) is the $n$-th word in the $m$-th document.
$z_{m,n}$ ($1 \le m \le M$; $1 \le n \le N_m$) is the topic assigned to the $n$-th word in the $m$-th document, which is a latent (or hidden) variable.
$\theta_m$ ($1 \le m \le M$) is the topic proportion of the $m$-th document, which is a $K$-dimensional vector. $\theta_{m,k}$ is the $k$-th element of $\theta_m$, corresponding to the proportion of topic $k$ in the $m$-th document, and $\sum_{k=1}^{K} \theta_{m,k} = 1$.
$\phi_k$ ($1 \le k \le K$) is the word proportion of the $k$-th topic, which is a $V$-dimensional vector. $\phi_{k,v}$ is the $v$-th element of $\phi_k$, corresponding to the proportion of word $v$ in the $k$-th topic. Each $v$ is an element of the dictionary with $V$ distinct words, and $\sum_{v=1}^{V} \phi_{k,v} = 1$.

We can represent $\theta_1, \dots, \theta_m, \dots, \theta_M$ as an $M \times K$ matrix $\Theta$.

$$
\Theta =
\begin{bmatrix}
\theta_1 \\ \vdots \\ \theta_m \\ \vdots \\ \theta_M
\end{bmatrix}
=
\begin{bmatrix}
\theta_{1,1} & \dots & \theta_{1,k} & \dots & \theta_{1,K} \\
\vdots & \ddots & \vdots & & \vdots \\
\theta_{m,1} & \dots & \theta_{m,k} & \dots & \theta_{m,K} \\
\vdots & & \vdots & \ddots & \vdots \\
\theta_{M,1} & \dots & \theta_{M,k} & \dots & \theta_{M,K}
\end{bmatrix}
$$

Notes:
In the pLSA model, $\Theta$, $\theta_m$ and $\theta_{m,k}$ are parameters instead of random variables.
We can represent $\phi_1, \dots, \phi_k, \dots, \phi_K$ as a $K \times V$ matrix $\Phi$.

$$
\Phi =
\begin{bmatrix}
\phi_1 \\ \vdots \\ \phi_k \\ \vdots \\ \phi_K
\end{bmatrix}
=
\begin{bmatrix}
\phi_{1,1} & \dots & \phi_{1,v} & \dots & \phi_{1,V} \\
\vdots & \ddots & \vdots & & \vdots \\
\phi_{k,1} & \dots & \phi_{k,v} & \dots & \phi_{k,V} \\
\vdots & & \vdots & \ddots & \vdots \\
\phi_{K,1} & \dots & \phi_{K,v} & \dots & \phi_{K,V}
\end{bmatrix}
$$

Notes:
In the pLSA model, $\Phi$, $\phi_k$ and $\phi_{k,v}$ are parameters instead of random variables.
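
To make the shapes concrete, here is a minimal sketch (Python with NumPy; the sizes `M`, `K`, `V` and the variable names are illustrative, not from the original notes) of how $\Theta$ and $\Phi$ can be stored as row-stochastic matrices:

```python
import numpy as np

# Hypothetical sizes for illustration only.
M, K, V = 100, 10, 5000   # documents, topics, distinct words

rng = np.random.default_rng(0)

# Theta: M x K, row m is the topic proportion of document m (rows sum to 1).
Theta = rng.random((M, K))
Theta /= Theta.sum(axis=1, keepdims=True)

# Phi: K x V, row k is the word proportion of topic k (rows sum to 1).
Phi = rng.random((K, V))
Phi /= Phi.sum(axis=1, keepdims=True)
```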

Graphical model for pLSA

Based on the generative process of pLSA, we can represent the pLSA model using the "collapsed" plate notation. For ease of understanding, the corresponding "expanded" model is also shown.
EM for pLSA

If Maximum Likelihood Estimation (MLE) is used to estimate the parameters of the pLSA model, we need to find the parameters that maximize the probability of the observed data, i.e., all words $W$ in the documents, or equivalently the likelihood of $W$ with respect to the parameters $\Theta$ and $\Phi$.

$$
W = (w_{1,1} \cdots w_{1,n} \cdots w_{1,N_1} \;\cdots\; w_{m,1} \cdots w_{m,n} \cdots w_{m,N_m} \;\cdots\; w_{M,1} \cdots w_{M,n} \cdots w_{M,N_M})
$$

We can get the log-likelihood

$$
\ell(\Theta, \Phi) = \log p(W; \Theta, \Phi) = \log\left(\prod_{m=1}^{M} \prod_{n=1}^{N_m} p(w_{m,n}; \Theta, \Phi)\right)
$$

$w_{m,n}$ is the $n$-th word in the $m$-th document, and $(m, n)$ represents the position of the word in the documents. Words at different positions (corresponding to different values of $m$ and $n$) can be instances of the same word $v$ in the dictionary. In the $m$-th document, we denote the total number of words equal to $v$ by $c_{m,v}$, with $\sum_{v=1}^{V} c_{m,v} = N_m$. So we can store the values of $c_{m,v}$ in an $M \times V$ matrix $C$. The word $v$ is observed with probability $p(v; \Theta, \Phi)$, and the corresponding topic of $v$ is $z$.
$$
\begin{aligned}
\ell(\Theta, \Phi) &= \log\left(\prod_{m=1}^{M} \prod_{v=1}^{V} p(v; \Theta, \Phi)^{c_{m,v}}\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log p(v; \Theta, \Phi)\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{z} p(v, z; \Theta, \Phi)\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{k=1}^{K} p(v, z = k; \Theta, \Phi)\right)
\end{aligned}
$$
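
As a side note, the count matrix $C$ used above can be built directly from the tokenized corpus. A minimal sketch, assuming each document is given as a list of integer word ids in $[0, V)$ (the names `docs`, `build_count_matrix`, and `C` are illustrative):

```python
import numpy as np

def build_count_matrix(docs, V):
    """Build the M x V count matrix C, where C[m, v] is the number of
    occurrences of word v in document m (so row m sums to N_m)."""
    C = np.zeros((len(docs), V), dtype=np.int64)
    for m, doc in enumerate(docs):
        for v in doc:
            C[m, v] += 1
    return C

# Example: three tiny documents over a 4-word dictionary.
docs = [[0, 1, 1, 3], [2, 2, 0], [3, 3, 3, 1, 0]]
C = build_count_matrix(docs, V=4)
```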

Based on the generative process of pLSA, in the $m$-th document, we can get

$$
p(v, z; \Theta, \Phi) = p(z; \theta_m)\,p(v \mid z; \Phi) = p(z; \theta_m)\,p(v \mid z; \phi_z)
$$

Based on the notations, we can get

$$
\begin{aligned}
p(z = k; \theta_m) &= \theta_{m,k} \\
p(v \mid z = k; \phi_k) &= \phi_{k,v} \\
p(v, z = k; \Theta, \Phi) &= \theta_{m,k} \times \phi_{k,v}
\end{aligned}
$$
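
Since $p(v; \Theta, \Phi) = \sum_{k=1}^{K} \theta_{m,k}\,\phi_{k,v} = (\Theta\Phi)_{m,v}$ in the $m$-th document, the log-likelihood can be evaluated with one matrix product. A minimal sketch, reusing the illustrative `Theta`, `Phi`, and `C` arrays from the earlier sketches:

```python
import numpy as np

def log_likelihood(C, Theta, Phi, eps=1e-12):
    """Compute l(Theta, Phi) = sum_{m,v} C[m, v] * log(sum_k Theta[m, k] * Phi[k, v])."""
    P = Theta @ Phi                             # M x V matrix of p(v; Theta, Phi) per document
    return float(np.sum(C * np.log(P + eps)))   # eps guards against log(0)
```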

Based on Jensen’s inequality, we have

$$
\begin{aligned}
\ell(\Theta, \Phi) &= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{k=1}^{K} p(v, z = k; \Theta, \Phi)\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{k=1}^{K} \theta_{m,k} \times \phi_{k,v}\right) \\
&= \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \log \sum_{k=1}^{K} \left(Q_{m,v}(z = k) \times \frac{\theta_{m,k} \times \phi_{k,v}}{Q_{m,v}(z = k)}\right)\right) \\
&\ge \sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \sum_{k=1}^{K} \left(Q_{m,v}(z = k) \times \log \frac{\theta_{m,k} \times \phi_{k,v}}{Q_{m,v}(z = k)}\right)\right) \\
&= L(\Theta, \Phi)
\end{aligned}
$$

Here $Q_{m,v}(z)$ is any distribution over the topics, with $Q_{m,v}(z = k) \ge 0$ and $\sum_{k=1}^{K} Q_{m,v}(z = k) = 1$.
E-step

In the E-step of the EM method, we calculate the value of $Q_{m,v}(z = k)$ for which the lower bound $L(\Theta, \Phi)$ is equal to $\ell(\Theta, \Phi)$, as follows:

$$
\begin{aligned}
Q_{m,v}(z = k) &= \frac{p(v, z = k; \Theta, \Phi)}{\sum_{k'=1}^{K} p(v, z = k'; \Theta, \Phi)} \\
&= \frac{\theta_{m,k} \times \phi_{k,v}}{\sum_{k'=1}^{K} \theta_{m,k'} \times \phi_{k',v}}
\end{aligned}
$$

Notes:

$$
\frac{p(v, z = k; \Theta, \Phi)}{\sum_{k'=1}^{K} p(v, z = k'; \Theta, \Phi)} = \frac{p(v, z = k; \Theta, \Phi)}{p(v; \Theta, \Phi)} = p(z = k \mid v; \Theta, \Phi)
$$

So, actually we have

$$
Q_{m,v}(z = k) = p(z = k \mid v; \Theta, \Phi)
$$
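
A minimal sketch of this E-step (reusing the illustrative `Theta` and `Phi` arrays from above; `Q` is stored as an $M \times V \times K$ array with `Q[m, v, k]` $= Q_{m,v}(z = k)$):

```python
import numpy as np

def e_step(Theta, Phi, eps=1e-12):
    """Compute Q[m, v, k] = Theta[m, k] * Phi[k, v] / sum_k' Theta[m, k'] * Phi[k', v]."""
    # Joint p(v, z = k) for every (m, v, k): shape M x V x K.
    joint = Theta[:, None, :] * Phi.T[None, :, :]
    Q = joint / (joint.sum(axis=2, keepdims=True) + eps)
    return Q
```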

M-step

In the M-step of the EM method, we plug in the value of $Q_{m,v}(z = k)$ calculated in the E-step, and try to maximize the lower bound $L(\Theta, \Phi)$ with respect to $\Theta$ and $\Phi$.

Notes:
Each $Q_{m,v}(z = k)$ in the M-step is a fixed value instead of a variable.

Because $\sum_{k=1}^{K} \theta_{m,k} = 1$ and $\sum_{v=1}^{V} \phi_{k,v} = 1$, we can use the method of Lagrange multipliers.
$$
\begin{aligned}
L(\Theta, \Phi) = &\sum_{m=1}^{M} \sum_{v=1}^{V} \left(c_{m,v} \times \sum_{k=1}^{K} \left(Q_{m,v}(z = k) \times \log \frac{\theta_{m,k} \times \phi_{k,v}}{Q_{m,v}(z = k)}\right)\right) \\
&+ \sum_{k=1}^{K} \lambda_k \left(1 - \sum_{v=1}^{V} \phi_{k,v}\right) \\
&+ \sum_{m=1}^{M} \rho_m \left(1 - \sum_{k=1}^{K} \theta_{m,k}\right)
\end{aligned}
$$

First, we get the partial derivative with respect to $\phi_{k,v}$ and set it to zero.

$$
\frac{\partial L}{\partial \phi_{k,v}} = \sum_{m=1}^{M} \frac{c_{m,v} \times Q_{m,v}(z = k)}{\phi_{k,v}} - \lambda_k = 0
$$

Multiplying both sides by $\phi_{k,v}$ gives

$$
\sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right) - \lambda_k \times \phi_{k,v} = 0
$$

And

$$
\phi_{k,v} = \frac{1}{\lambda_k} \times \sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right)
$$

Because $\sum_{v=1}^{V} \phi_{k,v} = 1$, we can get

$$
\sum_{v=1}^{V} \phi_{k,v} = \frac{1}{\lambda_k} \times \sum_{v=1}^{V} \sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right) = 1
$$

So we can get

$$
\lambda_k = \sum_{v=1}^{V} \sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right)
$$

$$
\phi_{k,v} = \frac{\sum_{m=1}^{M} \left(c_{m,v} \times Q_{m,v}(z = k)\right)}{\sum_{v'=1}^{V} \sum_{m=1}^{M} \left(c_{m,v'} \times Q_{m,v'}(z = k)\right)}
$$
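
A minimal sketch of this $\phi_{k,v}$ update (continuing with the illustrative `C` and `Q` arrays from the sketches above):

```python
import numpy as np

def m_step_phi(C, Q, eps=1e-12):
    """phi[k, v] = sum_m C[m, v] * Q[m, v, k], normalized over v for each topic k."""
    unnormalized = np.einsum('mv,mvk->kv', C, Q)          # K x V
    return unnormalized / (unnormalized.sum(axis=1, keepdims=True) + eps)
```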

Similarly, we get the partial derivative with respect to $\theta_{m,k}$ and set it to zero.

$$
\frac{\partial L}{\partial \theta_{m,k}} = \sum_{v=1}^{V} \frac{c_{m,v} \times Q_{m,v}(z = k)}{\theta_{m,k}} - \rho_m = 0
$$

Multiplying both sides by $\theta_{m,k}$ gives

$$
\sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right) - \rho_m \times \theta_{m,k} = 0
$$

And
$$
\theta_{m,k} = \frac{1}{\rho_m} \times \sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right)
$$

Because $\sum_{k=1}^{K} \theta_{m,k} = 1$, we can get

$$
\sum_{k=1}^{K} \theta_{m,k} = \frac{1}{\rho_m} \times \sum_{k=1}^{K} \sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right) = 1
$$

Because $\sum_{k=1}^{K} Q_{m,v}(z = k) = 1$ and $\sum_{v=1}^{V} c_{m,v} = N_m$, we can get

$$
\begin{aligned}
\rho_m &= \sum_{k=1}^{K} \sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right) \\
&= \sum_{v=1}^{V} \sum_{k=1}^{K} \left(c_{m,v} \times Q_{m,v}(z = k)\right) \\
&= \sum_{v=1}^{V} \left(c_{m,v} \times \sum_{k=1}^{K} Q_{m,v}(z = k)\right) \\
&= \sum_{v=1}^{V} \left(c_{m,v} \times 1\right) \\
&= N_m
\end{aligned}
$$

We can further get

$$
\theta_{m,k} = \frac{\sum_{v=1}^{V} \left(c_{m,v} \times Q_{m,v}(z = k)\right)}{N_m}
$$
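
A minimal sketch of this $\theta_{m,k}$ update, together with a small EM loop that ties the pieces above together (all function and variable names, such as `e_step` and `m_step_phi`, come from the earlier illustrative sketches, not from any particular library):

```python
import numpy as np

def m_step_theta(C, Q):
    """theta[m, k] = sum_v C[m, v] * Q[m, v, k], divided by N_m."""
    unnormalized = np.einsum('mv,mvk->mk', C, Q)          # M x K
    return unnormalized / C.sum(axis=1, keepdims=True)    # each row divided by N_m

def plsa_em(C, K, n_iter=50, seed=0):
    """Run EM for pLSA on an M x V count matrix C; returns (Theta, Phi)."""
    M, V = C.shape
    rng = np.random.default_rng(seed)
    Theta = rng.random((M, K)); Theta /= Theta.sum(axis=1, keepdims=True)
    Phi = rng.random((K, V));   Phi /= Phi.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Q = e_step(Theta, Phi)          # E-step: posterior of topics per (m, v)
        Phi = m_step_phi(C, Q)          # M-step: update word proportions
        Theta = m_step_theta(C, Q)      # M-step: update topic proportions
    return Theta, Phi
```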
