
Introduction to Probabilistic Latent Semantic Analysis

NYP Predictive Analytics Meetup
June 10, 2010

PLSA

•  A type of latent variable model with observed count data and nominal latent variable(s).
•  Despite the adjective 'semantic' in the acronym, the method is not inherently about meaning.
  –  Not any more than, say, its cousin Latent Class Analysis.
•  Rather, the name must be read as P + LS(A|I), marking the genealogy of PLSA as a probabilistic re-cast of Latent Semantic Analysis/Indexing.

LSA

•  Factorization of the data matrix into orthogonal matrices to form bases of a (semantic) vector space: X = UΣVᵀ.
•  Reduction of the original matrix to lower rank: X ≈ X_k = U_k Σ_k V_kᵀ.
•  LSA for text complexity: cosine similarity between paragraphs.

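A minimal numpy sketch of the pipeline above: factorize a toy term-document matrix with the SVD, keep a low-rank basis, and compare paragraphs by cosine similarity in the reduced space. The matrix, the rank k, and the variable names are illustrative, not from the slides.

```python
import numpy as np

# Toy document-term matrix (rows: paragraphs, columns: term counts).
# All values are illustrative.
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 1, 0],
    [0, 0, 3, 1, 2],
    [0, 1, 2, 2, 1],
], dtype=float)

# Factorization into orthogonal matrices: X = U S Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Reduction of the original matrix to lower rank k
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Paragraph coordinates in the k-dimensional (semantic) space
docs_k = U[:, :k] * s[:k]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Cosine similarity between the first two paragraphs in the reduced space
print(cosine(docs_k[0], docs_k[1]))
```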
Problems with LSA

•  Non-probabilistic.
•  Fails to handle polysemy.
  –  Polysemy is called "noise" in the LSA literature.
•  Shown (by Hofmann) to underperform compared to PLSA on an IR task.

Probabilities: Why?

•  Probabilistic systems allow for the evaluation of propositions under conditions of uncertainty.
  –  Probabilistic semantics.
•  Probabilistic systems provide a uniform mechanism for integrating and reasoning over heterogeneous information.
  –  In PLSA semantic dimensions are represented by unigram language models, more transparent than eigenvectors.
  –  The latent variable structure allows for subtopics (hierarchical PLSA).
•  "If the weather is sunny tomorrow and I'm not tired we will go to the beach"
  –  p(beach) = p(sunny & ~tired) = p(sunny)(1 − p(tired)), assuming sunny and tired are independent.

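With illustrative numbers (not from the slides): if p(sunny) = 0.7 and p(tired) = 0.2, and the two events are independent, then p(beach) = 0.7 × (1 − 0.2) = 0.56.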
A Generative Model?

•  Let X be a random vector with components {X1, X2, …, Xn}, each a random variable.
•  Each realization of X is assigned to a class, one of the values of a random variable Y.
•  A generative model tells a story about how the Xs came about: "once upon a time, a Y was selected, then Xs were created out of that Y".
•  A discriminative model strives to identify, as unambiguously as possible, the Y value for some given X.

A Generative Model?

•  A discriminative model estimates P(Y|X) directly.
•  A generative model estimates P(X|Y) and P(Y).
  –  The predictive direction is then computed via Bayesian inversion:
     P(Y|X) = P(X|Y) P(Y) / P(X),
     where P(X) is obtained by conditioning on Y:
     P(X) = Σ_y P(X|Y=y) P(Y=y)

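A small sketch of Bayesian inversion for a discrete Y; the class names, priors, and likelihoods are made-up numbers, not anything from the slides.

```python
# Toy Bayesian inversion: recover P(Y | X=x) from P(X | Y) and P(Y).
p_y = {"spam": 0.3, "ham": 0.7}            # P(Y), illustrative
p_x_given_y = {"spam": 0.8, "ham": 0.1}    # P(X=x | Y) for one observed x

# P(X=x) obtained by conditioning on Y
p_x = sum(p_x_given_y[y] * p_y[y] for y in p_y)

# Bayesian inversion
p_y_given_x = {y: p_x_given_y[y] * p_y[y] / p_x for y in p_y}
print(p_y_given_x)   # {'spam': ~0.774, 'ham': ~0.226}
```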
A Generative Model?

•  A classic generative/discriminative pair: Naïve Bayes vs Logistic Regression.
•  Naïve Bayes assumes that the Xi are conditionally independent given Y, so it estimates P(Xi | Y).
•  Logistic regression makes other assumptions, e.g. linearity of the independent variables with the logit of the dependent and independence of errors, but it handles correlated predictors (up to perfect collinearity).

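A sketch of the classic pair using scikit-learn (a library choice of this write-up, not mentioned in the slides): MultinomialNB estimates P(Y) and P(Xi | Y) and predicts via Bayes, while LogisticRegression models P(Y | X) directly. The count data are synthetic.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# 100 synthetic "documents" over 6 terms, two classes with different term rates.
X = rng.poisson(lam=[[3, 2, 1, 1, 0.5, 0.5]] * 50 + [[0.5, 0.5, 1, 1, 2, 3]] * 50)
y = np.array([0] * 50 + [1] * 50)

nb = MultinomialNB().fit(X, y)                      # generative: P(Xi|Y), P(Y)
lr = LogisticRegression(max_iter=1000).fit(X, y)    # discriminative: P(Y|X)

x_new = np.array([[2, 1, 1, 1, 1, 0]])
print(nb.predict_proba(x_new))
print(lr.predict_proba(x_new))
```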
A Generative Model?

•  Generative models have richer probabilistic semantics.
  –  Functions run both ways.
  –  They assign distributions to the "independent" variables, even previously unseen realizations.
•  Ng and Jordan (2002) show that logistic regression has higher asymptotic accuracy, but converges more slowly, suggesting a trade-off between accuracy and variance.
•  Overall, a trade-off between accuracy and usefulness.

A Generative Model?

•  Start with document: D → Z → W, with P(D), P(Z|D), P(W|Z).
•  Start with topic: Z → D and Z → W, with P(Z), P(D|Z), P(W|Z).

A Generative Model?

•  The observed data are the cells of the document-term matrix.
  –  We generate (doc, word) pairs.
  –  Random variables D, W and Z act as sources of objects.
•  Either:
  –  Draw a document, draw a topic from the document, draw a word from the topic.
  –  Draw a topic, draw a document from the topic, draw a word from the topic.
•  The two models are statistically equivalent.
  –  They will generate identical likelihoods when fit.
  –  Proof by Bayesian inversion (sketched below).
•  In any case D and W are conditionally independent given Z.

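A sketch of the equivalence argument, assuming the standard PLSA factorization with D and W conditionally independent given Z:

```latex
\[
P(d,w) = \sum_{z} P(z)\,P(d \mid z)\,P(w \mid z)
       = \sum_{z} P(d)\,P(z \mid d)\,P(w \mid z)
\]
```

The two parameterizations agree because P(z) P(d|z) = P(d, z) = P(d) P(z|d) (Bayesian inversion), so fitting either one yields identical likelihoods.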
A Generative Model?

•  But what is a Document here?
  –  Just a label! There are no attributes associated with documents.
  –  P(D|Z) relates topics to labels.
•  A previously unseen document is just a new label.
•  Therefore PLSA isn't generative in an interesting way, as it cannot handle previously unseen inputs in a generative manner.
  –  Though the P(Z) distribution may still be of interest.

Estimating the Parameters

•  Θ = {P(Z); P(D|Z); P(W|Z)}
•  All distributions refer to the latent variable Z, so they cannot be estimated directly from the data.
•  How do we know when we have the right parameters?
  –  When we have the θ that most closely generates the data, i.e. the document-term matrix.


Estimating the Parameters

•  The joint P(D,W) generates the observed document-term matrix.
•  The parameter vector θ yields the joint P(D,W).
•  We want the θ that maximizes the probability of the observed data.

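Written as an objective (a reconstruction in the document's notation; the slide's own formula did not survive extraction): with X(d, w) the observed counts and P_θ(d, w) the joint induced by θ, we seek

```latex
\[
\theta^{*} = \arg\max_{\theta} \sum_{d,w} X(d,w)\,\log P_{\theta}(d,w)
\]
```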
Estimating the Parameters

•  For the multinomial distribution, the likelihood of the data is a product over cells (see the reconstruction below).
•  Let X be the M×N document-term matrix.


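The equations on this slide were lost in extraction; a plausible reconstruction, assuming the usual multinomial sampling model over the M×N cells, is

```latex
\[
P(X \mid \theta) \propto \prod_{m=1}^{M}\prod_{n=1}^{N} P(d_m, w_n)^{X_{mn}},
\qquad
\log P(X \mid \theta) = \text{const} + \sum_{m,n} X_{mn}\,\log P(d_m, w_n).
\]
```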
Estimating the Parameters

•  Imagine we knew X′, the M×N×K complete data matrix, where the counts for topics are overt. Then the complete-data likelihood can be written down directly (see the reconstruction below), combining:
  –  New and interesting: the unseen counts, which must sum to 1 for a given d, w.
  –  The usual parameters θ.

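The complete-data expression itself was lost in extraction; a plausible reconstruction, assuming the usual PLSA complete-data likelihood, is

```latex
\[
\log P(X' \mid \theta) = \text{const} + \sum_{d,w,z} X'(d,w,z)\,
  \log\bigl[P(z)\,P(d \mid z)\,P(w \mid z)\bigr]
\]
```

Here the X′(d, w, z) are the new, unseen per-topic counts and P(z), P(d|z), P(w|z) are the usual parameters θ; the next slide normalizes the unseen counts within each (d, w) cell.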
Estimating the Parameters

•  We can factorize the counts in terms of the observed counts and a hidden distribution:
   X′(d, w, z) = X(d, w) · p(z | d, w), with Σ_z p(z | d, w) = 1.
•  Let's give the hidden distribution its name: P(Z|D,W), the posterior distribution of Z w.r.t. D, W.

Estimating the Parameters

•  P(Z|D,W) can be obtained from the parameters via Bayes and our core model assumption of conditional independence:
   P(z | d, w) = P(z) P(d|z) P(w|z) / Σ_z′ P(z′) P(d|z′) P(w|z′)

Estimating the Parameters

•  Nobody said the generation of P(Z|D,W) must be based on the same parameter vector as the one we're looking for!
•  Say we obtain P(Z|D,W) based on randomly generated parameters θn, i.e. we compute Pθn(Z|D,W).
•  We get a function of the parameters (written out below).

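Written out (a reconstruction of the lost formulas, following the standard EM construction): with the posterior computed under θn, the function of θ we obtain is

```latex
\[
Q(\theta) = \sum_{d,w} X(d,w) \sum_{z} P_{\theta_n}(z \mid d, w)\,
  \log\bigl[P_{\theta}(z)\,P_{\theta}(d \mid z)\,P_{\theta}(w \mid z)\bigr]
\]
```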
Estimating the Parameters

•  The resulting function, Q(θ), is the conditional expectation of the complete data likelihood with respect to the distribution P(Z|D,W).
•  It turns out that if we find the parameters that maximize Q we get a better estimate of the parameters!
•  Expressions for the parameters can be had by setting the partial derivatives with respect to the parameters to zero and solving, using Lagrange multipliers for the normalization constraints.

Estimating the Parameters

•  E-step (misnamed): compute the posterior P(Z|D,W) from the current parameters (see below).
•  M-step: re-estimate the parameters from that posterior (see below).

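The update equations did not survive extraction; the standard PLSA updates (as in Hofmann's derivation) are, as a reconstruction:

```latex
\[
\text{E-step:}\quad
P(z \mid d, w) = \frac{P(z)\,P(d \mid z)\,P(w \mid z)}
                      {\sum_{z'} P(z')\,P(d \mid z')\,P(w \mid z')}
\]
\[
\text{M-step:}\quad
P(w \mid z) \propto \sum_{d} X(d,w)\,P(z \mid d, w),\qquad
P(d \mid z) \propto \sum_{w} X(d,w)\,P(z \mid d, w),\qquad
P(z) \propto \sum_{d,w} X(d,w)\,P(z \mid d, w)
\]
```

Each distribution is normalized to sum to 1 over its free argument.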
Estimating the Parameters

•  Concretely, we generate (randomly)
   θ1 = {Pθ1(Z); Pθ1(D|Z); Pθ1(W|Z)}.
•  Compute the posterior Pθ1(Z|W,D).
•  Compute new parameters θ2.
•  Repeat until "convergence", say until the log likelihood stops changing a lot, or until boredom, or some N iterations.
•  For stability, average over multiple starts, varying numbers of topics.

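A compact numpy sketch of the procedure just described: random θ1, then alternate computing the posterior and re-estimating the parameters from the expected complete-data counts. The toy matrix, topic count, and iteration budget are illustrative; a fuller run would also track the log likelihood and average over multiple restarts, as the slide suggests.

```python
import numpy as np

def plsa(X, n_topics, n_iter=100, seed=0):
    """A minimal PLSA fit via EM (a sketch, not an optimized implementation).

    X        : (M, N) document-term count matrix
    n_topics : number of latent topics K
    Returns p_z (K,), p_d_given_z (M, K), p_w_given_z (N, K).
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    K = n_topics

    # Random initial parameters theta_1 = {P(Z), P(D|Z), P(W|Z)}
    p_z = rng.random(K); p_z /= p_z.sum()
    p_d_given_z = rng.random((M, K)); p_d_given_z /= p_d_given_z.sum(axis=0)
    p_w_given_z = rng.random((N, K)); p_w_given_z /= p_w_given_z.sum(axis=0)

    for _ in range(n_iter):
        # E-step: posterior P(z | d, w) for every cell, shape (M, N, K)
        joint = p_z[None, None, :] * p_d_given_z[:, None, :] * p_w_given_z[None, :, :]
        posterior = joint / joint.sum(axis=2, keepdims=True)

        # Expected complete-data counts X'(d, w, z) = X(d, w) * P(z | d, w)
        expected = X[:, :, None] * posterior

        # M-step: re-estimate parameters from the expected counts
        p_d_given_z = expected.sum(axis=1)        # (M, K)
        p_w_given_z = expected.sum(axis=0)        # (N, K)
        p_z = expected.sum(axis=(0, 1))           # (K,)
        p_d_given_z /= p_d_given_z.sum(axis=0, keepdims=True)
        p_w_given_z /= p_w_given_z.sum(axis=0, keepdims=True)
        p_z /= p_z.sum()

    return p_z, p_d_given_z, p_w_given_z

# Toy usage with an illustrative document-term matrix
X = np.array([[4, 3, 0, 0],
              [3, 4, 1, 0],
              [0, 1, 4, 3],
              [0, 0, 3, 4]], dtype=float)
p_z, p_d_given_z, p_w_given_z = plsa(X, n_topics=2)
print(p_z)
print(p_w_given_z.round(2))
```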
Folding In

•  When a new document comes along, we want to estimate the posterior of the topics for the document.
  –  What is it about? I.e. what is the distribution over topics of the new document?
•  Perform a "little EM":
  –  E-step: compute P(Z|W, Dnew)
  –  M-step: compute P(Z|Dnew), keeping all other parameters unchanged.
  –  Converges very fast, five iterations?
  –  Overtly discriminative! The true colors of the method emerge.

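A sketch of the "little EM" (the function name and five-iteration default are illustrative), reusing the p_w_given_z returned by the PLSA sketch above. Only P(Z|Dnew) is updated; P(W|Z) stays fixed.

```python
import numpy as np

def fold_in(x_new, p_w_given_z, n_iter=5, seed=0):
    """Estimate P(z | d_new) for a new document by a "little EM",
    keeping P(w | z) fixed (a sketch under the asymmetric parameterization).

    x_new       : (N,) term-count vector of the new document
    p_w_given_z : (N, K) fixed word-given-topic parameters from training
    """
    rng = np.random.default_rng(seed)
    K = p_w_given_z.shape[1]
    p_z_given_d = rng.random(K); p_z_given_d /= p_z_given_d.sum()

    for _ in range(n_iter):
        # E-step: P(z | w, d_new) proportional to P(z | d_new) * P(w | z), per word w
        post = p_z_given_d[None, :] * p_w_given_z          # (N, K)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate only P(z | d_new) from the expected counts
        p_z_given_d = (x_new[:, None] * post).sum(axis=0)
        p_z_given_d /= p_z_given_d.sum()

    return p_z_given_d
```

For example, fold_in(np.array([1.0, 0, 2, 3]), p_w_given_z) with the p_w_given_z from the earlier sketch returns the topic distribution of the new document.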
Problems with PLSA

•  Easily a huge number of parameters.
  –  Leads to unstable estimation (local maxima).
  –  Computationally intractable because of huge matrices.
  –  Modeling the documents directly can be a problem.
     •  What if the collection has millions of documents?
•  Not properly generative (is this a problem?)

Examples of Applications

•  Information Retrieval: compare topic distributions for documents and queries using a similarity measure like relative entropy.
•  Collaborative Filtering (Hofmann, 2002) using Gaussian PLSA.
•  Topic segmentation in texts, by looking for spikes in the distances between topic distributions for neighbouring text blocks.

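For the first bullet, a small sketch of the kind of comparison described: relative entropy (KL divergence) between a query's and a document's topic distributions. The distributions are illustrative.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two topic distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum(); q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Illustrative topic distributions for a query and a document
query_topics = [0.6, 0.3, 0.1]
doc_topics   = [0.5, 0.4, 0.1]
print(kl_divergence(query_topics, doc_topics))
```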
