: Dear 137.250.39.133: first, the goal here is not necessarily to reproduce what you read in other papers, but to provide a self-contained explanation of PLSA. Whether the latent variable is denoted c or z is inconsequential, as long as it is clear that it is a latent variable. The main issue with the graph, however, is that it is confusing with respect to the document variable 'd', which is denoted by theta in the graph. I doubt every paper you read uses this notation; of the papers cited here, Hofmann, Vinokourov et al. and Gaussier et al. certainly do not. Finally, there is a captioning problem: the words are not the only observables, since the document index is observed too (by definition). [[User:Sunny house|Sunny house]] ([[User talk:Sunny house|talk]]) 13:18, 5 July 2008 (UTC)
: Actually, Hofmann, in his original paper "Probabilistic Latent Semantic Analysis", uses "d" for the document variable, "z" for the topic and "w" for the observed word. However, this is by no means important, and several other works use different letters for the variables in both the plate notation and the formulas. It is in fact more common to see "z" as the topic, but this should not be taken as a rule. For clarity, the text and the image should use the same letters. Also, Sunny house, when you say that there is a captioning problem because the documents are observed, I'm not sure whether you mean that the document node should be shaded. If you do, it should not be shaded: it is not the observed document itself, but rather a distribution over the topics. The observed documents are implicit in the observed-words node.