Revision as of 03:54, 17 March 2016, BG19bot (1,005,055 edits): m →Extensions: Remove blank line(s) between list items per WP:LISTGAP to fix an accessibility issue for users of screen readers. Do WP:GENFIXES and cleanup if needed. Discuss this at Wikipedia talk:WikiProject Accessibility#LISTGAP
Revision as of 18:03, 17 March 2016, 99.197.154.53: →Model: inserted prepositional phrase denoting 'c' as the index and topic
Line 10:
: <math>P(w,d) = \sum_c P(c) P(d|c) P(w|c) = P(d) \sum_c P(c|d) P(w|c)</math>
with 'c' being the words' topic. The first formulation is the ''symmetric'' formulation, where <math>w</math> and <math>d</math> are both generated from the latent class <math>c</math> in similar ways (using the conditional probabilities <math>P(d|c)</math> and <math>P(w|c)</math>), whereas the second formulation is the ''asymmetric'' formulation, where, for each document <math>d</math>, a latent class is chosen conditionally on the document according to <math>P(c|d)</math>, and a word is then generated from that class according to <math>P(w|c)</math>. Although we have used words and documents in this example, the co-occurrence of any pair of discrete variables may be modelled in exactly the same way. The number of parameters is equal to <math>cd + wc</math>, where <math>c</math>, <math>d</math> and <math>w</math> here denote the numbers of latent classes, documents and words respectively, so it grows linearly with the number of documents. In addition, although PLSA is a generative model of the documents in the collection it is estimated on, it is not a generative model of new documents.
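The equality of the two formulations follows from Bayes' rule, since P(c) P(d|c) = P(d) P(c|d). The following is a minimal sketch that checks this numerically on a toy model. All probability tables and names (`p_c`, `p_d_given_c`, `p_w_given_c`, `joint_symmetric`, `joint_asymmetric`) are made up for illustration and do not come from any PLSA library or fitted model.

```python
from itertools import product

# Hypothetical toy model: 2 latent classes, 3 documents, 4 words.
p_c = [0.6, 0.4]                        # P(c)
p_d_given_c = [[0.5, 0.3, 0.2],         # P(d|c), one row per class c
               [0.1, 0.2, 0.7]]
p_w_given_c = [[0.4, 0.3, 0.2, 0.1],    # P(w|c), one row per class c
               [0.1, 0.1, 0.3, 0.5]]

n_c = len(p_c)
n_d = len(p_d_given_c[0])
n_w = len(p_w_given_c[0])


def joint_symmetric(w, d):
    """Symmetric formulation: sum_c P(c) P(d|c) P(w|c)."""
    return sum(p_c[c] * p_d_given_c[c][d] * p_w_given_c[c][w]
               for c in range(n_c))


# Derive the asymmetric quantities from the symmetric ones via Bayes' rule.
# P(d) = sum_c P(c) P(d|c)
p_d = [sum(p_c[c] * p_d_given_c[c][d] for c in range(n_c))
       for d in range(n_d)]
# P(c|d) = P(c) P(d|c) / P(d), indexed as p_c_given_d[d][c]
p_c_given_d = [[p_c[c] * p_d_given_c[c][d] / p_d[d] for c in range(n_c)]
               for d in range(n_d)]


def joint_asymmetric(w, d):
    """Asymmetric formulation: P(d) sum_c P(c|d) P(w|c)."""
    return p_d[d] * sum(p_c_given_d[d][c] * p_w_given_c[c][w]
                        for c in range(n_c))


pairs = list(product(range(n_w), range(n_d)))

# Largest disagreement between the two formulations over all (w, d) pairs.
max_gap = max(abs(joint_symmetric(w, d) - joint_asymmetric(w, d))
              for w, d in pairs)

# The joint distribution should sum to 1 over all (w, d) pairs.
total = sum(joint_symmetric(w, d) for w, d in pairs)

# Parameter count from the text: cd + wc.
n_params = n_c * n_d + n_w * n_c

print(max_gap, total, n_params)
```

With these toy sizes the parameter count is 2·3 + 4·2 = 14. Adding a document adds one P(c|d) entry per latent class, which is the linear growth in the number of documents noted above.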

Probabilistic latent semantic analysis: Difference between revisions