: <math>P(w,d) = \sum_c P(c) P(d|c) P(w|c) = P(d) \sum_c P(c|d) P(w|c)</math>
with <math>c</math> being the words' topic. The number of parameters is therefore <math>cd + wc</math>, which grows linearly with the number of documents. In addition, although PLSA is a generative model of the documents in the collection it is estimated on, it is not a generative model of new documents.
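The two parameterizations above can be checked numerically. The following is a minimal numpy sketch (our own illustration, not code from any cited source; all variable names are ours): it builds random valid distributions for the symmetric formulation <math>P(c)</math>, <math>P(d|c)</math>, <math>P(w|c)</math>, forms the joint <math>P(w,d)</math>, and verifies that the asymmetric formulation <math>P(d)\sum_c P(c|d)P(w|c)</math> gives the same joint.

```python
import numpy as np

# Illustrative sketch of the two PLSA parameterizations (names are ours).
rng = np.random.default_rng(0)
n_c, n_d, n_w = 3, 5, 8  # topics, documents, words

def normalize(a, axis):
    return a / a.sum(axis=axis, keepdims=True)

P_c = normalize(rng.random(n_c), axis=0)                 # P(c)
P_d_given_c = normalize(rng.random((n_c, n_d)), axis=1)  # P(d|c), rows sum to 1
P_w_given_c = normalize(rng.random((n_c, n_w)), axis=1)  # P(w|c), rows sum to 1

# Symmetric formulation: P(w,d) = sum_c P(c) P(d|c) P(w|c)
P_wd = np.einsum('c,cd,cw->wd', P_c, P_d_given_c, P_w_given_c)
assert np.isclose(P_wd.sum(), 1.0)  # a valid joint distribution

# Asymmetric formulation: P(w,d) = P(d) sum_c P(c|d) P(w|c)
P_d = np.einsum('c,cd->d', P_c, P_d_given_c)               # P(d)
P_c_given_d = P_c[:, None] * P_d_given_c / P_d[None, :]    # Bayes' rule
P_wd_asym = P_d[None, :] * np.einsum('cd,cw->wd', P_c_given_d, P_w_given_c)
assert np.allclose(P_wd, P_wd_asym)  # both formulations agree

# Parameter count of the asymmetric model: P(c|d) has c*d entries,
# P(w|c) has w*c entries, i.e. cd + wc in the article's notation.
n_params = n_c * n_d + n_w * n_c
```

Note how the document-dependent factor <math>P(c|d)</math> is what makes the parameter count grow with the collection size.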
== Application ==
PLSA may be used in a discriminative setting, via [[Fisher kernel]]s.<ref>Thomas Hofmann, ''Learning the Similarity of Documents: An Information-Geometric Approach to Document Retrieval and Categorization'', in Advances in Neural Information Processing Systems 12, MIT Press, 2000</ref>
PLSA has applications in [[information retrieval]] and [[information filtering|filtering]], [[natural language processing]], [[machine learning]] from text, [[bioinformatics]],<ref>{{Cite conference|chapter=Enhanced probabilistic latent semantic analysis with weighting schemes to predict genomic annotations|conference=The 13th IEEE International Conference on BioInformatics and BioEngineering}}</ref> and related areas.
It is reported that the [[aspect model]] used in the probabilistic latent semantic analysis has severe [[overfitting]] problems.<ref>{{cite journal|title=Latent Dirichlet Allocation|journal=Journal of Machine Learning Research|year=2003|first=David M.|last=Blei|author2=Andrew Y. Ng|author3=Michael I. Jordan|volume=3|pages=993–1022}}</ref>
== Extensions ==
** Asymmetric: MASHA ("Multinomial ASymmetric Hierarchical Analysis")<ref>Alexei Vinokourov and Mark Girolami, [http://citeseer.ist.psu.edu/rd/30973750,455249,1,0.25,Download/http://citeseer.ist.psu.edu/cache/papers/cs/22961/http:zSzzSzcis.paisley.ac.ukzSzvino-ci0zSzvinokourov_masha.pdf/vinokourov02probabilistic.pdf A Probabilistic Framework for the Hierarchic Organisation and Classification of Document Collections], in ''Information Processing and Management'', 2002</ref>
** Symmetric: HPLSA ("Hierarchical Probabilistic Latent Semantic Analysis")<ref>Eric Gaussier, Cyril Goutte, Kris Popat and Francine Chen,
[http://www.xrce.xerox.com/Research-Development/Publications/2002-004 A Hierarchical Model for Clustering and Categorising Documents] {{Webarchive|url=https://web.archive.org/web/20160304033131/http://www.xrce.xerox.com/Research-Development/Publications/2002-004 |date=2016-03-04 }}, in "Advances in Information Retrieval -- Proceedings of the 24th [[Information Retrieval Specialist Group|BCS-IRSG]] European Colloquium on IR Research (ECIR-02)", 2002</ref>
* Generative models: The following models have been developed to address an often-criticized shortcoming of PLSA, namely that it is not a proper generative model for new documents.
==History==
This is an example of a [[latent class model]] (see references therein), and it is related<ref>Chris Ding, Tao Li, Wei Peng (2006). "[http://www.aaai.org/Papers/AAAI/2006/AAAI06-055.pdf Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-Square Statistic, and a Hybrid Method]". AAAI 2006</ref><ref>Chris Ding, Tao Li, Wei Peng (2008). "[http://www.sciencedirect.com/science/article/pii/S0167947308000145 On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing]"</ref> to [[non-negative matrix factorization]]. The present terminology was coined in 1999 by Thomas Hofmann.
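The relationship to non-negative matrix factorization can be made concrete. Below is a minimal numpy sketch (our own illustration, not code from the cited papers) using the standard Lee–Seung multiplicative updates for NMF under the generalized KL divergence; column-normalizing the resulting word-topic factor yields distributions analogous to the PLSA parameters <math>P(w|c)</math>.

```python
import numpy as np

# Hedged sketch: KL-divergence NMF of a word-by-document count matrix,
# whose normalized factors play the role of PLSA's P(w|c).
rng = np.random.default_rng(1)
V = rng.integers(0, 10, size=(8, 5)).astype(float)  # word-by-document counts
n_c = 3
W = rng.random((8, n_c)) + 0.1   # word-topic factor (kept positive)
H = rng.random((n_c, 5)) + 0.1   # topic-document factor

for _ in range(200):  # Lee-Seung multiplicative updates for KL-NMF
    WH = W @ H + 1e-12
    W *= (V / WH) @ H.T / H.sum(axis=1)
    WH = W @ H + 1e-12
    H *= W.T @ (V / WH) / W.sum(axis=0)[:, None]

# Column-normalizing W gives per-topic word distributions, analogous to P(w|c).
P_w_given_c = W / W.sum(axis=0)
assert np.allclose(P_w_given_c.sum(axis=0), 1.0)
```

The multiplicative updates preserve non-negativity, which is what lets the normalized factors be read as probability distributions.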
== See also ==
==External links==
*[https://web.archive.org/web/20050120213347/http://www.cs.brown.edu/people/th/papers/Hofmann-UAI99.pdf Probabilistic Latent Semantic Analysis]
*[https://web.archive.org/web/20170717235351/http://www.semanticquery.com/archive/semanticsearchart/researchpLSA.html Complete PLSA DEMO in C#]
{{DEFAULTSORT:Probabilistic Latent Semantic Analysis}}
[[Category:Statistical natural language processing]]
[[Category:Latent variable models]]
[[Category:Language modeling]]