Revision as of 23:53, 9 March 2007 by 131.243.240.136 (no edit summary)
Revision as of 15:41, 15 March 2007 by 132.246.126.133 (edit summary: added Lee&Seung + PTF as ref. Reworked "Types" and "Relation" sections)
''NMF redirects here. For the [[contract bridge|bridge]] convention, see [[new minor forcing]].''

'''Non-negative matrix factorization''' (NMF) is a group of [[algorithm]]s in [[multivariate analysis]] and [[linear algebra]] where a [[matrix (mathematics)|matrix]], <math>\mathbf{X}</math>, is factorized into (usually) two matrices, <math>\mathbf{W}</math> and <math>\mathbf{H}</math>:

: <math>\operatorname{nmf}(\mathbf{X}) \rightarrow \mathbf{WH} </math>

Factorization of matrices is generally non-unique, and a number of different methods of doing so have been developed (e.g. [[principal component analysis]] and [[singular value decomposition]]) by incorporating different constraints; non-negative matrix factorization differs from these methods in that it enforces the constraint that all three matrices must be [[non-negative matrix|non-negative]], i.e., all elements must be equal to or greater than zero.

Usually the number of columns of '''W''' and the number of rows of '''H''' in NMF are chosen so that the product '''WH''' is an approximation to '''X''' (it has been suggested that the NMF model should be called ''nonnegative matrix approximation'' instead). The full decomposition of '''X''' then amounts to the two non-negative matrices '''W''' and '''H''' as well as a residual '''U''':

: <math>\mathbf{X} = \mathbf{WH + U} </math>

The elements of the residual matrix can be either negative or positive, at least in typical applications of NMF.

== History ==

Early research on non-negative matrix factorizations was performed by a Finnish group of researchers in the mid-1990s under the name ''positive matrix factorization''.<ref>{{Cite journal
| author = P. Paatero, U. Tapper
| title = Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values
| journal = [[Environmetrics]]
| volume = 5
| pages = 111-126
| year = [[1994]]
| doi = 10.1002/env.3170050203}}</ref><ref>{{Cite journal
| author = [[Pia Anttila]], [[Pentti Paatero]], Unto Tapper, Olli Järvinen
| title = Source identification of bulk wet deposition in Finland by positive matrix factorization
| journal = [[Atmospheric Environment]]
| volume = 29
| issue = 14
| pages = 1705–1718
| year = 1995
| doi = 10.1016/1352-2310(94)00367-T
}}</ref>
It became more widely known as ''non-negative matrix factorization'' after Lee and Seung investigated the properties of the algorithm and published some simple and useful algorithms for two types of factorizations.<ref>Daniel D. Lee and H. Sebastian Seung (1999). "Learning the parts of objects by non-negative matrix factorization", ''[[Nature_journal|Nature]]'' 401(6755), pp. 788-791.</ref><ref>Daniel D. Lee and H. Sebastian Seung (2001). "[http://www.nips.cc/Web/Groups/NIPS/NIPS2000/00papers-pub-on-web/LeeSeung.ps.gz Algorithms for Non-negative Matrix Factorization]", ''[[NIPS|Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference]]'', pp. 556-562, [[MIT Press]].</ref>

== Types ==

There are different types of non-negative matrix factorizations. The different types arise from using different [[cost function]]s (divergence functions) and/or by [[regularization (mathematics)|regularization]] of the '''W''' and/or '''H''' matrices.<ref>[[Inderjit S. Dhillon]], [[Suvrit Sra]], "[http://books.nips.cc/papers/files/nips18/NIPS2005_0203.pdf Generalized Nonnegative Matrix Approximations with Bregman Divergences]", [[NIPS]], 2005.</ref>

== Relation to other Techniques ==

The initial paper by Lee & Seung proposed NMF mainly for parts-based decomposition of images. It compares NMF to [[vector quantization]] and [[principal component analysis]], and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results.

It was later shown that NMF is an instance of a more general probabilistic model called "multinomial PCA".<ref>Wray Buntine, "[http://cosco.hiit.fi/Articles/ecml02.pdf Extensions to EM and Multinomial PCA]", Proc. European Conference on Machine Learning (ECML-02), LNAI 2430, pp. 23-34, 2002.</ref>
When NMF is obtained by minimizing the [[Kullback–Leibler divergence]], it is in fact equivalent to another instance of multinomial PCA, [[probabilistic latent semantic analysis]],<ref>Eric Gaussier and Cyril Goutte (2005). "[http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf Relation between PLSA and NMF and Implications]", ''Proc. 28th international ACM SIGIR conference on Research and development in information retrieval (SIGIR-05)'', pp. 601-602.</ref><ref>Chris Ding, Tao Li, Wei Peng (2006). "[http://crd.lbl.gov/~cding/papers/nmfpLSI.pdf Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-square Statistic, and a Hybrid Method]", ''Proc. of AAAI National Conf. on Artificial Intelligence (AAAI-06)''.</ref> trained by [[maximum likelihood]] estimation. That method is commonly used for analyzing and clustering textual data and is also related to the [[latent class model]].

It was also shown<ref>Chris Ding, Xiaofeng He, and Horst D. Simon (2005). "[http://crd.lbl.gov/~cding/papers/nmfSIAM1.pdf On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering]". Proc. SIAM Int'l Conf. Data Mining, pp. 606-610.</ref> that when the Frobenius norm is used as a divergence, NMF is equivalent to a relaxed form of [[K-means clustering]]: matrix factor '''W''' contains cluster centroids and '''H''' contains cluster membership indicators. This also justifies the use of NMF for data clustering.

NMF extends beyond matrices to tensors of arbitrary order.<ref>Max Welling and Markus Weber (2001). "Positive Tensor Factorization", ''[[Pattern Recognition Letters]]'', 22(12), pp. 1255-1261.</ref>

== Uniqueness ==
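The introduction notes that matrix factorizations are generally non-unique. One well-known source of this, even under the non-negativity constraint, is rescaling: for any diagonal matrix '''D''' with positive entries, '''WH''' = ('''WD''')('''D'''<sup>−1</sup>'''H'''), and both rescaled factors remain non-negative. The following is a minimal sketch of that observation in plain Python; the example matrices and the helper names <code>matmul</code> and <code>is_nonnegative</code> are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def is_nonnegative(m):
    """Return True if every element of the matrix is >= 0."""
    return all(v >= 0 for row in m for v in row)


# Hypothetical non-negative factors (illustrative values only).
W = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, 0.0]]
H = [[1.0, 0.0, 2.0],
     [0.0, 4.0, 1.0]]

# A positive diagonal rescaling D and its inverse.
D = [[2.0, 0.0],
     [0.0, 0.5]]
D_inv = [[0.5, 0.0],
         [0.0, 2.0]]

W2 = matmul(W, D)      # columns of W scaled by 2 and 0.5
H2 = matmul(D_inv, H)  # rows of H scaled by 0.5 and 2

X = matmul(W, H)       # product of the original factors
X2 = matmul(W2, H2)    # product of the rescaled factors

# The two products agree element by element.
same_product = all(abs(x - y) < 1e-12
                   for rx, ry in zip(X, X2)
                   for x, y in zip(rx, ry))
```

Here <code>W2</code> and <code>H2</code> differ from <code>W</code> and <code>H</code>, yet they are non-negative and reproduce the same product, so the pair of factors is not determined by '''X''' alone. Permuting the columns of '''W''' together with the rows of '''H''' has the same effect.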

Non-negative matrix factorization: Difference between revisions