Factorization of matrices is generally non-unique, and a number of different methods of doing so have been developed (e.g. [[principal component analysis]] and [[singular value decomposition]]) by incorporating different constraints; non-negative matrix factorization differs from these methods in that it enforces the constraint that all three matrices must be [[non-negative matrix|non-negative]], i.e., all elements must be equal to or greater than zero.
Usually the number of columns of '''W''' and the number of rows of '''H''' in NMF are selected so that the product '''WH''' becomes an approximation to '''X'''.
The full decomposition of '''X''' then amounts to the two non-negative matrices '''W''' and '''H''' as well as a residual '''U''':
: <math>\mathbf{X} = \mathbf{WH + U} </math>
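As a concrete illustration (a sketch not taken from the article, assuming scikit-learn's <code>NMF</code> class is available), the factors '''W''' and '''H''' and the residual '''U''' for a small non-negative '''X''' can be computed as:

```python
import numpy as np
from sklearn.decomposition import NMF

# A small non-negative matrix of exact non-negative rank 2
# (the third row is the sum of the first two).
X = np.array([[1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0],
              [3.0, 2.0, 5.0]])

model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
W = model.fit_transform(X)   # non-negative factor W
H = model.components_        # non-negative factor H
U = X - W @ H                # residual U

print(np.abs(U).max())       # small: WH closely approximates X
```

Because '''X''' here has an exact non-negative rank-2 factorization, the residual '''U''' is driven close to zero; for general data it only becomes small, not zero.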
== History ==
Early research on non-negative matrix factorizations was performed by a Finnish group of researchers in the mid-1990s under the name ''positive matrix factorization''.<ref>{{Cite journal
| author = P. Paatero, U. Tapper
| title = Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values
| year = 1995
| doi = 10.1016/1352-2310(94)00367-T
}}</ref>
It became more widely known after Lee and Seung investigated the properties of the algorithm and published simple, useful algorithms.
== Types ==
There are different types of non-negative matrix factorizations and one of these is related to [[probabilistic latent semantic analysis]] and the [[latent class model]].
The different types arise from using different [[cost function]]s (divergence functions) and/or by [[regularization (mathematics)|regularization]] of the '''W''' and/or '''H''' matrices.<ref>[[Inderjit S. Dhillon]], [[Suvrit Sra]], "[http://books.nips.cc/papers/files/nips18/NIPS2005_0203.pdf Generalized Nonnegative Matrix Approximations with Bregman Divergences]", [[NIPS]], 2005.</ref>
== Relation to data clustering ==
Although NMF was initially considered to be different from vector quantization ([[K-means clustering]]), it was later shown<ref>Chris Ding, Xiaofeng He, and Horst D. Simon. "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering". Proc. SIAM Int'l Conf. Data Mining (SDM'05), pp. 606-610, April 2005.</ref> that NMF is equivalent to relaxed [[K-means clustering]] with the Frobenius norm objective function: the matrix factor '''W''' contains cluster centroids and '''H''' contains cluster membership indicators. NMF therefore provides a framework for data clustering.
It is also known that NMF is an instance of so-called "multinomial PCA".<ref>Wray Buntine, "[http://cosco.hiit.fi/Articles/ecml02.pdf Extensions to EM and Multinomial PCA]", Proc. European Conference on Machine Learning (ECML-02), LNAI 2430, pp. 23-34, 2002. </ref>
When NMF is obtained by minimizing the [[Kullback–Leibler divergence]], it is also equivalent to another instance of multinomial PCA, [[probabilistic latent semantic analysis]],<ref>Eric Gaussier and Cyril Goutte, "[http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf Relation between PLSA and NMF and Implications]", Proc. 28th international ACM SIGIR conference on Research and development in information retrieval (SIGIR-05), pp. 601-602, 2005. </ref> which has long been used for analyzing and clustering textual data.
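The clustering reading above can be sketched directly (a hedged illustration assuming scikit-learn: each data point is a column of '''X''', and column ''j'' is assigned to the cluster ''k'' that maximizes '''H'''[''k'', ''j'']):

```python
import numpy as np
from sklearn.decomposition import NMF

# Six points in two well-separated groups, one column per data point.
X = np.array([[5.0, 5.1, 4.9, 0.1, 0.0, 0.2],
              [0.0, 0.1, 0.2, 4.8, 5.0, 5.2]])

model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
W = model.fit_transform(X)   # columns of W act as cluster centroids
H = model.components_        # H[k, j]: membership of point j in cluster k
labels = H.argmax(axis=0)    # hard cluster assignment per column of X
print(labels)
```

With these clearly separated groups, the first three columns share one label and the last three share the other, mirroring what K-means would return.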
== Uniqueness ==
If the two new matrices <math>\mathbf{\tilde{W} = WB}</math> and <math>\mathbf{\tilde{H}}=\mathbf{B}^{-1}\mathbf{H}</math> are [[non-negative matrix|non-negative]] they form another parametrization of the factorization.
The non-negativity of <math>\mathbf{\tilde{W}}</math> and <math>\mathbf{\tilde{H}}</math> applies at least if '''B''' is a non-negative [[monomial matrix]].
In this simple case it will just correspond to a scaling and a [[permutation]].
More control over the non-uniqueness of NMF is obtained with sparsity constraints.<ref>Julian Eggert, Edgar Körner, "[http://dx.doi.org/10.1109/IJCNN.2004.1381036 Sparse coding and NMF]", ''Proceedings of the 2004 IEEE International Joint Conference on Neural Networks'', pp. 2529-2533, 2004.</ref>
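This scaling-and-permutation ambiguity is easy to verify numerically (an illustrative sketch with a diagonal, hence monomial, '''B'''):

```python
import numpy as np

W = np.array([[1.0, 2.0],
              [3.0, 0.5]])
H = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, 1.0]])

B = np.diag([2.0, 0.25])            # non-negative diagonal (monomial) matrix
W2 = W @ B                          # rescaled factors remain non-negative
H2 = np.linalg.inv(B) @ H           # inverse of a positive diagonal is positive

# Both parametrizations yield exactly the same product WH.
print(np.allclose(W @ H, W2 @ H2))  # True
```

Since '''B''' only rescales (and, for a general monomial matrix, permutes) the factors, the product '''WH''' is unchanged, which is exactly the non-uniqueness described above.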
== Sources and external links ==
=== References ===
<div class="references-small"><references/></div>
[[Category:Linear algebra]]