Factorization of matrices is generally non-unique, and a number of different methods of doing so have been developed (e.g. [[principal component analysis]] and [[singular value decomposition]]) by incorporating different constraints; non-negative matrix factorization differs from these methods in that it enforces the constraint that all three matrices must be [[non-negative matrix|non-negative]], i.e., all elements must be equal to or greater than zero.
Usually the number of columns of '''W''' and the number of rows of '''H''' in NMF are selected so that the product '''WH''' becomes an approximation to '''X'''.
The full decomposition of '''X''' then amounts to the two non-negative matrices '''W''' and '''H''' as well as a residual '''U''':
: <math>\mathbf{X} = \mathbf{WH + U} </math>
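As a concrete illustration (a sketch not taken from the article, assuming scikit-learn's <code>NMF</code> class is available), the factors '''W''' and '''H''' and the residual '''U''' for a small non-negative '''X''' can be computed as:

```python
import numpy as np
from sklearn.decomposition import NMF

# A small non-negative matrix of exact non-negative rank 2
# (the third row is the sum of the first two).
X = np.array([[1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0],
              [3.0, 2.0, 5.0]])

model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
W = model.fit_transform(X)   # non-negative factor W
H = model.components_        # non-negative factor H
U = X - W @ H                # residual U

print(np.abs(U).max())       # small: WH closely approximates X
```

Because '''X''' here has an exact non-negative rank-2 factorization, the residual '''U''' is driven close to zero; for general data it only becomes small, not zero.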
== History ==
Early research on non-negative matrix factorizations was performed by a Finnish group of researchers in the mid-1990s under the name ''positive matrix factorization''.<ref>{{Cite journal
| author = P. Paatero, U. Tapper
| title = Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values
| year = 1995
| doi = 10.1016/1352-2310(94)00367-T
}}</ref>
It became more widely known after Lee and Seung investigated the properties of the algorithm and published simple, useful algorithms.
== Types ==
There are different types of non-negative matrix factorizations and one of these is related to [[probabilistic latent semantic analysis]] and the [[latent class model]].
The different types arise from using different [[cost function]]s (divergence functions) and/or by [[regularization (mathematics)|regularization]] of the '''W''' and/or '''H''' matrices.<ref>[[Inderjit S. Dhillon]], [[Suvrit Sra]], "[http://books.nips.cc/papers/files/nips18/NIPS2005_0203.pdf Generalized Nonnegative Matrix Approximations with Bregman Divergences]", [[NIPS]], 2005.</ref>
== Relation to data clustering ==
Although NMF was initially considered to be different from vector quantization ([[K-means clustering]]), it was later shown<ref>Chris Ding, Xiaofeng He, and Horst D. Simon. "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering". Proc. SIAM Int'l Conf. Data Mining (SDM'05), pp. 606-610, April 2005.</ref> that NMF is equivalent to relaxed [[K-means clustering]] with the Frobenius norm objective function: the matrix factor '''W''' contains cluster centroids and '''H''' contains cluster membership indicators. NMF therefore provides a framework for data clustering.
It is also known that NMF is an instance of so-called "multinomial PCA".<ref>Wray Buntine, "[http://cosco.hiit.fi/Articles/ecml02.pdf Extensions to EM and Multinomial PCA]", Proc. European Conference on Machine Learning (ECML-02), LNAI 2430, pp. 23-34, 2002. </ref>
When NMF is obtained by minimizing the [[Kullback–Leibler divergence]], it is also equivalent to another instance of multinomial PCA, [[probabilistic latent semantic analysis]],<ref>Eric Gaussier and Cyril Goutte, "[http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf Relation between PLSA and NMF and Implications]", Proc. 28th international ACM SIGIR conference on Research and development in information retrieval (SIGIR-05), pp. 601-602, 2005. </ref> which has long been used for analyzing and clustering textual data.
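The clustering reading above can be sketched directly (a hedged illustration assuming scikit-learn: each data point is a column of '''X''', and column ''j'' is assigned to the cluster ''k'' that maximizes '''H'''[''k'', ''j'']):

```python
import numpy as np
from sklearn.decomposition import NMF

# Six points in two well-separated groups, one column per data point.
X = np.array([[5.0, 5.1, 4.9, 0.1, 0.0, 0.2],
              [0.0, 0.1, 0.2, 4.8, 5.0, 5.2]])

model = NMF(n_components=2, init='random', random_state=0, max_iter=500)
W = model.fit_transform(X)   # columns of W act as cluster centroids
H = model.components_        # H[k, j]: membership of point j in cluster k
labels = H.argmax(axis=0)    # hard cluster assignment per column of X
print(labels)
```

With these clearly separated groups, the first three columns share one label and the last three share the other, mirroring what K-means would return.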
== Uniqueness ==
If the two new matrices <math>\mathbf{\tilde{W} = WB}</math> and <math>\mathbf{\tilde{H}}=\mathbf{B}^{-1}\mathbf{H}</math> are [[non-negative matrix|non-negative]] they form another parametrization of the factorization.
The non-negativity of <math>\mathbf{\tilde{W}}</math> and <math>\mathbf{\tilde{H}}</math> applies at least if '''B''' is a non-negative [[monomial matrix]].
In this simple case it will just correspond to a scaling and a [[permutation]].
More control over the non-uniqueness of NMF is obtained with sparsity constraints.<ref>Julian Eggert, Edgar Körner, "[http://dx.doi.org/10.1109/IJCNN.2004.1381036 Sparse coding and NMF]", ''Proceedings of the 2004 IEEE International Joint Conference on Neural Networks'', pp. 2529-2533, 2004.</ref>
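This scaling-and-permutation ambiguity is easy to verify numerically (an illustrative sketch with a diagonal, hence monomial, '''B'''):

```python
import numpy as np

W = np.array([[1.0, 2.0],
              [3.0, 0.5]])
H = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, 1.0]])

B = np.diag([2.0, 0.25])            # non-negative diagonal (monomial) matrix
W2 = W @ B                          # rescaled factors remain non-negative
H2 = np.linalg.inv(B) @ H           # inverse of a positive diagonal is positive

# Both parametrizations yield exactly the same product WH.
print(np.allclose(W @ H, W2 @ H2))  # True
```

Since '''B''' only rescales (and, for a general monomial matrix, permutes) the factors, the product '''WH''' is unchanged, which is exactly the non-uniqueness described above.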
== Sources and external links ==
=== References ===
<div class="references-small"><references/></div>
[[Category:Linear algebra]]