Non-negative matrix factorization

Non-negative matrix factorization (NMF) is a group of algorithms in multivariate analysis and linear algebra in which a matrix, $\mathbf {X}$ , is factorized into (usually) two matrices, $\mathbf {W}$ and $\mathbf {H}$ :

\operatorname {nmf} (\mathbf {X} )\rightarrow \mathbf {WH}

Factorization of matrices is generally non-unique, and a number of different methods have been developed (e.g. principal component analysis and singular value decomposition) by incorporating different constraints. Non-negative matrix factorization differs from these methods in that it enforces the constraint that all three matrices must be non-negative, i.e., every element must be greater than or equal to zero.
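The contrast with an unconstrained factorization can be illustrated with a minimal sketch in Python. The use of NumPy and the example matrix are assumptions made here for illustration only. The sketch factorizes a strictly positive matrix with the singular value decomposition and checks whether the resulting factors contain negative entries:

```python
import numpy as np

# A small matrix with strictly positive entries (illustrative data only).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0],
              [2.0, 1.0, 1.0]])

# Thin SVD: X = U_svd @ diag(s) @ Vt, with no sign constraint on the factors.
U_svd, s, Vt = np.linalg.svd(X, full_matrices=False)

# The factors reproduce X exactly (up to floating-point error) ...
reconstructs = bool(np.allclose(U_svd @ np.diag(s) @ Vt, X))

# ... but they are not non-negative.
has_negative_entries = bool((U_svd < 0).any() or (Vt < 0).any())

print(reconstructs, has_negative_entries)
```

For a strictly positive matrix the leading singular vectors have entries of a single sign, so every further singular vector, being orthogonal to the leading one, must contain entries of both signs. Such factors therefore cannot satisfy the non-negativity constraint.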

Usually the number of columns of W and the number of rows of H in NMF are chosen to be smaller than the dimensions of X, so that the product WH is only an approximation to X (it has been suggested that the NMF model should be called nonnegative matrix approximation instead). The full decomposition of X then amounts to the two non-negative matrices W and H as well as a residual U:

\mathbf {X} =\mathbf {WH} +\mathbf {U}

The elements of the residual matrix can be either negative or positive, at least in the typical application of NMF.
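The decomposition into W, H and a residual U can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation. It assumes NumPy, a Frobenius-norm cost and multiplicative update rules of the kind associated with Lee and Seung. The function name `nmf_multiplicative`, the iteration count and the small constant `eps` are hypothetical choices made here.

```python
import numpy as np

def nmf_multiplicative(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative X (m x n) by W (m x rank) @ H (rank x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Start from strictly positive random factors.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout. eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Illustrative data: a 6 x 5 non-negative matrix, factorized with rank 2.
rng = np.random.default_rng(1)
X = rng.random((6, 5))
W, H = nmf_multiplicative(X, rank=2)

# Residual of the approximation: X = WH + U.
U = X - W @ H

print("W, H non-negative:", bool((W >= 0).all() and (H >= 0).all()))
print("U has negative entries:", bool((U < 0).any()))
print("U has positive entries:", bool((U > 0).any()))
```

Because the rank (2) is smaller than both dimensions of X, the product WH cannot in general reproduce X exactly, and the residual U is not subject to any sign constraint.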

Early research on non-negative matrix factorization was performed by a Finnish group of researchers in the mid-1990s under the name positive matrix factorization. It became more widely known after Lee and Seung investigated the properties of the algorithm and published a simple, useful algorithm for computing it.

There are different types of non-negative matrix factorization, one of which is related to probabilistic latent semantic analysis and the latent class model. The different types arise from using different cost functions (divergence functions) and/or from regularization of the W and/or H matrices^[1].
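
As an illustration of what different cost functions can look like, two commonly used choices are the squared Frobenius norm and the generalized Kullback–Leibler divergence. The symbols $D_{F}$ and $D_{KL}$ and the factor ${\tfrac {1}{2}}$ are notational conventions assumed here, not taken from the cited work:

D_{F}(\mathbf {X} \parallel \mathbf {WH} )={\tfrac {1}{2}}\sum _{i,j}\left(X_{ij}-(\mathbf {WH} )_{ij}\right)^{2}

D_{KL}(\mathbf {X} \parallel \mathbf {WH} )=\sum _{i,j}\left(X_{ij}\log {\frac {X_{ij}}{(\mathbf {WH} )_{ij}}}-X_{ij}+(\mathbf {WH} )_{ij}\right)

In both cases the factorization is sought by minimizing the cost over non-negative W and H. A regularized variant adds penalty terms on W and/or H to the chosen cost.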

Relation to data clustering

Although NMF was initially considered to be different from vector quantization (K-means clustering), it was later shown^[2] that NMF is equivalent to a relaxed form of K-means clustering when the Frobenius norm objective function is used: the matrix factor W contains the cluster centroids and H contains the cluster membership indicators. NMF therefore provides a framework for data clustering.
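The clustering interpretation can be sketched in Python under some stated assumptions. The data points are taken to be the columns of X, the factorization uses a Frobenius-norm cost with multiplicative updates, and each point is assigned to the component with the largest entry in its column of H. The helper `nmf_frobenius`, the synthetic data and the rescaling step are hypothetical choices made for illustration, not a method taken from the cited work.

```python
import numpy as np

def nmf_frobenius(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Frobenius-norm NMF via multiplicative updates (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Columns of X are data points: three near one centroid, three near another.
rng = np.random.default_rng(42)
centroids = np.array([[5.0, 0.0],
                      [5.0, 0.0],
                      [0.0, 5.0],
                      [0.0, 5.0]])
true_labels = np.array([0, 0, 0, 1, 1, 1])
X = centroids[:, true_labels] + 0.1 * rng.random((4, 6))

W, H = nmf_frobenius(X, rank=2)

# Remove the scaling ambiguity between W and H before comparing rows of H:
# scale each row of H by the norm of the matching column of W.
H_scaled = H * np.linalg.norm(W, axis=0)[:, None]

# Columns of W play the role of centroids; the largest entry in each
# column of H_scaled serves as the cluster membership indicator.
labels = H_scaled.argmax(axis=0)
print(labels)
```

The cluster numbering returned by such a sketch is arbitrary, since the order of the components in W and H is not fixed by the factorization.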

It is also known that NMF is an instance of so-called "multinomial PCA".^[3] When NMF is obtained by minimizing the Kullback–Leibler divergence, it is equivalent to another instance of multinomial PCA, probabilistic latent semantic analysis,^[4] which has long been used for analyzing and clustering textual data.

Uniqueness

The factorization is not unique: an invertible matrix B and its inverse can be used to transform the two factorization matrices by, e.g.,

\mathbf {WH} =\mathbf {WBB} ^{-1}\mathbf {H}

If the two new matrices $\mathbf {\tilde {W}} =\mathbf {WB}$ and $\mathbf {\tilde {H}} =\mathbf {B} ^{-1}\mathbf {H}$ are non-negative, they form another parametrization of the factorization.

The non-negativity of $\mathbf {\tilde {W}}$ and $\mathbf {\tilde {H}}$ holds at least if B is a non-negative monomial matrix, because the inverse of such a matrix is again non-negative. In this simple case the transformation corresponds to just a scaling and a permutation.
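The monomial case can be checked numerically with a short Python sketch. NumPy, the random factors and the particular permutation and scaling values are assumptions chosen here for illustration.

```python
import numpy as np

# Arbitrary non-negative factors (illustrative data only).
rng = np.random.default_rng(0)
W = rng.random((4, 3))
H = rng.random((3, 5))

# A non-negative monomial matrix: a permutation times a positive scaling.
P = np.eye(3)[[2, 0, 1]]           # permutation matrix
d = np.array([0.5, 2.0, 4.0])      # positive scale factors
B = P @ np.diag(d)

# Its inverse is again a non-negative monomial matrix.
B_inv = np.diag(1.0 / d) @ P.T

W_tilde = W @ B
H_tilde = B_inv @ H

is_inverse = bool(np.allclose(B @ B_inv, np.eye(3)))
same_product = bool(np.allclose(W @ H, W_tilde @ H_tilde))
still_nonneg = bool((W_tilde >= 0).all() and (H_tilde >= 0).all())

print(is_inverse, same_product, still_nonneg)
```

Here the columns of W are permuted and rescaled while the rows of H are permuted and rescaled in the opposite sense, so the product WH is left unchanged.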

More control over the non-uniqueness of NMF is obtained with sparsity constraints^[5].

Sources and external links

J. Shen, G. W. Israël, "A receptor model using a specific non-negative transformation technique for ambient aerosol", Atmospheric Environment, 23(10):2289-2298, 1989.
P. Paatero, U. Tapper, "Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values", Environmetrics, 5:111-126, 1994.
Pia Anttila, Pentti Paatero, Unto Tapper, Olli Järvinen, "Source identification of bulk wet deposition in Finland by positive matrix factorization", Atmospheric Environment, 29(14):1705-1718, 1995.
Pentti Paatero, "Least squares formulation of robust non-negative factor analysis", Chemometrics and Intelligent Laboratory Systems, 37(1):23-35, 1997 May.
Daniel D. Lee and H. Sebastian Seung, "Learning the parts of objects by non-negative matrix factorization", Nature, 401(6755):788-791, 1999 October.
Daniel D. Lee and H. Sebastian Seung, "Algorithms for Non-negative Matrix Factorization", Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference, 556-562, MIT Press, 2001.

References

[1] Inderjit S. Dhillon, Suvrit Sra, "Generalized Nonnegative Matrix Approximations with Bregman Divergences", NIPS, 2005.
[2] Chris Ding, Xiaofeng He, and Horst D. Simon, "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering", Proc. SIAM Int'l Conf. Data Mining (SDM'05), pp. 606-610, April 2005.
[3] Wray Buntine, "Variational Extensions to EM and Multinomial PCA", Proc. European Conference on Machine Learning (ECML-02), LNAI 2430, pp. 23-34, 2002.
[4] Eric Gaussier and Cyril Goutte, "Relation between PLSA and NMF and Implications", Proc. 28th international ACM SIGIR conference on Research and development in information retrieval (SIGIR-05), pp. 601-602, 2005.
[5] Julian Eggert, Edgar Körner, "Sparse coding and NMF", Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, pp. 2529-2533, 2004.
