The initial paper by Lee & Seung proposed NMF mainly for parts-based decomposition of images. It compares NMF to [[vector quantization]] and [[principal component analysis]], and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results.
It was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA".<ref>Wray Buntine, "[http://cosco.hiit.fi/Articles/ecml02.pdf Extensions to EM and Multinomial PCA]", Proc. European Conference on Machine Learning (ECML-02), LNAI 2430, pp. 23-34, 2002.</ref> When NMF is obtained by minimizing the [[Kullback–Leibler divergence]], it is in fact equivalent to another instance of multinomial PCA, [[probabilistic latent semantic analysis]],<ref>Eric Gaussier and Cyril Goutte (2005). "[http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf Relation between PLSA and NMF and Implications]", ''Proc. 28th international ACM SIGIR conference on Research and development in information retrieval (SIGIR-05)'', pp. 601-602.</ref><ref>Chris Ding, Tao Li, Wei Peng (2006). "[http://crd.lbl.gov/~cding/papers/nmfpLSI.pdf Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-square Statistic, and a Hybrid Method]", ''Proc. of AAAI National Conf. on Artificial Intelligence (AAAI-06)''.</ref> trained by [[maximum likelihood]] estimation. That method is commonly used for analyzing and clustering textual data and is also related to the [[latent class model]].
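KL-divergence NMF of the kind discussed above is commonly computed with Lee and Seung's multiplicative update rules. The following is a minimal NumPy sketch of those updates (an illustrative implementation, not the exact formulation of any one cited paper; the function name and parameters are chosen here for the example):

```python
import numpy as np

def nmf_kl(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V as V ~ W @ H by minimizing the
    (generalized) Kullback-Leibler divergence, using multiplicative
    update rules in the style of Lee & Seung."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # H update: H <- H * (W^T (V / WH)) / (W^T 1)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        # W update: W <- W * ((V / WH) H^T) / (1 H^T)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
    return W, H

# Tiny demonstration on a random nonnegative matrix.
V = np.random.default_rng(1).random((6, 5))
W, H = nmf_kl(V, k=2)
```

Because the updates only ever multiply by nonnegative ratios, `W` and `H` stay nonnegative throughout, which is what distinguishes NMF from unconstrained factorizations such as PCA.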
It was also shown<ref>Chris Ding, Xiaofeng He, and Horst D. Simon (2005). "[http://crd.lbl.gov/~cding/papers/nmfSIAM1.pdf On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering]". Proc. SIAM Int'l Conf. Data Mining, pp. 606-610.</ref> that when the Frobenius norm is used as a divergence, NMF