Non-negative matrix factorization: Difference between revisions

Revision as of 06:40, 20 February 2023 edit Citation bot (talk \| contribs) Bots 5,864,630 edits Alter: url. URLs might have been anonymized. Add: pages, issue, s2cid, authors 1-1. Removed parameters. Some additions/deletions were parameter name changes. \| Use this bot. Report bugs. \| Suggested by Corvus florensis \| #UCB_webform 661/3499 ← Previous edit		Latest revision as of 00:47, 27 August 2025 edit undo Citation bot (talk \| contribs) Bots 5,864,630 edits Add: article-number, bibcode. Removed URL that duplicated identifier. \| Use this bot. Report bugs. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 914/990
(24 intermediate revisions by 14 users not shown)
Line 5: '''Non-negative matrix factorization''' ('''NMF''' or '''NNMF'''), also '''non-negative matrix approximation'''<ref name="dhillon"/><ref>{{cite report\|last1=Tandon\|first1=Rashish\|last2=Sra\|first2=Suvrit \|title=Sparse nonnegative matrix approximation: new formulations and algorithms\|date=September 13, 2010 \|url=https://is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/MPIK-TR-193_%5B0%5D.pdf \|id=Technical Report No. 193 \|publisher=Max Planck Institute for Biological Cybernetics}}</ref> is a group of [[algorithm]]s in [[multivariate analysis]] and [[linear algebra]] where a [[matrix (mathematics)\|matrix]] {{math\|'''V'''}} is [[Matrix decomposition\|factorized]] into (usually) two matrices {{math\|'''W'''}} and {{math\|'''H'''}}, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically. NMF finds applications in such fields as [[astronomy]],<ref name=":0">{{Cite journal \|last1=Berné \|first1=O. \|last2=Joblin \|first2=C. \|last3=Deville \|first3=Y. \|last4=Smith \|first4=J. D. \|last5=Rapacioli \|first5=M. \|last6=Bernard \|first6=J. P. \|last7=Thomas \|first7=J. \|last8=Reach \|first8=W. \|last9=Abergel \|first9=A. 
\|date=2007-07-01 \|title=Analysis of the emission of very small dust particles from Spitzer spectro-imagery data using blind signal separation methods \|url=https://www.aanda.org/articles/aa/abs/2007/26/aa6282-06/aa6282-06.html \|journal=Astronomy & Astrophysics \|language=en \|volume=469 \|issue=2 \|pages=575–586 \|doi=10.1051/0004-6361:20066282 \|issn=0004-6361\|doi-access=free }}</ref><ref name="blantonRoweis07"/><ref name="ren18"/> [[computer vision]], [[document clustering]],<ref name="dhillon" /> [[Imputation (statistics)\|missing data imputation]],<ref name="ren20">{{Cite journal\|arxiv=2001.00563\|last1= Ren\|first1= Bin \|title= Using Data Imputation for Signal Separation in High Contrast Imaging\|journal= The Astrophysical Journal\|volume= 892\|issue= 2\|pages= 74\|last2= Pueyo\|first2= Laurent\|last3= Chen \| first3 = Christine\|last4= Choquet\|first4= Elodie \|last5= Debes\|first5= John H\|last6= Duechene \|first6= Gaspard\|last7= Menard\|first7=Francois\|last8=Perrin\|first8=Marshall D.\|~~year~~date= 2020\|doi= 10.3847/1538-4357/ab7024 \| bibcode = 2020ApJ...892...74R \|s2cid= 209531731\|doi-access= free}}</ref> [[chemometrics]], [[audio signal processing]], [[recommender system\|recommender systems]],<ref name="gemulla">{{cite conference \|author=Rainer Gemulla \|author2=Erik Nijkamp \|author3=Peter J. Haas\|author3-link= Peter J. Haas (computer scientist)\|author4=Yannis Sismanis \|title=Large-scale matrix factorization with distributed stochastic gradient descent \|conference=Proc. ACM SIGKDD Int'l Conf. 
on Knowledge discovery and data mining \|url=<!-- http://www.mpi-inf.mpg.de/~rgemulla/publications/rj10481rev.pdf --><!--removing dead link--> \|~~year~~date=2011 \|pages=69–77 }}</ref><ref>{{cite conference \|author=Yang Bao\|title=TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation \|conference=AAAI \|url=http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8273 \|~~year~~date=2014 \|display-authors=etal}}</ref> and [[bioinformatics]].<ref>{{cite journal \|author=Ben Murrell\|title=Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution \|journal=PLOS ONE \|volume=6 \|issue=12 \|~~year~~date=2011 \|pages=e28898\|display-authors=etal\|doi=10.1371/journal.pone.0028898 \|pmid=22216138 \|pmc=3245233 \|bibcode=2011PLoSO...628898M \|doi-access=free }}</ref> == History == Line 18: \| issue = 3 \| pages = 617–633 \| ~~year~~ date= 1971 \| doi=10.2307/1267173 \| jstor = 1267173 Line 35: \| issue = 14 \| pages = 1705–1718 \| ~~year~~ date= 1995 \| doi = 10.1016/1352-2310(94)00367-T \| bibcode = 1995AtmEn..29.1705A Line 45: \| author2-link = Sebastian Seung \| name-list-style = amp \| ~~year~~ date= 1999 \| title = Learning the parts of objects by non-negative matrix factorization \| journal = [[Nature (journal)\|Nature]] Line 57: }}</ref><ref name="lee2001algorithms">{{Cite conference \|author1=Daniel D. Lee \|author2=H. Sebastian Seung \|name-list-style=amp \| ~~year~~ date= 2001 \| url = http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf \| title = Algorithms for Non-negative Matrix Factorization Line 98: Furthermore, the computed <math>H</math> gives the cluster membership, i.e., if <math>\mathbf{H}_{kj} > \mathbf{H}_{ij} </math> for all ''i'' ≠ ''k'', this suggests that the input data <math> v_j </math> belongs to <math>k</math>-th cluster. 
The computed <math>W</math> gives the cluster centroids, i.e., the <math>k</math>-th column gives the cluster centroid of <math>k</math>-th cluster. This centroid's representation can be significantly enhanced by convex NMF. When the orthogonality constraint <math> \mathbf{H}\mathbf{H}^T = I </math> is not explicitly imposed, the orthogonality holds to a large extent, and the clustering property holds too. ~~Clustering is the main objective of most [[data mining]] applications of NMF.{{citation needed\|date=April 2015}}~~ When the error function to be used is [[Kullback–Leibler divergence]], NMF is identical to the [[probabilistic latent semantic analysis]] (PLSA), a popular document clustering method.<ref>{{cite journal \|vauthors=Ding C, Li Y, Peng W \|url=http://users.cis.fiu.edu/~taoli/pub/NMFpLSIequiv.pdf \|title=On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing \|archive-url=https://web.archive.org/web/20160304070027/http://users.cis.fiu.edu/~taoli/pub/NMFpLSIequiv.pdf \|archive-date=2016-03-04 \|url-status=dead \|journal=Computational Statistics & Data Analysis \|~~year~~date=2008 \|volume=52 \|issue=8 \|pages=3913–3927\|doi=10.1016/j.csda.2008.01.011 }}</ref> == Types == Line 107: Usually the number of columns of {{math\|'''W'''}} and the number of rows of {{math\|'''H'''}} in NMF are selected so the product {{math\|'''WH'''}} will become an approximation to {{math\|'''V'''}}. The full decomposition of {{math\|'''V'''}} then amounts to the two non-negative matrices {{math\|'''W'''}} and {{math\|'''H'''}} as well as a residual {{math\|'''U'''}}, such that: {{math\|1='''V''' = '''WH''' + '''U'''}}. The elements of the residual matrix can either be negative or positive. When {{math\|'''W'''}} and {{math\|'''H'''}} are smaller than {{math\|'''V'''}} they become easier to store and manipulate. 
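The clustering reading of {{math\|'''H'''}}, the residual {{math\|'''U'''}}, and the storage argument can be illustrated with a small numerical sketch. All matrices below are hypothetical values chosen for illustration and do not come from any cited source; the factors are simply written down rather than computed by an NMF algorithm.

```python
import numpy as np

# Hypothetical non-negative factors: 4 features, 6 samples, 2 components.
W = np.array([[1.0, 0.0],
              [2.0, 0.0],
              [0.0, 1.0],
              [0.0, 3.0]])
H = np.array([[1.0, 2.0, 0.1, 0.0, 3.0, 0.2],
              [0.0, 0.1, 2.0, 1.0, 0.1, 4.0]])

# Hypothetical data: the product WH plus a small perturbation of mixed sign,
# chosen so that V itself stays non-negative.
noise = np.array([[ 0.05, -0.05,  0.00,  0.00, 0.05,  0.00],
                  [-0.05,  0.05,  0.00,  0.00, 0.00,  0.00],
                  [ 0.00,  0.00,  0.05, -0.05, 0.00,  0.05],
                  [ 0.00,  0.00, -0.05,  0.05, 0.00, -0.05]])
V = W @ H + noise

# Residual of the full decomposition V = WH + U; its entries may have either sign.
U = V - W @ H

# Cluster membership: sample j is assigned to the k with the largest H[k, j].
labels = np.argmax(H, axis=0)

# Number of stored values in the factors versus the full matrix.
factor_entries = W.size + H.size
full_entries = V.size
```

Here the columns of {{math\|'''W'''}} play the role of the two cluster centroids, and the factors hold 20 numbers against 24 for {{math\|'''V'''}}; the saving grows quickly for larger matrices with a small inner dimension.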
Another reason for factorizing {{math\|'''V'''}} into smaller matrices {{math\|'''W'''}} and {{math\|'''H'''}} is that if one's goal is to approximately represent the elements of {{math\|'''V'''}} by significantly less data, then one has to infer some latent structure in the data. === Convex non-negative matrix factorization === Line 113: === Nonnegative rank factorization === In case the [[nonnegative rank (linear algebra)\|nonnegative rank]] of {{math\|'''V'''}} is equal to its actual rank, {{math\|1='''V''' = '''WH'''}} is called a nonnegative rank factorization (NRF).<ref name=BermanPlemmons74>{{cite journal\|last=Berman\|first=A.\|author2=R.J. Plemmons \|title=Inverses of nonnegative matrices\|journal=Linear and Multilinear Algebra\|~~year~~date=1974\|volume=2\|issue=2\|pages=161–172\|doi=10.1080/03081087408817055}}</ref><ref name=BermanPlemmons94>{{cite book\|author1=A. Berman \|author2=R.J. Plemmons \|title=Nonnegative matrices in the Mathematical Sciences\|~~year~~date=1994\|publisher=SIAM\|location=Philadelphia}}</ref><ref name=Thomas74>{{cite journal \|last=Thomas\|first=L.B.\|title=Problem 73-14, Rank factorization of nonnegative matrices\|journal=SIAM Rev.\|~~year~~date=1974\|volume=16\|issue=3\|pages=393–394\|doi=10.1137/1016064}}</ref> The problem of finding the NRF of {{math\|'''V'''}}, if it exists, is known to be NP-hard.<ref name=Vavasis09>{{cite journal\|last=Vavasis\|first=S.A.\|title=On the complexity of nonnegative matrix factorization\|journal=SIAM J. Optim.\|~~year~~date=2009\|volume=20\|issue=3\|pages=1364–1377 \|doi=10.1137/070709967\|arxiv=0708.4149\|s2cid=7150400}}</ref> === Different cost functions and regularizations === There are different types of non-negative matrix factorizations. 
The different types arise from using different [[Loss function\|cost function]]s for measuring the divergence between {{math\|'''V'''}} and {{math\|'''WH'''}} and possibly by [[regularization (mathematics)\|regularization]] of the {{math\|'''W'''}} and/or {{math\|'''H'''}} matrices.<ref name="dhillon">{{cite conference \| last1 = Dhillon \| first1 = Inderjit S. \| last2 = Sra \| first2 = Suvrit \| contribution = Generalized Nonnegative Matrix Approximations with Bregman Divergences \| contribution-url = https://proceedings.neurips.cc/paper/2005/hash/d58e2f077670f4de9cd7963c857f2534-Abstract.html \| pages = 283–290 \| title = Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, NIPS 2005, December 5-8, 2005, Vancouver, British Columbia, Canada] \| year = 2005}}</ref> Two simple divergence functions studied by Lee and Seung are the squared error (or [[Frobenius norm]]) and an extension of the Kullback–Leibler divergence to positive matrices (the original [[Kullback–Leibler divergence]] is defined on probability distributions). Line 126 ⟶ 133: : <math>F(\mathbf{W},\mathbf{H}) = \left\\|\mathbf{V} - \mathbf{WH} \right\\|^2_F</math> Another type of NMF for images is based on the [[total variation norm]].<ref>{{Cite journal \| last1 = Zhang \| first1 = T. \| last2 = Fang \| first2 = B. \| last3 = Liu \| first3 = W. \| last4 = Tang \| first4 = Y. Y. \| last5 = He \| first5 = G. \| last6 = Wen \| first6 = J. 
\| doi = 10.1016/j.neucom.2008.01.022 \| title = Total variation norm-based nonnegative matrix factorization for identifying discriminant representation of image patterns \| journal = [[Neurocomputing (journal)\|Neurocomputing]]\| volume = 71 \| issue = 10–12 \| pages = 1824–1831\| ~~year~~date = 2008 }}</ref> When [[Tikhnov regularization\|L1 regularization]] (akin to [[Lasso (statistics)\|Lasso]]) is added to NMF with the mean squared error cost function, the resulting problem may be called '''non-negative sparse coding''' due to the similarity to the [[sparse coding]] problem,<ref name="hoyer02">{{cite conference \|last=Hoyer \|first=Patrik O. \|title=Non-negative sparse coding \|conference=Proc. IEEE Workshop on Neural Networks for Signal Processing \|~~year~~date=2002 \|arxiv=cs/0202009 }}</ref><ref name="Leo Taslaman and Björn Nilsson 2012 e46331">{{Cite journal \|author1=Leo Taslaman \|author2=Björn Nilsson \|name-list-style=amp \| title = A framework for regularized non-negative matrix factorization, with application to the analysis of gene expression data Line 134 ⟶ 141: \| volume = 7 \| issue = 11 \| ~~year~~ date= 2012 \| pages = e46331 \| doi = 10.1371/journal.pone.0046331 Line 142 ⟶ 149: \|doi-access=free }}</ref> although it may also still be referred to as NMF.<ref>{{Cite conference \| last1 = Hsieh \| first1 = C. J. \| last2 = Dhillon \| first2 = I. S. \| doi = 10.1145/2020408.2020577 \| title = Fast coordinate descent methods with variable selection for non-negative matrix factorization \| conference = Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11 \| pages = 1064\| ~~year~~ date= 2011 \| isbn = 9781450308137 \| url = http://www.cs.utexas.edu/~cjhsieh/nmf_kdd11.pdf}}</ref> === Online NMF === Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. 
This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in [[Data stream\|streaming]] fashion. One such use is for [[collaborative filtering]] in [[recommender system\|recommendation systems]], where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for optimization in these cases may or may not be the same as for standard NMF, but the algorithms need to be rather different.~~<ref>http://www.ijcai.org/papers07/Papers/IJCAI07-432.pdf {{Bare URL PDF\|date=March 2022}}</ref>~~<ref>{{cite book\|url=http://dl.acm.org/citation.cfm?id=1339264.1339709\|title=Online Discussion Participation Prediction Using Non-negative Matrix Factorization \|first1=Yik-Hing\|last1=Fung\|first2=Chun-Hung\|last2=Li\|first3=William K.\|last3=Cheung\|date=2 November 2007\|publisher=IEEE Computer Society\|pages=284–287\|via=dl.acm.org\|isbn=9780769530284\|series=Wi-Iatw '07}}</ref><ref>{{Cite journal \|author=Naiyang Guan\|author2=Dacheng Tao\|author3=Zhigang Luo\|author4=Bo Yuan\|name-list-style=amp\|date=July 2012\|title=Online Nonnegative Matrix Factorization With Robust Stochastic Approximation\|journal=IEEE Transactions on Neural Networks and Learning Systems \|issue=7 \|doi=10.1109/TNNLS.2012.2197827\|pmid=24807135\|volume=23\|pages=1087–1099\|bibcode=2012ITNNL..23.1087G \|s2cid=8755408}}</ref> === Convolutional NMF === If the columns of {{math\|'''V'''}} represent data sampled over spatial or temporal dimensions, e.g. time signals, images, or video, features that are equivariant w.r.t. shifts along these dimensions can be learned by Convolutional NMF. In this case, {{math\|'''W'''}} is sparse with columns having local non-zero weight windows that are shared across shifts along the spatio-temporal dimensions of {{math\|'''V'''}}, representing [[Kernel (image processing)\|convolution kernels]]. 
By spatio-temporal pooling of {{math\|'''H'''}} and repeatedly using the resulting representation as input to convolutional NMF, deep feature hierarchies can be learned.<ref>{{Cite book \|last=Behnke \|first=S. \|title=Proceedings of the International Joint Conference on Neural Networks, 2003 \|chapter=Discovering hierarchical speech features using convolutional non-negative matrix factorization \|date=2003 \|location=Portland, Oregon USA \|publisher=IEEE \|volume=4 \|pages=2758–2763 \|doi=10.1109/IJCNN.2003.1224004 \|isbn=978-0-7803-7898-8\|s2cid=3109867 }}</ref> == Algorithms == Line 162 ⟶ 169: More recently other algorithms have been developed. Some approaches are based on alternating [[non-negative least squares]]: in each step of such an algorithm, first {{math\|'''H'''}} is fixed and {{math\|'''W'''}} is found by a non-negative least squares solver, then {{math\|'''W'''}} is fixed and {{math\|'''H'''}} is found analogously. 
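The alternating scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited paper: the function names <code>nnls_projected_gradient</code> and <code>anls_nmf</code> are hypothetical, the test matrix is synthetic, and a fixed number of projected gradient iterations stands in for a dedicated non-negative least squares solver, so each subproblem is solved only approximately.

```python
import numpy as np

def nnls_projected_gradient(A, B, X0, n_iter=50):
    """Approximately solve min_X 0.5 * ||A X - B||_F^2 subject to X >= 0."""
    AtA = A.T @ A
    AtB = A.T @ B
    # Step size 1/L, where L is the spectral norm of A^T A
    # (the Lipschitz constant of the gradient); guarded against zero.
    L = max(np.linalg.norm(AtA, 2), 1e-12)
    X = X0.copy()
    for _ in range(n_iter):
        grad = AtA @ X - AtB
        X = np.maximum(X - grad / L, 0.0)  # gradient step, then projection onto X >= 0
    return X

def anls_nmf(V, rank, n_outer=50, seed=0):
    """Alternate between the W and H subproblems, warm-starting each one."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    errors = [np.linalg.norm(V - W @ H)]  # Frobenius error of the random start
    for _ in range(n_outer):
        # H fixed: V^T ~ H^T W^T is a non-negative least squares problem in W^T.
        W = nnls_projected_gradient(H.T, V.T, W.T).T
        # W fixed: V ~ W H is a non-negative least squares problem in H.
        H = nnls_projected_gradient(W, V, H)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Synthetic non-negative matrix with an exact rank-2 non-negative factorization.
rng = np.random.default_rng(1)
V = rng.random((8, 2)) @ rng.random((2, 10))
W, H, errors = anls_nmf(V, rank=2)
```

Because every inner update is a projected gradient step with step size 1/L started from the current iterate, the recorded Frobenius error does not increase from one outer iteration to the next; how close it gets to zero still depends on the initialization.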
The procedures used to solve for {{math\|'''W'''}} and {{math\|'''H'''}} may be the same<ref name="lin07"/> or different, as some NMF variants regularize one of {{math\|'''W'''}} and {{math\|'''H'''}}.<ref name="hoyer02"/> Specific approaches include the projected [[gradient descent]] methods,<ref name="lin07">{{Cite journal \| last1 = Lin \| first1 = Chih-Jen\| title = Projected Gradient Methods for Nonnegative Matrix Factorization \| doi = 10.1162/neco.2007.19.10.2756 \| journal = [[Neural Computation (journal)\|Neural Computation]]\| volume = 19 \| issue = 10 \| pages = 2756–2779 \| ~~year~~ date= 2007 \| pmid = 17716011\| url = http://www.csie.ntu.edu.tw/~cjlin/papers/pgradnmf.pdf\| citeseerx = 10.1.1.308.9135\| s2cid = 2295736}}</ref><ref>{{Cite journal \| last1 = Lin \| first1 = Chih-Jen\| doi = 10.1109/TNN.2007.895831 \| title = On the Convergence of Multiplicative Update Algorithms for Nonnegative Matrix Factorization \| journal = IEEE Transactions on Neural Networks\| volume = 18 \| issue = 6 \| pages = 1589–1596 \| ~~year~~ date= 2007 \| bibcode = 2007ITNN...18.1589L\| citeseerx = 10.1.1.407.318\| s2cid = 2183630}}</ref> the [[active set]] method,<ref name="gemulla"/><ref name="kim2008nonnegative">{{Cite journal \| author = Hyunsoo Kim \| author2 = Haesun Park Line 171 ⟶ 178: \| volume = 30 \| issue = 2 \| ~~year~~ date= 2008 \| pages = 713–730 \| url = http://www.cc.gatech.edu/~hpark/papers/simax-nmf.pdf Line 188 ⟶ 195: \|url = <!-- http://www.cc.gatech.edu/~jingu/docs/2011_paper_sisc_nmf.pdf --><!-- removing dead link --> \|doi = 10.1137/110821172 \|bibcode = 2011SJSC...33.3261K \|citeseerx = 10.1.1.419.798 }}</ref> among several others.<ref name="kim2013unified">{{Cite journal Line 195 ⟶ 203: \| volume = 33 \| issue = 2 \| ~~year~~ date= 2013 \| pages = 285–319 \| url =https://smallk.github.io/papers/nmf_review_jgo.pdf Line 211 ⟶ 219: \| volume = 4 \| pages = 606–610 \| ~~year~~ date= 2005 \| doi=10.1137/1.9781611972757.70 \| isbn = 978-0-89871-593-4 
}}</ref> However, as in many other data mining applications, a local minimum may still prove to be useful. In addition to the optimization step, initialization has a significant effect on NMF. The initial values chosen for {{math\|'''W'''}} and {{math\|'''H'''}} may affect not only the rate of convergence, but also the overall error at convergence. Some options for initialization include complete randomization, [[Singular value decomposition\|SVD]], k-means clustering, and more advanced strategies based on these and other paradigms.<ref>{{Cite journal \|last1=Hafshejani \|first1=Sajad Fathi \|last2=Moaberfard \|first2=Zahra \|date=November 2022 \|title=Initialization for Nonnegative Matrix Factorization: a Comprehensive Review \|journal=International Journal of Data Science and Analytics \|volume=16 \|issue=1 \|pages=119–134 \|doi=10.1007/s41060-022-00370-9 \|issn=2364-415X\|arxiv=2109.03874 }}</ref> [[File:Fractional_Residual_Variances_comparison,_PCA_and_NMF.pdf\|thumb\|500px\|Fractional residual variance (FRV) plots for PCA and sequential NMF;<ref name="ren18"/> for PCA, the theoretical values are the contribution from the residual eigenvalues. In comparison, the FRV curves for PCA reach a flat plateau where no signal is captured effectively, while the NMF FRV curves decline continuously, indicating a better ability to capture signal. The FRV curves for NMF also converge to higher levels than those for PCA, indicating that NMF over-fits less.]] Line 224 ⟶ 234: === Exact NMF === Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix {{math\|'''V'''}}. A polynomial-time algorithm for solving nonnegative rank factorization when {{math\|'''V'''}} contains a monomial submatrix of rank equal to its rank was given by Campbell and Poole in 1981.<ref name=CampbellPoole81>{{cite journal\|last=Campbell\|first=S.L.\|author2=G.D. 
Poole \|title=Computing nonnegative rank factorizations \|journal=Linear Algebra Appl.\|~~year~~date=1981\|volume=35 \|pages=175–182\|doi=10.1016/0024-3795(81)90272-x\|doi-access=free}}</ref> Kalofolias and Gallopoulos (2012)<ref name=KalofoliasGallopoulos2012>{{cite journal\|last=Kalofolias\|first=V. \|author2=Gallopoulos, E. \|title=Computing symmetric nonnegative rank factorizations\|journal=Linear Algebra Appl\|~~year~~date=2012\|volume=436 \|issue=2\|pages=421–435\|doi=10.1016/j.laa.2011.03.016 \|url=https://infoscience.epfl.ch/record/198764/files/main.pdf}}</ref> solved the symmetric counterpart of this problem, where {{math\|'''V'''}} is symmetric and contains a diagonal principal sub matrix of rank r. Their algorithm runs in {{math\|O(rm<sup>2</sup>)}} time in the dense case. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial time algorithm for exact NMF that works for the case where one of the factors W satisfies a separability condition.<ref name=Arora2013>{{Cite conference \| last1 = Arora \| first1 = Sanjeev \| last2 = Ge \| first2 = Rong Line 237 ⟶ 247: \| arxiv = 1212.4777 \| conference = Proceedings of the 30th International Conference on Machine Learning \| ~~year~~ date=2013 \| bibcode = 2012arXiv1212.4777A}}</ref> Line 247 ⟶ 257: \| volume = 401 \| issue = 6755 \| ~~year~~ date= 1999 \| doi = 10.1038/44565 \| pmid = 10548103 Line 265 ⟶ 275: \| volume = 2430 \| pages = 23–34 \| ~~year~~ date= 2002 }}</ref> When NMF is obtained by minimizing the [[Kullback–Leibler divergence]], it is in fact equivalent to another instance of multinomial PCA, [[probabilistic latent semantic analysis]],<ref>{{Cite conference Line 271 ⟶ 281: \|author2 = Cyril Goutte \|name-list-style = amp \|~~year~~date = 2005 \|url = http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf \|title = Relation between PLSA and NMF and Implications Line 286 ⟶ 296: NMF with the least-squares objective is equivalent to a relaxed form of [[K-means 
clustering]]: the matrix factor {{math\|'''W'''}} contains cluster centroids and {{math\|'''H'''}} contains cluster membership indicators.<ref name="DingSDM2005">C. Ding, X. He, H.D. Simon (2005). [http://ranger.uta.edu/~chqding/papers/NMF-SDM2005.pdf "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering"]. Proc. SIAM Int'l Conf. Data Mining, pp. 606-610. May 2005</ref><ref>Ron Zass and [[Amnon Shashua]] (2005). "[http://www.cs.huji.ac.il/~zass/papers/cp-iccv05.pdf A Unifying Approach to Hard and Probabilistic Clustering]". International Conference on Computer Vision (ICCV) Beijing, China, Oct., 2005.</ref> This provides a theoretical foundation for using NMF for data clustering. However, k-means does not enforce non-negativity on its centroids, so the closest analogy is in fact with "semi-NMF".{{r\|ding}} NMF can be seen as a two-layer [[Bayesian network\|directed graphical]] model with one layer of observed random variables and one layer of hidden random variables.<ref>{{cite conference \|author=Max Welling\|title=Exponential Family Harmoniums with an Application to Information Retrieval \|conference=NIPS\|url=http://papers.nips.cc/paper/2672-exponential-family-harmoniums-with-an-application-to-information-retrieval \|~~year~~date=2004\|display-authors=etal}}</ref> NMF extends beyond matrices to tensors of arbitrary order.<ref>{{Cite journal Line 296 ⟶ 306: \| issue = 4 \| pages = 854–888 \| ~~year~~ date= 1999 \| doi = 10.2307/1390831 \| jstor = 1390831 }}</ref><ref>{{Cite journal \|author1=Max Welling \|author2=Markus Weber \|name-list-style=amp \| ~~year~~ date= 2001 \| title = Positive Tensor Factorization \| journal = [[Pattern Recognition Letters]] Line 316 ⟶ 326: \| pages = 311–326 \| url = http://www.cc.gatech.edu/~hpark/papers/2011_paper_hpscbook_ntf.pdf \| ~~year~~ date= 2012 \| conference = High-Performance Scientific Computing: Algorithms and Applications }} </ref> This extension may be viewed as a non-negative counterpart 
to, e.g., the [[PARAFAC]] model. Line 328 ⟶ 338: \| url = http://books.nips.cc/papers/files/nips24/NIPS2011_1189.pdf \| conference = NIPS \| ~~year~~ date=2011 }} </ref> NMF is an instance of nonnegative [[quadratic programming]] ~~([[NQP]])~~, just like the [[support vector machine]] (SVM). However, SVM and NMF are related at a more intimate level than that of NQP, which allows direct application of the solution algorithms developed for either of the two methods to problems in both domains.<ref>{{Cite conference \| author = Vamsi K. Potluru \| author2 = Sergey M. Plis Line 340 ⟶ 350: \| name-list-style = amp \| title = Efficient Multiplicative updates for Support Vector Machines \| ~~year~~ date= 2009 \| conference = Proceedings of the 2009 SIAM Conference on Data Mining (SDM) \| pages = 1218–1229 Line 354 ⟶ 364: \| publisher = [[Association for Computing Machinery]] \| ___location = New York \| ~~year~~ date= 2003 \| conference = Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval \| pages = 267–273 Line 365 ⟶ 375: In this simple case it will just correspond to a scaling and a [[permutation]]. More control over the non-uniqueness of NMF is obtained with sparsity constraints.<ref>{{Cite book \|doi = 10.1109/IJCNN.2004.1381036\|chapter = Sparse coding and NMF\|title = 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541)\|volume = 4\|pages = 2529–2533\|~~year~~ date= 2004\|last1 = Eggert\|first1 = J.\|last2 = Korner\|first2 = E.\|isbn = 978-0-7803-8359-3\|s2cid = 17923083}}</ref> == Applications == === Astronomy === In astronomy, NMF is a promising method for [[dimension reduction]] in the sense that astrophysical signals are non-negative. NMF has been applied to the spectroscopic observations<ref name=":0">{{Cite journal \|last1=Berné \|first1=O. \|last2=Joblin \|first2=C.\|author2-link=Christine Joblin \|last3=Deville \|first3=Y. \|last4=Smith \|first4=J. D. 
\|last5=Rapacioli \|first5=M. \|last6=Bernard \|first6=J. P. \|last7=Thomas \|first7=J. \|last8=Reach \|first8=W. \|last9=Abergel \|first9=A. \|date=2007-07-01 \|title=Analysis of the emission of very small dust particles from Spitzer spectro-imagery data using blind signal separation methods \|url=https://www.aanda.org/articles/aa/abs/2007/26/aa6282-06/aa6282-06.html \|journal=Astronomy & Astrophysics \|language=en \|volume=469 \|issue=2 \|pages=575–586 \|doi=10.1051/0004-6361:20066282 \|arxiv=astro-ph/0703072 \|bibcode=2007A&A...469..575B \|issn=0004-6361\|doi-access=free }}</ref><ref name="blantonRoweis07">{{Cite journal \|arxiv=astro-ph/0606170\|last1= Blanton\|first1= Michael R.\|title= K-corrections and filter transformations in the ultraviolet, optical, and near infrared \|journal= The Astronomical Journal\|volume= 133\|issue= 2\|pages= 734–754\|last2= Roweis\|first2= Sam \|~~year~~date= 2007\|doi= 10.1086/510127\|bibcode = 2007AJ....133..734B \|s2cid= 18561804}}</ref> and the direct imaging observations<ref name = "ren18">{{Cite journal\|arxiv=1712.10317\|last1= Ren\|first1= Bin \|title= Non-negative Matrix Factorization: Robust Extraction of Extended Structures\|journal= The Astrophysical Journal\|volume= 852\|issue= 2\|pages= 104\|last2= Pueyo\|first2= Laurent\|last3= Zhu \| first3 = Guangtun B.\|last4= Duchêne\|first4= Gaspard \|~~year~~date= 2018\|doi= 10.3847/1538-4357/aaa1f2\|bibcode = 2018ApJ...852..104R \|s2cid= 3966513\|doi-access= free}}</ref> as a method to study the common properties of astronomical objects and post-process the astronomical observations. 
For spectroscopic observations, the method of Blanton & Roweis (2007)<ref name="blantonRoweis07" /> takes the uncertainties of astronomical observations into account; it was later improved by Zhu (2016),<ref name="zhu16">{{Cite arXiv\|last=Zhu\|first=Guangtun B.\|date=2016-12-19\|title=Nonnegative Matrix Factorization (NMF) with Heteroscedastic Uncertainties and Missing data \|eprint=1612.06037\|class=astro-ph.IM}}</ref> who also considers missing data and enables [[parallel computing]]. Their method was then adapted by Ren et al. (2018)<ref name="ren18" /> to the direct imaging field as one of the [[methods of detecting exoplanets]], especially for the direct imaging of [[circumstellar disks]]. Ren et al. (2018)<ref name="ren18" /> prove the stability of NMF components when they are constructed sequentially (i.e., one by one), which makes the NMF modeling process [[linearity\|linear]]; this linearity is used to separate the stellar light from the light scattered by the [[exoplanets]] and [[circumstellar disks]]. 
In direct imaging, revealing the faint exoplanets and circumstellar disks against the bright surrounding starlight, at a typical contrast from 10⁵ to 10¹⁰, has relied on various statistical methods;<ref>{{Cite journal \|arxiv=0902.3247 \|last1=Lafrenière\|first1=David \|title=HST/NICMOS Detection of HR 8799 b in 1998 \|journal=The Astrophysical Journal Letters \|volume=694\|issue=2\|pages=L148\|last2=Marois \|first2= Christian\|last3= Doyon \|first3=René\| last4=Barman\|first4=Travis\|~~year~~date=2009\|doi=10.1088/0004-637X/694/2/L148\|bibcode=2009ApJ...694L.148L \|s2cid=7332750}}</ref><ref>{{Cite journal\|arxiv=1207.6637 \|last1= Amara\|first1= Adam \|title= PYNPOINT: an image processing package for finding exoplanets\|journal= Monthly Notices of the Royal Astronomical Society\|volume= 427\|issue= 2\|pages= 948\|last2= Quanz\|first2= Sascha P.\|~~year~~date= 2012\|doi= 10.1111/j.1365-2966.2012.21918.x\|doi-access= free\|bibcode = 2012MNRAS.427..948A\|s2cid= 119200505}}</ref><ref name = "soummer12">{{Cite journal\|arxiv=1207.4197\|last1= Soummer\|first1= Rémi \|title= Detection and Characterization of Exoplanets and Disks Using Projections on Karhunen-Loève Eigenimages\|journal= The Astrophysical Journal Letters \|volume= 755\|issue= 2\|pages= L28\|last2= Pueyo\|first2= Laurent\|last3= Larkin \|first3=James\|~~year~~date=2012\|doi=10.1088/2041-8205/755/2/L28\|bibcode=2012ApJ...755L..28S\|s2cid=51088743}}</ref> however, the light from the exoplanets or circumstellar disks is usually over-fitted, so forward modeling has to be adopted to recover the true flux.<ref>{{Cite journal\|arxiv= 1502.03092 \|last1= Wahhaj \|first1= Zahed \|title=Improving signal-to-noise in the direct imaging of exoplanets and circumstellar disks with MLOCI \|last2=Cieza\|first2=Lucas A.\|last3=Mawet\|first3=Dimitri\|last4=Yang\|first4=Bin\|last5=Canovas \|first5=Hector\|last6=de Boer\|first6=Jozua\|last7=Casassus \|first7=Simon\|last8= Ménard\|first8= François 
\|last9=Schreiber\|first9=Matthias R.\|last10=Liu\|first10=Michael C.\|last11=Biller\|first11=Beth A. \|last12=Nielsen\|first12=Eric L.\|last13=Hayward\|first13=Thomas L.\|journal= Astronomy & Astrophysics\|volume= 581\|issue= 24\|pages= A24\|~~year~~date= 2015\|doi= 10.1051/0004-6361/201525837\|bibcode = 2015A&A...581A..24W\|s2cid= 20174209}}</ref><ref name="pueyo16">{{Cite journal\|arxiv= 1604.06097 \|last1= Pueyo\|first1= Laurent \|title= Detection and Characterization of Exoplanets using Projections on Karhunen Loeve Eigenimages: Forward Modeling \|journal= The Astrophysical Journal \|volume= 824\|issue= 2\|pages= 117\|~~year~~date= 2016\|doi= 10.3847/0004-637X/824/2/117 \|bibcode = 2016ApJ...824..117P\|s2cid= 118349503\|doi-access= free}}</ref> Forward modeling is currently optimized for point sources,<ref name="pueyo16"/> however not for extended sources, especially for irregularly shaped structures such as circumstellar disks. In this situation, NMF has been an excellent method, being less over-fitting in the sense of the non-negativity and [[sparsity]] of the NMF modeling coefficients, therefore forward modeling can be performed with a few scaling factors,<ref name="ren18" /> rather than a computationally intensive data re-reduction on generated models. 
=== Data imputation === Line 401 ⟶ 411: \| issue = 3 \| pages = 520–522 \| ~~year~~ date= 2005 \| doi = 10.1016/j.neuroimage.2005.04.034 \| pmid = 15946864 Line 424 ⟶ 434: \| issue = 3 \| pages = 249–264 \| ~~year~~ date= 2005 \| doi = 10.1007/s10588-005-5380-5 \| first2 = Murray Line 431 ⟶ 441: NMF has also been applied to citations data, with one example clustering [[English Wikipedia]] articles and [[scientific journal]]s based on the outbound scientific citations in English Wikipedia.<ref>{{Cite conference \| last1 = Nielsen \| ~~first~~first1 = Finn Årup \| title = Clustering of scientific citations in Wikipedia \| conference = [[Wikimania]] \| ~~year~~ date= 2008 \| url = http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=5666 \| arxiv = 0805.1154 Line 461 ⟶ 471: \| issue = 12 \| pages = 2273–2284 \| ~~year~~ date= 2006 \| doi = 10.1109/JSAC.2006.884026 \|bibcode=2006IJSAC..24.2273M \|citeseerx=10.1.1.136.3837 \|s2cid=12931155 }}</ref> Afterwards, as a fully decentralized approach, Phoenix network coordinate system<ref name="Phoenix_Chen11">{{Cite journal Line 479 ⟶ 490: \|doi = 10.1109/tnsm.2011.110911.100079 \|display-authors = etal \|bibcode = 2011ITNSM...8..334C \|url-status = dead \|archive-url = https://web.archive.org/web/20111114191220/http://www.cs.duke.edu/~ychen/Phoenix_TNSM.pdf Line 499 ⟶ 511: \| volume = 196 \| issue = 4 \| ~~year~~ date= 2014 \| doi = 10.1534/genetics.113.160572 \| pmid = 24496008 Line 513 ⟶ 525: \| volume = 4 \| issue = 7 \| ~~year~~ date= 2008 \| doi=10.1371/journal.pcbi.1000029 \| pmid = 18654623 Line 519 ⟶ 531: \| pages=e1000029 \| bibcode = 2008PLSCB...4E0029D \| doi-access = free }}</ref><ref name="kim2007sparse">{{Cite journal \|author1=Hyunsoo Kim \|author2=Haesun Park Line 526 ⟶ 539: \| issue = 12 \| pages = 1495–1502 \| ~~year~~ date= 2007 \| doi = 10.1093/bioinformatics/btm134 \| pmid = 17483501 Line 536 ⟶ 549: \| volume = 125 \| issue = 3 \| ~~year~~ date= 2013 \| pages = 359–371 \| doi 
=10.1007/s00401-012-1077-2 Line 543 ⟶ 556: }}</ref> In the analysis of cancer mutations it has been used to identify common patterns of mutations that occur in many cancers and that probably have distinct causes.<ref>{{Cite journal\|last1=Alexandrov\|first1=Ludmil B.\|last2=Nik-Zainal\|first2=Serena\|last3=Wedge\|first3=David C.\|last4=Campbell\|first4=Peter J.\|last5=Stratton\|first5=Michael R.\|date=2013-01-31\|title=Deciphering signatures of mutational processes operative in human cancer\|journal=Cell Reports\|volume=3\|issue=1\|pages=246–259\|doi=10.1016/j.celrep.2012.12.008\|issn=2211-1247\|pmc=3588146\|pmid=23318258}}</ref> NMF techniques can identify sources of variation such as cell types, disease subtypes, population stratification, tissue composition, and tumor clonality.<ref>{{Cite journal\|last1=Stein-O’Brien\|first1=Genevieve L.\|last2=Arora\|first2=Raman\|last3=Culhane\|first3=Aedin C.\|last4=Favorov\|first4=Alexander V.\|last5=Garmire\|first5=Lana X.\|last6=Greene\|first6=Casey S.\|last7=Goff\|first7=Loyal A.\|last8=Li\|first8=Yifeng\|last9=Ngom\|first9=Aloune\|last10=Ochs\|first10=Michael F.\|last11=Xu\|first11=Yanxun\|date=2018-10-01\|title=Enter the Matrix: Factorization Uncovers Knowledge from Omics\|url= \|journal=Trends in Genetics\|language=en\|volume=34\|issue=10\|pages=790–805\|doi=10.1016/j.tig.2018.07.003\|issn=0168-9525\|pmid=30143323\|pmc=6309559}}</ref> A particular variant of NMF, namely Non-Negative Matrix Tri-Factorization (NMTF),<ref>{{Cite ~~journal~~book \| last1 = Ding\|last2 = Li\|last3 = Peng\|last4 = Park \| title =Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining \| chapter=Orthogonal nonnegative matrix t-factorizations for clustering \| date=2006 \| journal = Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining▼ ▲ \| year = 2006 \| pages = 126–135 \| doi = 10.1145/1150402.1150420 \|isbn = 1595933395\|s2cid = 165018}}</ref> has been 
used for [[drug repositioning\|drug repurposing]] tasks to predict novel protein targets and therapeutic indications for approved drugs<ref>{{Cite journal \| last1 = Ceddia\|last2 = Pinoli\|last3 = Ceri\|last4 = Masseroli \| title = Matrix factorization-based technique for drug repurposing predictions \| journal = IEEE Journal of Biomedical and Health Informatics \| ~~year~~ date= 2020 \|volume = 24\|issue = 11\| pages = 3162–3172 \| doi = 10.1109/JBHI.2020.2991763 \|pmid = 32365039\| bibcode=2020IJBHI..24.3162C \|s2cid = 218504587\|hdl = 11311/1144602\|hdl-access = free}}</ref> and to infer pairs of synergistic anticancer drugs.<ref>{{Cite journal \| last1 = Pinoli\|last2 = Ceddia\|last3 = Ceri\|last4 = Masseroli \| title = Predicting drug synergism by means of non-negative matrix tri-factorization \| journal = IEEE/ACM Transactions on Computational Biology and Bioinformatics \| ~~year~~ date= 2021 \|volume = PP\| issue=4 \| pages=1956–1967 \| doi = 10.1109/TCBB.2021.3091814 \|pmid = 34166199\|s2cid = 235634059}}</ref> === Nuclear imaging === NMF, also referred to in this field as factor analysis, has been used since the 1980s<ref>{{Cite journal \|last1=DiPaola\|last2=Bazin\|last3=Aubry\|last4=Aurengo\|last5=Cavailloles\|last6=Herry\|last7=Kahn\|~~year~~date=1982 \|title=Handling of dynamic sequences in nuclear medicine\|journal=[[IEEE Trans Nucl Sci]]\|volume=29\|issue=4 \|pages=1310–21\|bibcode=1982ITNS...29.1310D\|doi=10.1109/tns.1982.4332188\|s2cid=37186516}}</ref> to analyze sequences of images in [[SPECT]] and [[Positron emission tomography\|PET]] dynamic medical imaging. 
Non-uniqueness of NMF was addressed using sparsity constraints.<ref>{{Cite journal \| last1 = Sitek \| last2 = Gullberg Line 574 ⟶ 585: \| volume = 21 \| issue = 3 \| ~~year~~ date= 2002 \| pages = 216–25 \| doi=10.1109/42.996340 \| pmid = 11989846 \| bibcode = 2002ITMI...21..216S \| s2cid = 6553527 }}</ref> Line 590 ⟶ 602: \| volume = 35 \| issue = 7 \| ~~year~~ date= 2015 \| pages = 1104–11 \| doi=10.1038/jcbfm.2015.69 Line 605 ⟶ 617: \| volume = 34 \| issue = 1 \| ~~year~~ date= 2015 \| pages = 216–18 \| doi=10.1109/TMI.2014.2352033 \| pmid = 25167546 \| bibcode = 2015ITMI...34..216A \| s2cid = 11060831 \| url = https://escholarship.org/uc/item/0b95c190 }}</ref> == Current research == {{update section\|date=February 2024}} Current research (since 2010) in nonnegative matrix factorization includes, but is not limited to, Line 623 ⟶ 637: \| issue = 4 \| pages = 1350–1362 \| ~~year~~ date= 2008 \| doi = 10.1016/j.patcog.2007.09.010 \|bibcode=2008PatRe..41.1350B Line 631 ⟶ 645: \|author1=Chao Liu \|author2=Hung-chih Yang \|author3=Jinliang Fan \|author4=Li-Wei He \|author5=Yi-Min Wang \|name-list-style=amp \| title = Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce \| journal = Proceedings of the 19th International World Wide Web Conference \| ~~year~~ date= 2010 \| url = http://research.microsoft.com/pubs/119077/DNMF.pdf }}</ref> Scalable Nonnegative Matrix Factorization (ScalableNMF),<ref>{{Cite journal Line 640 ⟶ 654: \| title = Scalable Nonnegative Matrix Factorization with Block-wise Updates \| journal = Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases \| ~~year~~ date= 2014 \| url = http://rio.ecs.umass.edu/mnilpub/papers/ecmlpkdd2014-yin.pdf }}</ref> Distributed Stochastic Singular Value Decomposition.<ref>{{Cite web\|url=https://mahout.apache.org/\|title=Apache Mahout\|website=mahout.apache.org\|access-date=2019-12-14}}</ref> # Online: how to 
update the factorization when new data comes in without recomputing from scratch, e.g., see online CNSC<ref>{{Cite journal \|author1=Dong Wang \|author2=Ravichander Vipperla \|author3=Nick Evans \|author4=Thomas Fang Zheng \|title=Online Non-Negative Convolutive Pattern Learning for Speech Signals \|journal=IEEE Transactions on Signal Processing \|~~year~~date=2013 \|url=http://cslt.riit.tsinghua.edu.cn:8081/homepages/wangd/public/pdf/cnsc-tsp.pdf \|doi=10.1109/tsp.2012.2222381 \|volume=61 \|issue=1 \|pages=44–56 \|bibcode=2013ITSP...61...44W \|citeseerx=10.1.1.707.7348 \|s2cid=12530378 \|access-date=2015-04-19 \|archive-url=https://web.archive.org/web/20150419072552/http://cslt.riit.tsinghua.edu.cn:8081/homepages/wangd/public/pdf/cnsc-tsp.pdf \|archive-date=2015-04-19 \|url-status=dead }}</ref> # Collective (joint) factorization: factorizing multiple interrelated matrices for multiple-view learning, e.g. multi-view clustering, see CoNMF<ref>{{Cite journal \| author = Xiangnan He Line 652 ⟶ 666: \| title = Comment-based Multi-View Clustering of Web 2.0 Items \| journal = Proceedings of the 23rd International World Wide Web Conference \| ~~year~~ date= 2014 \| url = http://www.comp.nus.edu.sg/~xiangnan/files/www2014-he.pdf \| access-date = 2015-03-22 Line 663 ⟶ 677: \| author3 = Jing Gao \| author4 = Jiawei Han ▲ \| ~~journal~~title = Proceedings of the ~~12th~~2013 ~~ACM SIGKDD~~SIAM International Conference on ~~Knowledge Discovery and~~ Data Mining \| ~~title~~chapter = Multi-View Clustering via Joint Nonnegative Matrix Factorization▼ \| name-list-style = amp \| ~~year~~ date= 2013▼ ▲ \| title = Multi-View Clustering via Joint Nonnegative Matrix Factorization ~~\| journal = Proceedings of SIAM Data Mining Conference~~ ▲ \| year = 2013 \| url = http://jialu.cs.illinois.edu/paper/sdm2013-liu.pdf \| doi=10.1137/1.9781611972832.28 Line 696 ⟶ 710: \| issue = 10 \| pages = 2289–2298 \| ~~year~~ date= 1989 \| doi = 10.1016/0004-6981(89)90190-X 
\|bibcode=1989AtmEn..23.2289S }}\| doi-access = free }} * {{Cite journal \| author = Pentti Paatero Line 707 ⟶ 722: \| issue = 1 \| pages = 23–35 \| ~~year~~ date= 1997 \| doi = 10.1016/S0169-7439(96)00044-5 }} Line 716 ⟶ 731: \| volume = 19 \| issue = 3 \| ~~year~~ date= 2007 \| pages = 780–791 \| pmid = 17298233 Line 731 ⟶ 746: \| volume=51 \| pages=7–18 \| ~~year~~date=2006 \| doi=10.1007/s11434-005-1109-6 \| issue=17–18 Line 743 ⟶ 758: \| name-list-style = amp \| title = Descent Methods for Nonnegative Matrix Factorization \| ~~year~~ date= 2008 \| eprint = 0801.3199 \| class = cs.NA Line 758 ⟶ 773: \| volume = 25 \| issue = 1 \| ~~year~~ date= 2008 \| pages = 142–145 \| doi = 10.1109/MSP.2008.4408452 Line 769 ⟶ 784: \| volume = 21 \| issue = 3 \| ~~year~~ date= 2009 \| pmid=18785855 \| doi=10.1162/neco.2008.04-08-771 Line 780 ⟶ 795: \| volume = 2009 \| issue = 2 \| ~~year~~ date= 2009 \| doi = 10.1155/2009/785152 \| pages = 1–17 \| article-number = 785152 \| pmid = 19536273 \| pmc = 2688815 Line 795 ⟶ 811: * Jen-Tzung Chien: "Source Separation and Machine Learning", Academic Press, {{ISBN\|978-0128177969}} (2018). * Shoji Makino(Ed.): "Audio Source Separation", Springer, {{ISBN\|978-3030103033}} (2019). * Nicolas Gillis: "Nonnegative Matrix Factorization", SIAM, {{ISBN \|978-1-611976-40-3}} (2020). {{refend}}