'''Non-negative matrix factorization''' ('''NMF''' or '''NNMF'''), also '''non-negative matrix approximation'''<ref name="dhillon"/><ref>{{cite report|last1=Tandon|first1=Rashish|last2=Sra|first2=Suvrit |title=Sparse nonnegative matrix approximation: new formulations and algorithms|date=September 13, 2010 |url=https://is.tuebingen.mpg.de/fileadmin/user_upload/files/publications/MPIK-TR-193_%5B0%5D.pdf |id=Technical Report No. 193 |publisher=Max Planck Institute for Biological Cybernetics}}</ref> is a group of [[algorithm]]s in [[multivariate analysis]] and [[linear algebra]] where a [[matrix (mathematics)|matrix]] {{math|'''V'''}} is [[Matrix decomposition|factorized]] into (usually) two matrices {{math|'''W'''}} and {{math|'''H'''}}, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.
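For illustration, the factorization can be computed with scikit-learn's <code>NMF</code> estimator (a minimal sketch assuming NumPy and scikit-learn are installed; the small matrix below is a hypothetical toy example):

```python
import numpy as np
from sklearn.decomposition import NMF

# A small non-negative matrix V, factorized as V ~ W @ H with rank 2.
V = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0],
              [3.0, 1.0, 2.0],
              [0.0, 2.0, 1.0]])

model = NMF(n_components=2, init='random', random_state=0, max_iter=1000)
W = model.fit_transform(V)   # basis / feature matrix, shape (4, 2)
H = model.components_        # coefficient matrix, shape (2, 3)

# All entries of both factors are non-negative; V is only approximated.
print(np.linalg.norm(V - W @ H))
```

Because the exact problem is not solvable in general, the residual is small but typically nonzero.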
NMF finds applications in such fields as [[astronomy]],<ref name="blantonRoweis07"/><ref name="ren18"/> [[computer vision]], [[document clustering]],<ref name="dhillon" /> [[Imputation (statistics)|missing data imputation]],<ref name="ren20">{{Cite journal|arxiv=2001.00563|last1= Ren|first1= Bin |title= Using Data Imputation for Signal Separation in High Contrast Imaging|journal= The Astrophysical Journal|volume= 892|issue= 2|pages= 74|last2= Pueyo|first2= Laurent|last3= Chen | first3 = Christine|last4= Choquet|first4= Elodie |last5= Debes|first5= John H|last6= Duechene |first6= Gaspard|last7= Menard|first7=Francois|last8=Perrin|first8=Marshall D.|
== History ==
| issue = 3
| pages = 617–633
|
| doi=10.2307/1267173
| jstor = 1267173
| issue = 14
| pages = 1705–1718
|
| doi = 10.1016/1352-2310(94)00367-T
| bibcode = 1995AtmEn..29.1705A
| author2-link = Sebastian Seung
| name-list-style = amp
|
| title = Learning the parts of objects by non-negative matrix factorization
| journal = [[Nature (journal)|Nature]]
}}</ref><ref name="lee2001algorithms">{{Cite conference
|author1=Daniel D. Lee |author2=H. Sebastian Seung
|name-list-style=amp |
| url = http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf
| title = Algorithms for Non-negative Matrix Factorization
When the orthogonality constraint <math> \mathbf{H}\mathbf{H}^T = I </math> is not explicitly imposed, the orthogonality holds to a large extent, and the clustering property holds too.
When the error function to be used is [[Kullback–Leibler divergence]], NMF is identical to the [[probabilistic latent semantic analysis]] (PLSA), a popular document clustering method.<ref>{{cite journal |vauthors=Ding C, Li Y, Peng W |url=http://users.cis.fiu.edu/~taoli/pub/NMFpLSIequiv.pdf |title=On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing |archive-url=https://web.archive.org/web/20160304070027/http://users.cis.fiu.edu/~taoli/pub/NMFpLSIequiv.pdf |archive-date=2016-03-04 |url-status=dead |journal=Computational Statistics & Data Analysis |
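The multiplicative update rules of Lee and Seung for this Kullback–Leibler objective can be sketched in NumPy as follows (the function name <code>nmf_kl</code>, the iteration count, and the toy matrix are illustrative, not from any particular library):

```python
import numpy as np

def nmf_kl(V, r, n_iter=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for the generalized
    Kullback-Leibler divergence D(V || WH); each update is
    guaranteed not to increase the divergence."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r))
    H = rng.random((r, V.shape[1]))
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# On an exactly factorable toy matrix the fit becomes very close.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf_kl(V, 2)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```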
== Types ==
=== Nonnegative rank factorization ===
In case the [[nonnegative rank (linear algebra)|nonnegative rank]] of {{math|'''V'''}} is equal to its actual rank, {{math|1='''V''' = '''WH'''}} is called a nonnegative rank factorization (NRF).<ref name=BermanPlemmons74>{{cite journal|last=Berman|first=A.|author2=R.J. Plemmons |title=Inverses of nonnegative matrices|journal=Linear and Multilinear Algebra|
=== Different cost functions and regularizations ===
: <math>F(\mathbf{W},\mathbf{H}) = \left\|\mathbf{V} - \mathbf{WH} \right\|^2_F</math>
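The classic multiplicative updates of Lee and Seung for this Frobenius objective can be sketched in a few lines of NumPy (the function name <code>nmf_frobenius</code> and the fixed iteration count are illustrative; each update monotonically decreases the objective):

```python
import numpy as np

def nmf_frobenius(V, r, n_iter=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for ||V - WH||_F^2.
    Non-negativity is preserved because each step multiplies
    the factors by non-negative ratios."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], r))
    H = rng.random((r, V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# On an exactly factorable matrix the residual drops close to zero.
rng = np.random.default_rng(2)
V = rng.random((8, 3)) @ rng.random((3, 6))
W, H = nmf_frobenius(V, 3)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```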
Another type of NMF for images is based on the [[total variation norm]].<ref>{{Cite journal | last1 = Zhang | first1 = T. | last2 = Fang | first2 = B. | last3 = Liu | first3 = W. | last4 = Tang | first4 = Y. Y. | last5 = He | first5 = G. | last6 = Wen | first6 = J. | doi = 10.1016/j.neucom.2008.01.022 | title = Total variation norm-based nonnegative matrix factorization for identifying discriminant representation of image patterns | journal = [[Neurocomputing (journal)|Neurocomputing]]| volume = 71 | issue = 10–12 | pages = 1824–1831|
When [[Regularization (mathematics)|L1 regularization]] (akin to [[Lasso (statistics)|Lasso]]) is added to NMF with the mean squared error cost function, the resulting problem may be called '''non-negative sparse coding''' due to the similarity to the [[sparse coding]] problem,<ref name="hoyer02">{{cite conference |last=Hoyer |first=Patrik O. |title=Non-negative sparse coding |conference=Proc. IEEE Workshop on Neural Networks for Signal Processing |
|author1=Leo Taslaman |author2=Björn Nilsson
|name-list-style=amp | title = A framework for regularized non-negative matrix factorization, with application to the analysis of gene expression data
| volume = 7
| issue = 11
|
| pages = e46331
| doi = 10.1371/journal.pone.0046331
|doi-access=free
}}</ref>
although it may also still be referred to as NMF.<ref>{{Cite conference | last1 = Hsieh | first1 = C. J. | last2 = Dhillon | first2 = I. S. | doi = 10.1145/2020408.2020577 | title = Fast coordinate descent methods with variable selection for non-negative matrix factorization | conference = Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11 | pages = 1064|
=== Online NMF ===
More recently other algorithms have been developed.
Some approaches are based on alternating [[non-negative least squares]]: in each step of such an algorithm, first {{math|'''H'''}} is fixed and {{math|'''W'''}} found by a non-negative least squares solver, then {{math|'''W'''}} is fixed and {{math|'''H'''}} is found analogously. The procedures used to solve for {{math|'''W'''}} and {{math|'''H'''}} may be the same<ref name="lin07"/> or different, as some NMF variants regularize one of {{math|'''W'''}} and {{math|'''H'''}}.<ref name="hoyer02"/> Specific approaches include the projected [[gradient descent]] methods,<ref name="lin07">{{Cite journal | last1 = Lin | first1 = Chih-Jen| title = Projected Gradient Methods for Nonnegative Matrix Factorization | doi = 10.1162/neco.2007.19.10.2756 | journal = [[Neural Computation (journal)|Neural Computation]]| volume = 19 | issue = 10 | pages = 2756–2779 |
| author = Hyunsoo Kim
| author2 = Haesun Park
| volume = 30
| issue = 2
|
| pages = 713–730
| url = http://www.cc.gatech.edu/~hpark/papers/simax-nmf.pdf
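The alternating non-negative least squares scheme described above can be sketched using the exact NNLS solver shipped with SciPy (<code>scipy.optimize.nnls</code>, Lawson–Hanson); the helper name <code>nmf_anls</code> and the fixed iteration count are illustrative:

```python
import numpy as np
from scipy.optimize import nnls

def nmf_anls(V, r, n_iter=50, seed=0):
    """Alternating non-negative least squares: each half-step solves its
    NNLS subproblem exactly, one column (or row) at a time."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r))
    for _ in range(n_iter):
        # Fix W, solve min_H ||V - WH||_F with H >= 0 (columns independent).
        H = np.column_stack([nnls(W, V[:, j])[0] for j in range(m)])
        # Fix H, solve the symmetric problem for the rows of W.
        W = np.column_stack([nnls(H.T, V[i, :])[0] for i in range(n)]).T
    return W, H

# An exactly factorable toy matrix is recovered almost perfectly.
rng = np.random.default_rng(3)
V = rng.random((8, 3)) @ rng.random((3, 6))
W, H = nmf_anls(V, 3)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

Because each subproblem is solved exactly, this typically converges in far fewer sweeps than multiplicative updates, at a higher per-iteration cost.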
| volume = 33
| issue = 2
|
| pages = 285–319
| url =https://smallk.github.io/papers/nmf_review_jgo.pdf
| volume = 4
| pages = 606–610
|
| doi=10.1137/1.9781611972757.70
| isbn = 978-0-89871-593-4
=== Exact NMF ===
Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix {{math|'''V'''}}. A polynomial time algorithm for solving nonnegative rank factorization if {{math|'''V'''}} contains a monomial submatrix of rank equal to its rank was given by Campbell and Poole in 1981.<ref name=CampbellPoole81>{{cite journal|last=Campbell|first=S.L.|author2=G.D. Poole |title=Computing nonnegative rank factorizations |journal=Linear Algebra Appl.|
| last1 = Arora | first1 = Sanjeev
| last2 = Ge | first2 = Rong
| arxiv = 1212.4777
| conference = Proceedings of the 30th International Conference on Machine Learning
|
| bibcode = 2012arXiv1212.4777A}}</ref>
| volume = 401
| issue = 6755
|
| doi = 10.1038/44565
| pmid = 10548103
| volume = 2430
| pages = 23–34
|
}}</ref>
When NMF is obtained by minimizing the [[Kullback–Leibler divergence]], it is in fact equivalent to another instance of multinomial PCA, [[probabilistic latent semantic analysis]],<ref>{{Cite conference
|author2 = Cyril Goutte
|name-list-style = amp
|
|url = http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf
|title = Relation between PLSA and NMF and Implications
NMF with the least-squares objective is equivalent to a relaxed form of [[K-means clustering]]: the matrix factor {{math|'''W'''}} contains cluster centroids and {{math|'''H'''}} contains cluster membership indicators.<ref name="DingSDM2005">C. Ding, X. He, H.D. Simon (2005). [http://ranger.uta.edu/~chqding/papers/NMF-SDM2005.pdf "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering"]. Proc. SIAM Int'l Conf. Data Mining, pp. 606-610. May 2005</ref><ref>Ron Zass and [[Amnon Shashua]] (2005). "[http://www.cs.huji.ac.il/~zass/papers/cp-iccv05.pdf A Unifying Approach to Hard and Probabilistic Clustering]". International Conference on Computer Vision (ICCV) Beijing, China, Oct., 2005.</ref> This provides a theoretical foundation for using NMF for data clustering. However, k-means does not enforce non-negativity on its centroids, so the closest analogy is in fact with "semi-NMF".{{r|ding}}
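This relaxed-k-means view can be checked on toy data: taking, for each sample, the component with the largest loading as its cluster label reproduces the k-means partition. The sketch below assumes scikit-learn and uses hypothetical well-separated blobs; note that with samples stored as rows, the roles of {{math|'''W'''}} and {{math|'''H'''}} are transposed relative to the description above:

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

# Two well-separated non-negative blobs (toy data).
rng = np.random.default_rng(0)
X = np.vstack([rng.random((30, 4)) + [10, 0, 0, 0],
               rng.random((30, 4)) + [0, 10, 0, 0]])

W = NMF(n_components=2, init='nndsvd', max_iter=1000).fit_transform(X)
nmf_labels = W.argmax(axis=1)          # strongest component = cluster label
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Agreement up to a permutation of the two label names.
agree = max((nmf_labels == km_labels).mean(), (nmf_labels != km_labels).mean())
print(agree)
```

On data this cleanly separated the two partitions coincide; on harder data they can differ, since NMF only relaxes the k-means objective.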
NMF can be seen as a two-layer [[Bayesian network|directed graphical]] model with one layer of observed random variables and one layer of hidden random variables.<ref>{{cite conference |author=Max Welling|title=Exponential Family Harmoniums with an Application to Information Retrieval |conference=NIPS|url=http://papers.nips.cc/paper/2672-exponential-family-harmoniums-with-an-application-to-information-retrieval |
NMF extends beyond matrices to tensors of arbitrary order.<ref>{{Cite journal
| issue = 4
| pages = 854–888
|
| doi = 10.2307/1390831
| jstor = 1390831
}}</ref><ref>{{Cite journal
|author1=Max Welling |author2=Markus Weber
|name-list-style=amp |
| title = Positive Tensor Factorization
| journal = [[Pattern Recognition Letters]]
| pages = 311–326
| url = http://www.cc.gatech.edu/~hpark/papers/2011_paper_hpscbook_ntf.pdf
|
| conference = High-Performance Scientific Computing: Algorithms and Applications }}
</ref> This extension may be viewed as a non-negative counterpart to, e.g., the [[PARAFAC]] model.
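A non-negative CP/PARAFAC-style decomposition can be sketched with multiplicative updates applied to each mode-''n'' unfolding (a NumPy/SciPy sketch; the helper names <code>unfold</code> and <code>nn_cp</code>, the iteration count, and the toy tensor are illustrative, not a tuned solver):

```python
import numpy as np
from functools import reduce
from scipy.linalg import khatri_rao

def unfold(X, mode):
    """Mode-n matricization: rows indexed by the chosen mode,
    remaining modes flattened in C order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def nn_cp(X, rank, n_iter=1000, eps=1e-9, seed=0):
    """Non-negative CP decomposition via multiplicative updates:
    each mode's factor is updated against the matching unfolding,
    with the Khatri-Rao product of the remaining factors."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((s, rank)) for s in X.shape]
    for _ in range(n_iter):
        for n in range(X.ndim):
            others = [factors[m] for m in range(X.ndim) if m != n]
            Z = reduce(khatri_rao, others)   # columns match the unfolding
            factors[n] *= (unfold(X, n) @ Z) / (factors[n] @ (Z.T @ Z) + eps)
    return factors

# Recover an exactly factorable 4 x 5 x 6 tensor of rank 3.
rng = np.random.default_rng(1)
A, B, C = (rng.random((s, 3)) for s in (4, 5, 6))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
facs = nn_cp(X, rank=3)
Xh = np.einsum('ir,jr,kr->ijk', *facs)
print(np.linalg.norm(X - Xh) / np.linalg.norm(X))
```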
| url = http://books.nips.cc/papers/files/nips24/NIPS2011_1189.pdf
| conference = NIPS
|
}}
</ref>
| name-list-style = amp
| title = Efficient Multiplicative updates for Support Vector Machines
|
| conference = Proceedings of the 2009 SIAM Conference on Data Mining (SDM)
| pages = 1218–1229
| publisher = [[Association for Computing Machinery]]
| ___location = New York
|
| conference = Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
| pages = 267–273
In this simple case it will just correspond to a scaling and a [[permutation]].
More control over the non-uniqueness of NMF is obtained with sparsity constraints.<ref>{{Cite book |doi = 10.1109/IJCNN.2004.1381036|chapter = Sparse coding and NMF|title = 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541)|volume = 4|pages = 2529–2533|
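The scaling-and-permutation ambiguity is easy to demonstrate directly: for any positive diagonal matrix {{math|'''D'''}} and permutation matrix {{math|'''P'''}}, the pair ({{math|'''WDP'''}}, {{math|'''P'''<sup>T</sup>'''D'''<sup>−1</sup>'''H'''}}) is another non-negative factorization with the same product (a minimal NumPy sketch with arbitrary toy factors):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((5, 3))
H = rng.random((3, 4))

D = np.diag([2.0, 0.5, 3.0])      # any positive rescaling of the components
P = np.eye(3)[:, [2, 0, 1]]       # any permutation of the components
W2 = W @ D @ P                    # still non-negative
H2 = P.T @ np.linalg.inv(D) @ H   # still non-negative

assert (W2 >= 0).all() and (H2 >= 0).all()
assert np.allclose(W @ H, W2 @ H2)   # same product, different factors
```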
== Applications ==
=== Astronomy ===
In astronomy, NMF is a promising method for [[dimension reduction]] in the sense that astrophysical signals are non-negative. NMF has been applied to the spectroscopic observations<ref name=":0">{{Cite journal |last1=Berné |first1=O. |last2=Joblin |first2=C.|author2-link=Christine Joblin |last3=Deville |first3=Y. |last4=Smith |first4=J. D. |last5=Rapacioli |first5=M. |last6=Bernard |first6=J. P. |last7=Thomas |first7=J. |last8=Reach |first8=W. |last9=Abergel |first9=A. |date=2007-07-01 |title=Analysis of the emission of very small dust particles from Spitzer spectro-imagery data using blind signal separation methods |url=https://www.aanda.org/articles/aa/abs/2007/26/aa6282-06/aa6282-06.html |journal=Astronomy & Astrophysics |language=en |volume=469 |issue=2 |pages=575–586 |doi=10.1051/0004-6361:20066282 |arxiv=astro-ph/0703072 |bibcode=2007A&A...469..575B |issn=0004-6361|doi-access=free }}</ref><ref name="blantonRoweis07">{{Cite journal |arxiv=astro-ph/0606170|last1= Blanton|first1= Michael R.|title= K-corrections and filter transformations in the ultraviolet, optical, and near infrared |journal= The Astronomical Journal|volume= 133|issue= 2|pages= 734–754|last2= Roweis|first2= Sam |
Ren et al. (2018)<ref name="ren18" /> proved the stability of NMF components when they are constructed sequentially (i.e., one by one), which enables the [[linearity]] of the NMF modeling process; this linearity property is used to separate the stellar light from the light scattered by [[exoplanets]] and [[circumstellar disks]].
In direct imaging, to reveal faint exoplanets and circumstellar disks against the bright surrounding stellar light, which is typically 10⁵ to 10¹⁰ times brighter, various statistical methods have been adopted,<ref>{{Cite journal |arxiv=0902.3247 |last1=Lafrenière|first1=David |title=HST/NICMOS Detection of HR 8799 b in 1998 |journal=The Astrophysical Journal Letters |volume=694|issue=2|pages=L148|last2=Marois |first2= Christian|last3= Doyon |first3=René| last4=Barman|first4=Travis|
=== Data imputation ===
| issue = 3
| pages = 520–522
|
| doi = 10.1016/j.neuroimage.2005.04.034
| pmid = 15946864
| issue = 3
| pages = 249–264
|
| doi = 10.1007/s10588-005-5380-5
| first2 = Murray
| title = Clustering of scientific citations in Wikipedia
| conference = [[Wikimania]]
|
| url = http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=5666
| arxiv = 0805.1154
| issue = 12
| pages = 2273–2284
|
| doi = 10.1109/JSAC.2006.884026
|citeseerx=10.1.1.136.3837
| volume = 196
| issue = 4
|
| doi = 10.1534/genetics.113.160572
| pmid = 24496008
| volume = 4
| issue = 7
|
| doi=10.1371/journal.pcbi.1000029
| pmid = 18654623
| issue = 12
| pages = 1495–1502
|
| doi = 10.1093/bioinformatics/btm134
| pmid = 17483501
| volume = 125
| issue = 3
|
| pages = 359–371
| doi =10.1007/s00401-012-1077-2
A particular variant of NMF, namely Non-Negative Matrix Tri-Factorization (NMTF),<ref>{{Cite book
| last1 = Ding|last2 = Li|last3 = Peng|last4 = Park
| title=Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining | chapter=Orthogonal nonnegative matrix t-factorizations for clustering | date=
| pages = 126–135
| doi = 10.1145/1150402.1150420
| title = Matrix factorization-based technique for drug repurposing predictions
| journal = IEEE Journal of Biomedical and Health Informatics
|
|volume = 24|issue = 11| pages = 3162–3172
| doi = 10.1109/JBHI.2020.2991763
| title = Predicting drug synergism by means of non-negative matrix tri-factorization
| journal = IEEE/ACM Transactions on Computational Biology and Bioinformatics
|
|volume = PP| issue=4 | pages=1956–1967 | doi = 10.1109/TCBB.2021.3091814
|pmid = 34166199|s2cid = 235634059}}</ref>
=== Nuclear imaging ===
NMF, also referred to in this field as factor analysis, has been used since the 1980s<ref>{{Cite journal |last1=DiPaola|last2=Bazin|last3=Aubry|last4=Aurengo|last5=Cavailloles|last6=Herry|last7=Kahn|
| last1 = Sitek
| last2 = Gullberg
| volume = 21
| issue = 3
|
| pages = 216–25
| doi=10.1109/42.996340
| volume = 35
| issue = 7
|
| pages = 1104–11
| doi=10.1038/jcbfm.2015.69
| volume = 34
| issue = 1
|
| pages = 216–18
| doi=10.1109/TMI.2014.2352033
| issue = 4
| pages = 1350–1362
|
| doi = 10.1016/j.patcog.2007.09.010
|bibcode=2008PatRe..41.1350B
|author1=Chao Liu |author2=Hung-chih Yang |author3=Jinliang Fan |author4=Li-Wei He |author5=Yi-Min Wang |name-list-style=amp | title = Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce
| journal = Proceedings of the 19th International World Wide Web Conference
|
| url = http://research.microsoft.com/pubs/119077/DNMF.pdf
}}</ref> Scalable Nonnegative Matrix Factorization (ScalableNMF),<ref>{{Cite journal
| title = Scalable Nonnegative Matrix Factorization with Block-wise Updates
| journal = Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
|
| url = http://rio.ecs.umass.edu/mnilpub/papers/ecmlpkdd2014-yin.pdf
}}</ref> Distributed Stochastic Singular Value Decomposition.<ref>{{Cite web|url=https://mahout.apache.org/|title=Apache Mahout|website=mahout.apache.org|access-date=2019-12-14}}</ref>
# Online: how to update the factorization when new data comes in without recomputing from scratch, e.g., see online CNSC<ref>{{Cite journal |author1=Dong Wang |author2=Ravichander Vipperla |author3=Nick Evans |author4=Thomas Fang Zheng |title=Online Non-Negative Convolutive Pattern Learning for Speech Signals |journal=IEEE Transactions on Signal Processing |
# Collective (joint) factorization: factorizing multiple interrelated matrices for multiple-view learning, e.g. multi-view clustering, see CoNMF<ref>{{Cite journal
| author = Xiangnan He
| title = Comment-based Multi-View Clustering of Web 2.0 Items
| journal = Proceedings of the 23rd International World Wide Web Conference
|
| url = http://www.comp.nus.edu.sg/~xiangnan/files/www2014-he.pdf
| access-date = 2015-03-22
| name-list-style = amp
| journal = Proceedings of SIAM Data Mining Conference
|
| url = http://jialu.cs.illinois.edu/paper/sdm2013-liu.pdf
| doi=10.1137/1.9781611972832.28
| issue = 10
| pages = 2289–2298
|
| doi = 10.1016/0004-6981(89)90190-X
|bibcode=1989AtmEn..23.2289S | doi-access = free
| issue = 1
| pages = 23–35
|
| doi = 10.1016/S0169-7439(96)00044-5
}}
| volume = 19
| issue = 3
|
| pages = 780–791
| pmid = 17298233
| volume=51
| pages=7–18
|
| doi=10.1007/s11434-005-1109-6
| issue=17–18
| name-list-style = amp
| title = Descent Methods for Nonnegative Matrix Factorization
|
| eprint = 0801.3199
| class = cs.NA
| volume = 25
| issue = 1
|
| pages = 142–145
| doi = 10.1109/MSP.2008.4408452
| volume = 21
| issue = 3
|
| pmid=18785855
| doi=10.1162/neco.2008.04-08-771
| volume = 2009
| issue = 2
|
| doi = 10.1155/2009/785152
| pages = 1–17
|