Content deleted Content added
m I added a use case of NMF in the context of bioinformatics |
Fix cites. |
||
Line 3:
[[File:NMF.png|thumb|400px|Illustration of approximate non-negative matrix factorization: the matrix {{math|'''V'''}} is represented by the two smaller matrices {{math|'''W'''}} and {{math|'''H'''}}, which, when multiplied, approximately reconstruct {{math|'''V'''}}.]]
'''Non-negative matrix factorization''' ('''NMF''' or '''NNMF'''), also '''non-negative matrix approximation'''<ref name="dhillon"/><ref>{{cite
NMF finds applications in such fields as [[astronomy]],<ref name="blantonRoweis07"/><ref name="ren18"/> [[computer vision]], [[document clustering]],<ref name="dhillon" /> [[Imputation (statistics)|missing data imputation]],<ref name="ren20">{{Cite journal|arxiv=2001.00563|last1= Ren|first1= Bin |title= Using Data Imputation for Signal Separation in High Contrast Imaging|journal= The Astrophysical Journal|volume= 892|issue= 2|pages= 74|last2= Pueyo|first2= Laurent|last3= Chen | first3 = Christine|last4= Choquet|first4= Elodie |last5= Debes|first5= John H|last6= Duechene |first6= Gaspard|last7= Menard|first7=Francois|last8=Perrin|first8=Marshall D.|year= 2020|doi= 10.3847/1538-4357/ab7024 | bibcode = 2020ApJ...892...74R |s2cid= 209531731}}</ref> [[chemometrics]], [[audio signal processing]], [[recommender system]]s,<ref name="gemulla">{{cite conference |author=Rainer Gemulla |author2=Erik Nijkamp |author3=Peter J. Haas|author3-link= Peter J. Haas (computer scientist)|author4=Yannis Sismanis |title=Large-scale matrix factorization with distributed stochastic gradient descent |conference=Proc. ACM SIGKDD Int'l Conf. on Knowledge discovery and data mining |url=<!-- http://www.mpi-inf.mpg.de/~rgemulla/publications/rj10481rev.pdf --><!--removing dead link--> |year=2011 |pages=69–77 }}</ref><ref>{{cite conference |author=Yang Bao|title=TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation |conference=AAAI |url=http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8273 |year=2014 |display-authors=etal}}</ref> and [[bioinformatics]].<ref>{{cite journal |author=Ben Murrell|title=Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution |journal=PLOS ONE |volume=6 |issue=12 |year=2011 |pages=e28898|display-authors=etal|doi=10.1371/journal.pone.0028898 |pmid=22216138 |pmc=3245233 |bibcode=2011PLoSO...628898M |doi-access=free }}</ref>
Line 120:
=== Nonnegative rank factorization ===
In case the [[Nonnegative rank (linear algebra)|nonnegative rank]] of {{math|'''V'''}} is equal to its actual rank, {{math|1='''V''' = '''WH'''}} is called a nonnegative rank factorization (NRF).<ref name=BermanPlemmons74>{{cite journal|last=Berman|first=A.|author2=R.J. Plemmons |title=Inverses of nonnegative matrices|journal=Linear and Multilinear Algebra|year=1974|volume=2|issue=2|pages=161–172|doi=10.1080/03081087408817055}}</ref><ref name=BermanPlemmons94>{{cite book|author1=A. Berman |author2=R.J. Plemmons |title=Nonnegative matrices in the Mathematical Sciences|year=1994|publisher=SIAM|___location=Philadelphia}}</ref><ref name=Thomas74>{{cite journal |last=Thomas|first=L.B.|title=Problem 73-14, Rank factorization of nonnegative matrices|journal=SIAM Rev.|year=1974|volume=16|issue=3|pages=393–394|doi=10.1137/1016064}}</ref> The problem of finding the NRF of {{math|'''V'''}}, if it exists, is known to be NP-hard.<ref name=Vavasis09>{{cite journal|last=Vavasis|first=S.A.|title=On the complexity of nonnegative matrix factorization|journal=SIAM J. Optim.|year=2009|volume=20|issue=3|pages=1364–1377 |doi=10.1137/070709967|arxiv=0708.4149|s2cid=7150400}}</ref>
=== Different cost functions and regularizations ===
Line 152:
===Online NMF===
Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in [[Data stream|streaming]] fashion. One such use is for [[collaborative filtering]] in [[recommendation systems]], where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for optimization in these cases may or may not be the same as for standard NMF, but the algorithms need to be rather different.<ref>http://www.ijcai.org/papers07/Papers/IJCAI07-432.pdf</ref><ref>{{cite book|url=http://dl.acm.org/citation.cfm?id=1339264.1339709|title=Online Discussion Participation Prediction Using Non-negative Matrix Factorization |first1=Yik-Hing|last1=Fung|first2=Chun-Hung|last2=Li|first3=William K.|last3=Cheung|date=2 November 2007|publisher=IEEE Computer Society|pages=284–287|via=dl.acm.org|isbn=9780769530284|series=Wi-Iatw '07}}</ref><ref>{{Cite journal |author=Naiyang Guan|author2=Dacheng Tao|author3=Zhigang Luo|author4=Bo Yuan|name-list-style=amp|date=July 2012|title=Online Nonnegative Matrix Factorization With Robust Stochastic Approximation|journal=IEEE Transactions on Neural Networks and Learning Systems |issue=7 |doi=10.1109/TNNLS.2012.2197827|pmid=24807135|volume=23|pages=1087–1099|s2cid=8755408}}</ref>
== Algorithms ==
Line 181:
| doi=10.1137/07069239x
| citeseerx = 10.1.1.70.3485
}}</ref> the optimal gradient method,<ref>{{Cite journal|author=Naiyang Guan|author2=Dacheng Tao|author3=Zhigang Luo
|author1 = Jingu Kim
|author2 = Haesun Park
Line 208:
Current algorithms are sub-optimal in that they only guarantee finding a local minimum, rather than a global minimum of the cost function. A provably optimal algorithm is unlikely in the near future as the problem has been shown to generalize the k-means clustering problem which is known to be [[NP-complete]].<ref>{{Cite book
|
| author = Ding, C.
| author2 = He, X.
| author3 = Simon, H.D.
| name-list-style = amp
|
| volume = 4
| pages = 606–610
Line 229:
=== Exact NMF ===
Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix {{math|'''V'''}}. A polynomial time algorithm for solving nonnegative rank factorization if {{math|'''V'''}} contains a monomial sub matrix of rank equal to its rank was given by Campbell and Poole in 1981.<ref name=CampbellPoole81>{{cite journal|last=Campbell|first=S.L.|author2=G.D. Poole |title=Computing nonnegative rank factorizations |journal=Linear Algebra Appl.|year=1981|volume=35 |pages=175–182|doi=10.1016/0024-3795(81)90272-x|doi-access=free}}</ref> Kalofolias and Gallopoulos (2012)<ref name=KalofoliasGallopoulos2012>{{cite journal|last=Kalofolias|first=V. |author2=Gallopoulos, E. |title=Computing symmetric nonnegative rank factorizations|journal=Linear Algebra Appl|year=2012|volume=436 |issue=2|pages=421–435|doi=10.1016/j.laa.2011.03.016 |url=https://infoscience.epfl.ch/record/198764/files/main.pdf}}</ref> solved the symmetric counterpart of this problem, where {{math|'''V'''}} is symmetric and contains a diagonal principal sub matrix of rank r. Their algorithm runs in {{math|O(rm<sup>2</sup>)}} time in the dense case. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial time algorithm for exact NMF that works for the case where one of the factors W satisfies a separability condition.<ref name=Arora2013>{{Cite conference
| last1 = Arora | first1 = Sanjeev
| last2 = Ge | first2 = Rong
Line 375:
=== Astronomy ===
In astronomy, NMF is a promising method for [[dimension reduction]] in the sense that astrophysical signals are non-negative. NMF has been applied to the spectroscopic observations <ref name="blantonRoweis07">{{Cite journal |arxiv=astro-ph/0606170|last1= Blanton|first1= Michael R.|title= K-corrections and filter transformations in the ultraviolet, optical, and near infrared |journal= The Astronomical Journal|volume= 133|issue= 2|pages= 734–754|last2= Roweis|first2= Sam |year= 2007|doi= 10.1086/510127|bibcode = 2007AJ....133..734B |s2cid= 18561804}}</ref> and the direct imaging observations <ref name = "ren18">{{Cite journal|arxiv=1712.10317|last1= Ren|first1= Bin |title= Non-negative Matrix Factorization: Robust Extraction of Extended Structures|journal= The Astrophysical Journal|volume= 852|issue= 2|pages= 104|last2= Pueyo|first2= Laurent|last3= Zhu | first3 = Guangtun B.|last4= Duchêne|first4= Gaspard |year= 2018|doi= 10.3847/1538-4357/aaa1f2|bibcode = 2018ApJ...852..104R |s2cid= 3966513}}</ref> as a method to study the common properties of astronomical objects and post-process the astronomical observations. The advances in the spectroscopic observations by Blanton & Roweis (2007) <ref name="blantonRoweis07" /> takes into account of the uncertainties of astronomical observations, which is later improved by Zhu (2016) <ref name="zhu16">{{Cite arXiv|last=Zhu|first=Guangtun B.|date=2016-12-19|title=Nonnegative Matrix Factorization (NMF) with Heteroscedastic Uncertainties and Missing data |eprint=1612.06037|class=astro-ph.IM}}</ref> where missing data are also considered and [[parallel computing]] is enabled. Their method is then adopted by Ren et al. (2018) <ref name="ren18" /> to the direct imaging field as one of the [[methods of detecting exoplanets]], especially for the direct imaging of [[circumstellar disks]].
Ren et al. (2018) <ref name="ren18" /> are able to prove the stability of NMF components when they are constructed sequentially (i.e., one by one), which enables the [[linearity]] of the NMF modeling process; the [[linearity]] property is used to separate the stellar light and the light scattered from the [[exoplanets]] and [[circumstellar disks]].
In direct imaging, to reveal the faint exoplanets and circumstellar disks from bright the surrounding stellar lights, which has a typical contrast from 10⁵ to 10¹⁰, various statistical methods have been adopted,<ref>{{Cite journal |arxiv=
=== Data imputation ===
Line 450:
=== Spectral data analysis ===
NMF is also used to analyze spectral data; one such use is in the classification of space objects and debris.<ref name="BerryM2006Algorithm">{{Cite journal
|first1=Michael W. |last1=Berry |first2=Murray |last2=Browne |first3=Amy N. |last3=Langville |first4=V. Paul |last4=Paucac |first5=Robert J. |last5=Plemmonsc
|journal=Computational Statistics & Data Analysis |volume=52 |issue=1 |date=15 September 2007 |pages=155–173
}}</ref>
=== Scalable Internet distance prediction ===
Line 553 ⟶ 554:
| year = 2006
| pages = 126–135
| doi =
}}</ref>, as been use for drug repurposing tasks in order to predict novel protein targets and therapeutic indications for approved drugs <ref>{{Cite journal
| last1 = Ceddia|last2 = Pinoli|last3 = Ceri|last4 = Masseroli
Line 570 ⟶ 571:
=== Nuclear imaging ===
NMF, also referred in this field as factor analysis, has been used since the 1980s<ref>{{Cite journal |last1=DiPaola|last2=Bazin|last3=Aubry|last4=Aurengo|last5=Cavailloles|last6=Herry|last7=Kahn|year=1982 |title=Handling of dynamic sequences in nuclear medicine|journal=[[IEEE Trans Nucl Sci]]|volume=
| last1 = Sitek
| last2 = Gullberg
|