Non-negative matrix factorization: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 09:06, 25 February 2024 edit AnomieBOT (talk \| contribs) Bots 6,857,007 edits m Dating maintenance tags: {{Update section}} ← Previous edit		Latest revision as of 00:47, 27 August 2025 edit undo Citation bot (talk \| contribs) Bots 5,864,630 edits Add: article-number, bibcode. Removed URL that duplicated identifier. \| Use this bot. Report bugs. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 914/990
(8 intermediate revisions by 5 users not shown)
Line 117: === Different cost functions and regularizations === There are different types of non-negative matrix factorizations. The different types arise from using different [[Loss function\|cost function]]s for measuring the divergence between {{math\|'''V'''}} and {{math\|'''WH'''}} and possibly by [[regularization (mathematics)\|regularization]] of the {{math\|'''W'''}} and/or {{math\|'''H'''}} matrices.<ref name="dhillon">{{~~Cite~~cite ~~Q \| Q77685465 }}</ref>~~conference \| last1 = Dhillon \| first1 = Inderjit S. \| last2 = Sra \| first2 = Suvrit \| contribution = Generalized Nonnegative Matrix Approximations with Bregman Divergences \| contribution-url = https://proceedings.neurips.cc/paper/2005/hash/d58e2f077670f4de9cd7963c857f2534-Abstract.html \| pages = 283–290 \| title = Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, NIPS 2005, December 5-8, 2005, Vancouver, British Columbia, Canada] \| year = 2005}}</ref> Two simple divergence functions studied by Lee and Seung are the squared error (or [[Frobenius norm]]) and an extension of the Kullback–Leibler divergence to positive matrices (the original [[Kullback–Leibler divergence]] is defined on probability distributions). Line 145 ⟶ 152: === Online NMF === Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in [[Data stream\|streaming]] fashion. One such use is for [[collaborative filtering]] in [[recommender system\|recommendation systems]], where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for optimization in these cases may or may not be the same as for standard NMF, but the algorithms need to be rather different.<ref>{{cite book\|url=http://dl.acm.org/citation.cfm?id=1339264.1339709\|title=Online Discussion Participation Prediction Using Non-negative Matrix Factorization \|first1=Yik-Hing\|last1=Fung\|first2=Chun-Hung\|last2=Li\|first3=William K.\|last3=Cheung\|date=2 November 2007\|publisher=IEEE Computer Society\|pages=284–287\|via=dl.acm.org\|isbn=9780769530284\|series=Wi-Iatw '07}}</ref><ref>{{Cite journal \|author=Naiyang Guan\|author2=Dacheng Tao\|author3=Zhigang Luo\|author4=Bo Yuan\|name-list-style=amp\|date=July 2012\|title=Online Nonnegative Matrix Factorization With Robust Stochastic Approximation\|journal=IEEE Transactions on Neural Networks and Learning Systems \|issue=7 \|doi=10.1109/TNNLS.2012.2197827\|pmid=24807135\|volume=23\|pages=1087–1099\|bibcode=2012ITNNL..23.1087G \|s2cid=8755408}}</ref> === Convolutional NMF === If the columns of {{math\|'''V'''}} represent data sampled over spatial or temporal dimensions, e.g. time signals, images, or video, features that are equivariant w.r.t. shifts along these dimensions can be learned by Convolutional NMF. In this case, {{math\|'''W'''}} is sparse with columns having local non-zero weight windows that are shared across shifts along the spatio-temporal dimensions of {{math\|'''V'''}}, representing [[Kernel (image processing)\|convolution kernels]]. By spatio-temporal pooling of {{math\|'''H'''}} and repeatedly using the resulting representation as input to convolutional NMF, deep feature hierarchies can be learned.<ref>{{Cite book \|last=Behnke \|first=S. \|title=Proceedings of the International Joint Conference on Neural Networks, 2003 \|chapter=Discovering hierarchical speech features using convolutional non-negative matrix factorization \|date=2003 ~~\|chapter-url=https://ieeexplore.ieee.org/document/1224004~~ \|___location=Portland, Oregon USA \|publisher=IEEE \|volume=4 \|pages=2758–2763 \|doi=10.1109/IJCNN.2003.1224004 \|isbn=978-0-7803-7898-8\|s2cid=3109867 }}</ref> == Algorithms == Line 162 ⟶ 169: More recently other algorithms have been developed. Some approaches are based on alternating [[non-negative least squares]]: in each step of such an algorithm, first {{math\|'''H'''}} is fixed and {{math\|'''W'''}} found by a non-negative least squares solver, then {{math\|'''W'''}} is fixed and {{math\|'''H'''}} is found analogously. The procedures used to solve for {{math\|'''W'''}} and {{math\|'''H'''}} may be the same<ref name="lin07"/> or different, as some NMF variants regularize one of {{math\|'''W'''}} and {{math\|'''H'''}}.<ref name="hoyer02"/> Specific approaches include the projected [[gradient descent]] methods,<ref name="lin07">{{Cite journal \| last1 = Lin \| first1 = Chih-Jen\| title = Projected Gradient Methods for Nonnegative Matrix Factorization \| doi = 10.1162/neco.2007.19.10.2756 \| journal = [[Neural Computation (journal)\|Neural Computation]]\| volume = 19 \| issue = 10 \| pages = 2756–2779 \| date= 2007 \| pmid = 17716011\| url = http://www.csie.ntu.edu.tw/~cjlin/papers/pgradnmf.pdf\| citeseerx = 10.1.1.308.9135\| s2cid = 2295736}}</ref><ref>{{Cite journal \| last1 = Lin \| first1 = Chih-Jen\| doi = 10.1109/TNN.2007.895831 \| title = On the Convergence of Multiplicative Update Algorithms for Nonnegative Matrix Factorization \| journal = IEEE Transactions on Neural Networks\| volume = 18 \| issue = 6 \| pages = 1589–1596 \| date= 2007 \| bibcode = 2007ITNN...18.1589L\| citeseerx = 10.1.1.407.318\| s2cid = 2183630}}</ref> the [[active set]] method,<ref name="gemulla"/><ref name="kim2008nonnegative">{{Cite journal \| author = Hyunsoo Kim \| author2 = Haesun Park Line 216 ⟶ 223: \| isbn = 978-0-89871-593-4 }}</ref> However, as in many other data mining applications, a local minimum may still prove to be useful. In addition to the optimization step, initialization has a significant effect on NMF. The initial values chosen for {{math\|'''W'''}} and {{math\|'''H'''}} may affect not only the rate of convergence, but also the overall error at convergence. Some options for initialization include complete randomization, [[Singular value decomposition\|SVD]], k-means clustering, and more advanced strategies based on these and other paradigms.<ref>{{Cite journal \|last1=Hafshejani \|first1=Sajad Fathi \|last2=Moaberfard \|first2=Zahra \|date=November 2022 \|title=Initialization for Nonnegative Matrix Factorization: a Comprehensive Review \|journal=International Journal of Data Science and Analytics \|volume=16 \|issue=1 \|pages=119–134 \|doi=10.1007/s41060-022-00370-9 \|issn=2364-415X\|arxiv=2109.03874 }}</ref> [[File:Fractional_Residual_Variances_comparison,_PCA_and_NMF.pdf\|thumb\|500px\|Fractional residual variance (FRV) plots for PCA and sequential NMF;<ref name="ren18"/> for PCA, the theoretical values are the contribution from the residual eigenvalues. In comparison, the FRV curves for PCA reaches a flat plateau where no signal are captured effectively; while the NMF FRV curves are declining continuously, indicating a better ability to capture signal. The FRV curves for NMF also converges to higher levels than PCA, indicating the less-overfitting property of NMF.]] Line 375 ⟶ 384: Ren et al. (2018)<ref name="ren18" /> are able to prove the stability of NMF components when they are constructed sequentially (i.e., one by one), which enables the [[linearity]] of the NMF modeling process; the [[linearity]] property is used to separate the stellar light and the light scattered from the [[exoplanets]] and [[circumstellar disks]]. In direct imaging, to reveal the faint exoplanets and circumstellar disks from bright the surrounding stellar lights, which has a typical contrast from 10⁵ to 10¹⁰, various statistical methods have been adopted,<ref>{{Cite journal \|arxiv=0902.3247 \|last1=Lafrenière\|first1=David \|title=HST/NICMOS Detection of HR 8799 b in 1998 \|journal=The Astrophysical Journal Letters \|volume=694\|issue=2\|pages=L148\|last2=Maroid \|first2= Christian\|last3= Doyon \|first3=René\| last4=Barman\|first4=Travis\|date=2009\|doi=10.1088/0004-637X/694/2/L148\|bibcode=2009ApJ...694L.148L \|s2cid=7332750}}</ref><ref>{{Cite journal\|arxiv=1207.6637 \|last1= Amara\|first1= Adam \|title= PYNPOINT: an image processing package for finding exoplanets\|journal= Monthly Notices of the Royal Astronomical Society\|volume= 427\|issue= 2\|pages= 948\|last2= Quanz\|first2= Sascha P.\|date= 2012\|doi= 10.1111/j.1365-2966.2012.21918.x\|doi-access= free\|bibcode = 2012MNRAS.427..948A\|s2cid= 119200505}}</ref><ref name = "soummer12">{{Cite journal\|arxiv=1207.4197\|last1= Soummer\|first1= Rémi \|title= Detection and Characterization of Exoplanets and Disks Using Projections on Karhunen-Loève Eigenimages\|journal= The Astrophysical Journal Letters \|volume= 755\|issue= 2\|pages= L28\|last2= Pueyo\|first2= Laurent\|last3= Larkin \|first3=James\|date=2012\|doi=10.1088/2041-8205/755/2/L28\|bibcode=2012ApJ...755L..28S\|s2cid=51088743}}</ref> however the light from the exoplanets or circumstellar disks are usually over-fitted, where forward modeling have to be adopted to recover the true flux.<ref>{{Cite journal\|arxiv= 1502.03092 \|last1= Wahhaj \|first1= Zahed \|title=Improving signal-to-noise in the direct imaging of exoplanets and circumstellar disks with MLOCI \|last2=Cieza\|first2=Lucas A.\|last3=Mawet\|first3=Dimitri\|last4=Yang\|first4=Bin\|last5=Canovas \|first5=Hector\|last6=de Boer\|first6=Jozua\|last7=Casassus \|first7=Simon\|last8= Ménard\|first8= François \|last9=Schreiber\|first9=Matthias R.\|last10=Liu\|first10=Michael C.\|last11=Biller\|first11=Beth A. \|last12=Nielsen\|first12=Eric L.\|last13=Hayward\|first13=Thomas L.\|journal= Astronomy & Astrophysics\|volume= 581\|issue= 24\|pages= A24\|date=2015\|doi= 10.1051/0004-6361/201525837\|bibcode = 2015A&A...581A..24W\|s2cid= 20174209}}</ref><ref name="pueyo16">{{Cite journal\|arxiv= 1604.06097 \|last1= Pueyo\|first1= Laurent \|title= Detection and Characterization of Exoplanets using Projections on Karhunen Loeve Eigenimages: Forward Modeling \|journal= The Astrophysical Journal \|volume= 824\|issue= 2\|pages= 117\|date= 2016\|doi= 10.3847/0004-637X/824/2/117 \|bibcode = 2016ApJ...824..117P\|s2cid= 118349503\|doi-access= free}}</ref> Forward modeling is currently optimized for point sources,<ref name="pueyo16"/> however not for extended sources, especially for irregularly shaped structures such as circumstellar disks. In this situation, NMF has been an excellent method, being less over-fitting in the sense of the non-negativity and [[sparsity]] of the NMF modeling coefficients, therefore forward modeling can be performed with a few scaling factors,<ref name="ren18" /> rather than a computationally intensive data re-reduction on generated models. === Data imputation === Line 464 ⟶ 473: \| date= 2006 \| doi = 10.1109/JSAC.2006.884026 \|bibcode=2006IJSAC..24.2273M \|citeseerx=10.1.1.136.3837 \|s2cid=12931155 }}</ref> Afterwards, as a fully decentralized approach, Phoenix network coordinate system<ref name="Phoenix_Chen11">{{Cite journal Line 480 ⟶ 490: \|doi = 10.1109/tnsm.2011.110911.100079 \|display-authors = etal \|bibcode = 2011ITNSM...8..334C \|url-status = dead \|archive-url = https://web.archive.org/web/20111114191220/http://www.cs.duke.edu/~ychen/Phoenix_TNSM.pdf Line 557 ⟶ 568: \|volume = 24\|issue = 11\| pages = 3162–3172 \| doi = 10.1109/JBHI.2020.2991763 \|pmid = 32365039\| bibcode=2020IJBHI..24.3162C \|s2cid = 218504587\|hdl = 11311/1144602\|hdl-access = free}}</ref> and to infer pair of synergic anticancer drugs.<ref>{{Cite journal \| last1 = Pinoli\|last2 = Ceddia\|last3 = Ceri\|last4 = Masseroli \| title = Predicting drug synergism by means of non-negative matrix tri-factorization Line 578 ⟶ 589: \| doi=10.1109/42.996340 \| pmid = 11989846 \| bibcode = 2002ITMI...21..216S \| s2cid = 6553527 }}</ref> Line 609 ⟶ 621: \| doi=10.1109/TMI.2014.2352033 \| pmid = 25167546 \| bibcode = 2015ITMI...34..216A \| s2cid = 11060831 \| url = https://escholarship.org/uc/item/0b95c190 }}</ref> Line 667 ⟶ 680: \| chapter = Multi-View Clustering via Joint Nonnegative Matrix Factorization \| name-list-style = amp ~~\| journal = Proceedings of SIAM Data Mining Conference~~ \| date= 2013 \| url = http://jialu.cs.illinois.edu/paper/sdm2013-liu.pdf Line 786 ⟶ 798: \| doi = 10.1155/2009/785152 \| pages = 1–17 \| article-number = 785152 \| pmid = 19536273 \| pmc = 2688815