Examine individual changes
This page allows you to examine the variables generated by the Edit Filter for an individual change.
Variables generated for this change
Variable | Value |
---|---|
Whether or not the edit is marked as minor (no longer in use) (minor_edit ) | false |
Edit count of the user (user_editcount ) | null |
Name of the user account (user_name ) | '109.75.36.95' |
Age of the user account (user_age ) | 0 |
Groups (including implicit) the user is in (user_groups ) | [
0 => '*'
] |
Global groups that the user is in (global_user_groups ) | [] |
Whether or not a user is editing through the mobile interface (user_mobile ) | false |
Page ID (page_id ) | 3681279 |
Page namespace (page_namespace ) | 0 |
Page title without namespace (page_title ) | 'Non-negative matrix factorization' |
Full page title (page_prefixedtitle ) | 'Non-negative matrix factorization' |
Last ten users to contribute to the page (page_recent_contributors ) | [
0 => 'JaffaMan',
1 => '87.164.82.245',
2 => '145.94.3.37',
3 => 'GünniX',
4 => 'Kku',
5 => 'Tom.Reding',
6 => 'Marcocapelle',
7 => 'Leegrc',
8 => '139.182.148.134',
9 => 'Maaaaaaywuy'
] |
First user to contribute to the page (page_first_contributor ) | 'Fnielsen' |
Action (action ) | 'edit' |
Edit summary/reason (summary ) | '/* Nuclear imaging */ ' |
Old content model (old_content_model ) | 'wikitext' |
New content model (new_content_model ) | 'wikitext' |
Old page wikitext, before the edit (old_wikitext ) | '{{Redirect|NMF|the convention in contract bridge|new minor forcing}}
[[File:NMF.png|thumb|400px|Illustration of approximate non-negative matrix factorization: the matrix {{math|'''V'''}} is represented by the two smaller matrices {{math|'''W'''}} and {{math|'''H'''}}, which, when multiplied, approximately reconstruct {{math|'''V'''}}.]]
'''Non-negative matrix factorization''' ('''NMF''' or '''NNMF'''), also '''non-negative matrix approximation'''<ref name="dhillon"/><ref>{{cite journal|last1=Tandon|first1=Rashish|author2=Suvrit Sra|title=Sparse nonnegative matrix approximation: new formulations and algorithms|year=2010|series=TR|url=ftp://ftp.kyb.tuebingen.mpg.de/pub/mpi-memos/pdf/nmftr.pdf}}</ref> is a group of [[algorithm]]s in [[multivariate analysis]] and [[linear algebra]] where a [[matrix (mathematics)|matrix]] {{math|'''V'''}} is [[Matrix decomposition|factorized]] into (usually) two matrices {{math|'''W'''}} and {{math|'''H'''}}, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.
NMF finds applications in such fields as [[computer vision]], document [[Cluster analysis|clustering]],<ref name="dhillon"/> [[chemometrics]], [[audio signal processing]]<ref name="wangchapter">{{cite book |last=Wang |first=Wenwu |editor-last=Wang |editor-first=Wenwu |title=Machine Audition: Principles, Algorithms and Systems |publisher=IGI Global |date=2010 |pages=353–370 |chapter=Instantaneous Versus Convolutive Non-Negative Matrix Factorization: Models, Algorithms and Applications to Audio Pattern Separation |doi=10.4018/978-1-61520-919-4.ch015}}</ref> and [[recommender system]]s.<ref name="gemulla">{{cite conference |author=Rainer Gemulla |author2=Erik Nijkamp |author3=Peter J Haas |author4=Yannis Sismanis |title=Large-scale matrix factorization with distributed stochastic gradient descent |conference=Proc. ACM SIGKDD Int'l Conf. on Knowledge discovery and data mining |url=http://www.mpi-inf.mpg.de/~rgemulla/publications/rj10481rev.pdf |year=2011 |pages=69–77}}</ref><ref>{{cite conference |author=Yang Bao|title=TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation |conference=AAAI |url=http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8273 |year=2014 |pages=|display-authors=etal}}</ref>
== History ==
In [[chemometrics]] non-negative matrix factorization has a long history under the name "self modeling curve resolution".<ref>{{Cite journal
| author1 = William H. Lawton
| author-link1 = William H. Lawton
| author2 = Edward A. Sylvestre
| author-link2 = Edward A. Sylvestre
| title= Self modeling curve resolution
| journal = [[Technometrics]]
| volume = 13
| issue = 3
| year = 1971
| page = 617+
| doi=10.2307/1267173
| jstor = 1267173
}}</ref>
In this framework the vectors in the right matrix are continuous curves rather than discrete vectors.
Also early work on non-negative matrix factorizations was performed by a Finnish group of researchers in the middle of the 1990s under the name ''positive matrix factorization''.<ref>{{Cite journal
|author1=P. Paatero |author2=U. Tapper | title = Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values
| journal = [[Environmetrics]]
| volume = 5
| pages = 111–126
| year = 1994
| doi = 10.1002/env.3170050203
| issue = 2
}}</ref><ref>{{Cite journal
| author = Pia Anttila
| author-link = Pia Anttila
| author2 = Pentti Paatero
| author2-link = Pentti Paatero
| author3 = Unto Tapper
| author4 = Olli Järvinen
| title = Source identification of bulk wet deposition in Finland by positive matrix factorization
| journal = [[Atmospheric Environment (journal)|Atmospheric Environment]]
| volume = 29
| issue = 14
| pages = 1705–1718
| year = 1995
| doi = 10.1016/1352-2310(94)00367-T
| bibcode = 1995AtmEn..29.1705A
}}</ref>
It became more widely known as ''non-negative matrix factorization'' after Lee and Seung investigated
the properties of the algorithm and published some simple and useful
algorithms for two types of factorizations.<ref name="lee-seung">{{Cite journal
| author = Daniel D. Lee
| author2 = H. Sebastian Seung
| author2-link = Sebastian Seung
| last-author-amp = yes
| year = 1999
| title = Learning the parts of objects by non-negative matrix factorization
| journal = [[Nature (journal)|Nature]]
| volume = 401
| issue = 6755
| pages = 788–791
| doi = 10.1038/44565
| pmid = 10548103
| bibcode = 1999Natur.401..788L
}}</ref><ref name="lee2001algorithms">{{Cite conference
|author1=Daniel D. Lee |author2=H. Sebastian Seung
|lastauthoramp=yes | year = 2001
| url = http://www.nips.cc/Web/Groups/NIPS/NIPS2000/00papers-pub-on-web/LeeSeung.ps.gz
| title = Algorithms for Non-negative Matrix Factorization
| conference = Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference
| pages = 556–562
| publisher = [[MIT Press]]
}}</ref>
== Background ==
Let matrix {{math|'''V'''}} be the product of the matrices {{math|'''W'''}} and {{math|'''H'''}},
:<math>\mathbf{V} = \mathbf{W} \mathbf{H} \,.</math>
Matrix multiplication can be implemented as computing the column vectors of {{math|'''V'''}} as linear combinations of the column vectors in {{math|'''W'''}} using coefficients supplied by columns of {{math|'''H'''}}. That is, each column of {{math|'''V'''}} can be computed as follows:
:<math>\mathbf{v}_i = \mathbf{W} \mathbf{h}_{i} \,,</math>
where {{math|'''v'''<sub>''i''</sub>}} is the {{math|''i''}}-th column vector of the product matrix {{math|'''V'''}} and {{math|'''h'''<sub>''i''</sub>}} is the {{math|''i''}}-th column vector of the matrix {{math|'''H'''}}.
When multiplying matrices, the dimensions of the factor matrices may be significantly lower than those of the product matrix and it is this property that forms the basis of NMF. NMF generates factors with significantly reduced dimensions compared to the original matrix. For example, if {{math|'''V'''}} is an {{math|''m'' × ''n''}} matrix, {{math|'''W'''}} is an {{math|''m'' × ''p''}} matrix, and {{math|'''H'''}} is a {{math|''p'' × ''n''}} matrix then {{math|''p''}} can be significantly less than both {{math|''m''}} and {{math|''n''}}.
Here's an example based on a text-mining application:
* Let the input matrix (the matrix to be factored) be {{math|'''V'''}} with 10000 rows and 500 columns where words are in rows and documents are in columns. That is, we have 500 documents indexed by 10000 words. It follows that a column vector {{math|'''v'''}} in {{math|'''V'''}} represents a document.
* Assume we ask the algorithm to find 10 features in order to generate a ''features matrix'' {{math|'''W'''}} with 10000 rows and 10 columns and a ''coefficients matrix'' {{math|'''H'''}} with 10 rows and 500 columns.
* The product of {{math|'''W'''}} and {{math|'''H'''}} is a matrix with 10000 rows and 500 columns, the same shape as the input matrix {{math|'''V'''}} and, if the factorization worked, it is a reasonable approximation to the input matrix {{math|'''V'''}}.
* From the treatment of matrix multiplication above it follows that each column in the product matrix {{math|'''WH'''}} is a linear combination of the 10 column vectors in the features matrix {{math|'''W'''}} with coefficients supplied by the coefficients matrix {{math|'''H'''}}.
This last point is the basis of NMF because we can consider each original document in our example as being built from a small set of hidden features. NMF generates these features.
It's useful to think of each feature (column vector) in the features matrix {{math|'''W'''}} as a document archetype comprising a set of words where each word's cell value defines the word's rank in the feature: The higher a word's cell value the higher the word's rank in the feature. A column in the coefficients matrix {{math|'''H'''}} represents an original document with a cell value defining the document's rank for a feature. This follows because each row in {{math|'''H'''}} represents a feature. We can now reconstruct a document (column vector) from our input matrix by a linear combination of our features (column vectors in {{math|'''W'''}}) where each feature is weighted by the feature's cell value from the document's column in {{math|'''H'''}}.
== Clustering property ==
NMF has an inherent clustering property,<ref name="DingSDM2005" /> i.e., it automatically clusters the columns of input data
<math>\mathbf{V} = (v_1, \cdots, v_n) </math>. It is this property that drives most applications of NMF.
More specifically, the approximation of <math>\mathbf{V}</math> by
<math>\mathbf{V} \simeq \mathbf{W}\mathbf{H}</math> is achieved by minimizing the error function
<math> \min_{W,H} || V - WH ||_F,</math> subject to <math>W \geq 0, H \geq 0.</math>
If we add additional orthogonality constraint on <math> H </math>,
i.e., <math> H H^T = I </math>, then the above minimization is mathematically equivalent to the minimization of [[K-means clustering]] ).
Furthermore, the computed <math> H </math> gives the [[cluster indicator]], i.e.,
if <math>\mathbf{H}_{kj} > 0 </math>, that fact indicates
input data <math> v_j </math>
belongs to <math>k^{th}</math> cluster.
And the computed <math>W</math> gives the cluster centroids, i.e.,
the <math>k^{th}</math> column
gives the cluster centroid of
<math>k^{th}</math> cluster. This centroids representation can be significantly enhanced by convex NMF.
When the orthogonality <math> H H^T = I </math> is not explicitly imposed, the orthogonality holds to a large extent, and the clustering property holds too. Clustering is the main objective of most [[data mining]] applications of NMF.{{citation needed|date=April 2015}}
When the error function to be used is [[Kullback–Leibler divergence]], NMF is identical to the [[Probabilistic latent semantic analysis]], a popular document clustering method.<ref>C Ding, T Li, W Peng, [http://users.cis.fiu.edu/~taoli/pub/NMFpLSIequiv.pdf " On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing"] Computational Statistics & Data Analysis 52, 3913-3927</ref>
== Types ==
=== Approximate non-negative matrix factorization ===
Usually the number of columns of {{math|'''W'''}} and the number of rows of {{math|'''H'''}} in NMF are selected so the product {{math|'''WH'''}} will become an approximation to {{math|'''V'''}}. The full decomposition of {{math|'''V'''}} then amounts to the two non-negative matrices {{math|'''W'''}} and {{math|'''H'''}} as well as a residual {{math|'''U'''}}, such that: {{math|1='''V''' = '''WH''' + '''U'''}}. The elements of the residual matrix can either be negative or positive.
When {{math|'''W'''}} and {{math|'''H'''}} are smaller than {{math|'''V'''}} they become easier to store and manipulate. Another reason for factorizing {{math|'''V'''}} into smaller matrices {{math|'''W'''}} and {{math|'''H'''}}, is that if one is able to approximately represent the elements of {{math|'''V'''}} by significantly less data, then one has to infer some latent structure in the data.
=== Convex non-negative matrix factorization ===
In standard NMF, matrix factor {{math|'''W''' ∈ ℝ<sub>+</sub><sup>''m'' × ''k''</sup>}}, i.e., {{math|'''W'''}} can be anything in that space. Convex NMF<ref name="ding">C Ding, T Li, MI Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 45-55, 2010</ref> restricts the columns of {{math|'''W'''}} to convex combinations of the input data vectors <math> (v_1, \cdots, v_n) </math>. This greatly improves the quality of data representation of {{math|'''W'''}}. Furthermore, the resulting matrix factor {{math|'''H'''}} becomes more sparse and orthogonal.
=== Nonnegative rank factorization ===
In case the [[Nonnegative rank (linear algebra)|nonnegative rank]] of {{math|'''V'''}} is equal to its actual rank, {{math|1='''V''' = '''WH'''}} is called a nonnegative rank factorization.<ref name=BermanPlemmons74>{{cite journal|last=Berman|first=A.|author2=R.J. Plemmons |title=Inverses of nonnegative matrices|journal=Linear and Multilinear Algebra|year=1974|volume=2|issue=2|pages=161–172|doi=10.1080/03081087408817055}}</ref><ref name=BermanPlemmons94>{{cite book|author1=A. Berman |author2=R.J. Plemmons |title=Nonnegative matrices in the Mathematical Sciences|year=1994|publisher=SIAM|___location=Philadelphia}}</ref><ref name=Thomas74>{{cite journal|last=Thomas|first=L.B.|title=Problem 73-14, Rank factorization of nonnegative matrices|journal=SIAM rev.|year=1974|volume=16|issue=3|pages=393–394|doi=10.1137/1016064}}</ref> The problem of finding the NRF of {{math|'''V'''}}, if it exists, is known to be NP-hard.<ref name=Vavasis09>{{cite journal|last=Vavasis|first=S.A.|title=On the complexity of nonnegative matrix factorization|journal=SIAM J. Optim.|year=2009|volume=20|issue=3|pages=1364–1377|doi=10.1137/070709967}}</ref>
=== Different cost functions and regularizations ===
There are different types of non-negative matrix factorizations.
The different types arise from using different [[Loss function|cost function]]s for measuring the divergence between {{math|'''V'''}} and {{math|'''WH'''}} and possibly by [[regularization (mathematics)|regularization]] of the {{math|'''W'''}} and/or {{math|'''H'''}} matrices.<ref name="dhillon">{{Cite conference | author = Inderjit S. Dhillon | author-link = Inderjit S. Dhillon | author2 = Suvrit Sra| author2-link = Suvrit Sra | url = http://books.nips.cc/papers/files/nips18/NIPS2005_0203.pdf |format=PDF|title = Generalized Nonnegative Matrix Approximations with Bregman Divergences | conference = [[Conference on Neural Information Processing Systems|NIPS]] | year = 2005}}</ref>
Two simple divergence functions studied by Lee and Seung are the squared error (or [[Frobenius norm]]) and an extension of the Kullback–Leibler divergence to positive matrices (the original [[Kullback–Leibler divergence]] is defined on probability distributions).
Each divergence leads to a different NMF algorithm, usually minimizing the divergence using iterative update rules.
The factorization problem in the squared error version of NMF may be stated as:
Given a matrix <math>\mathbf{V}</math> find nonnegative matrices W and H that minimize the function
: <math>F(\mathbf{W},\mathbf{H}) = \|\mathbf{V} - \mathbf{WH}\|^2_F</math>
Another type of NMF for images is based on the [[total variation norm]].<ref>{{Cite journal | last1 = Zhang | first1 = T. | last2 = Fang | first2 = B. | last3 = Liu | first3 = W. | last4 = Tang | first4 = Y. Y. | last5 = He | first5 = G. | last6 = Wen | first6 = J. | doi = 10.1016/j.neucom.2008.01.022 | title = Total variation norm-based nonnegative matrix factorization for identifying discriminant representation of image patterns | journal = [[Neurocomputing (journal)|Neurocomputing]]| volume = 71 | issue = 10–12 | pages = 1824–1831| year = 2008 | pmid = | pmc = }}</ref>
When [[Tikhnov regularization|L1 regularization]] (akin to [[Lasso (statistics)|Lasso]]) is added to NMF with the mean squared error cost function, the resulting problem may be called '''non-negative sparse coding''' due to the similarity to the [[sparse coding]] problem,<ref name="hoyer02">{{cite conference |last=Hoyer |first=Patrik O. |title=Non-negative sparse coding |conference=Proc. IEEE Workshop on Neural Networks for Signal Processing |year=2002 |url=http://arxiv.org/pdf/cs/0202009}}</ref><ref name="Leo Taslaman and Björn Nilsson 2012 e46331">{{Cite journal
|author1=Leo Taslaman |author2=Björn Nilsson
|lastauthoramp=yes | title = A framework for regularized non-negative matrix factorization, with application to the analysis of gene expression data
| journal = [[PLoS One]]
| volume = 7
| issue = 11
| year = 2012
| pages = e46331
| doi = 10.1371/journal.pone.0046331
| pmid = 23133590
| pmc=3487913
|bibcode=2012PLoSO...746331T
}}</ref>
although it may also still be referred to as NMF.<ref>{{Cite conference | last1 = Hsieh | first1 = C. J. | last2 = Dhillon | first2 = I. S. | doi = 10.1145/2020408.2020577 | title = Fast coordinate descent methods with variable selection for non-negative matrix factorization | conference = Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11 | pages =
1064| year = 2011 | isbn = 9781450308137 | pmid = | pmc = | url = http://www.cs.utexas.edu/~cjhsieh/nmf_kdd11.pdf}}</ref>
===Online NMF===
Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in [[Data stream|streaming]] fashion. One such use is for [[collaborative filtering]] in [[recommendation systems]], where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for optimization in these cases may or may not be the same as for standard NMF, but the algorithms need to be rather different.<ref>http://www.ijcai.org/papers07/Papers/IJCAI07-432.pdf</ref><ref>http://portal.acm.org/citation.cfm?id=1339264.1339709</ref><ref>{{Cite journal|author=Naiyang Guan|author2=Dacheng Tao|author3=Zhigang Luo|author4=Bo Yuan|last-author-amp=yes|date=July 2012|title=Online Nonnegative Matrix Factorization With Robust Stochastic Approximation|url=|journal=IEEE Transactions on Neural Networks and Learning Systems |issue=7 |doi=10.1109/TNNLS.2012.2197827|pmid=24807135|volume=23|pages=1087–1099}}</ref>
== Algorithms ==
There are several ways in which the {{math|'''W'''}} and {{math|'''H'''}} may be found: Lee and Seung's [[Multiplicative Weight Update Method|multiplicative update rule]]<ref name="lee2001algorithms"/> has been a popular method due to the simplicity of implementation. Since then, a few other algorithmic approaches have been developed.
Some successful algorithms are based on alternating [[non-negative least squares]]: in each step of such an algorithm, first {{math|'''H'''}} is fixed and {{math|'''W'''}} found by a non-negative least squares solver, then {{math|'''W'''}} is fixed and {{math|'''H'''}} is found analogously. The procedures used to solve for {{math|'''W'''}} and {{math|'''H'''}} may be the same<ref name="lin07"/> or different, as some NMF variants regularize one of {{math|'''W'''}} and {{math|'''H'''}}.<ref name="hoyer02"/> Specific approaches include the projected [[gradient descent]] methods,<ref name="lin07">{{Cite journal | last1 = Lin | first1 = Chih-Jen| title = Projected Gradient Methods for Nonnegative Matrix Factorization | doi = 10.1162/neco.2007.19.10.2756 | journal = [[Neural Computation (journal)|Neural Computation]]| volume = 19 | issue = 10 | pages = 2756–2779 | year = 2007 | pmid = 17716011| pmc = | url = http://www.csie.ntu.edu.tw/~cjlin/papers/pgradnmf.pdf}}</ref><ref>{{Cite journal | last1 = Lin | first1 = Chih-Jen| doi = 10.1109/TNN.2007.895831 | title = On the Convergence of Multiplicative Update Algorithms for Nonnegative Matrix Factorization | journal = IEEE Transactions on Neural Networks| volume = 18 | issue = 6 | pages = 1589–1596 | year = 2007 | pmid = | pmc = }}</ref> the [[active set]] method,<ref name="gemulla"/><ref name="kim2008nonnegative">{{Cite journal
| author = Hyunsoo Kim
| author2 = Haesun Park
| author2-link = Haesun Park
| last-author-amp = yes
| title = Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method
| journal = [[SIAM Journal on Matrix Analysis and Applications]]
| volume = 30
| issue = 2
| year = 2008
| pages = 713–730
| url = http://www.cc.gatech.edu/~hpark/papers/simax-nmf.pdf
| doi=10.1137/07069239x
}}</ref> the optimal gradient method,<ref>{{Cite journal|author=Naiyang Guan|author2=Dacheng Tao|author3=Zhigang Luo, Bo Yuan|date=June 2012|title=NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization|url=|journal=IEEE Transactions on Signal Processing |issue=6 |doi=10.1109/TSP.2012.2190406|pmid=|volume=60|pages=2882–2898}}</ref> and the block principal pivoting method<ref name="kim2011fast">{{Cite journal
|author1=Jingu Kim |author2=Haesun Park
|lastauthoramp=yes | title = Fast Nonnegative Matrix Factorization: An Active-set-like Method and Comparisons
| journal = [[SIAM Journal on Scientific Computing]]
| volume = 33
| issue = 6
| year = 2011
| pages = 3261–3281
| url = http://www.cc.gatech.edu/~jingu/docs/2011_paper_sisc_nmf.pdf
| doi=10.1137/110821172
}}</ref> among several others.
The currently available algorithms are sub-optimal as they can only guarantee finding a local minimum, rather than a global minimum of the cost function. A provably optimal algorithm is unlikely in the near future as the problem has been shown to generalize the k-means clustering problem which is known to be [[NP-complete]].<ref>{{Cite journal
| title = On the equivalence of nonnegative matrix factorization and spectral clustering
| author = Ding, C.
| author2 = He, X.
| author3 = Simon, H.D.
| last-author-amp = yes
| journal = Proc. SIAM Data Mining Conf
| volume = 4
| pages = 606–610
| year = 2005
| doi=10.1137/1.9781611972757.70
| isbn = 978-0-89871-593-4
}}</ref> However, as in many other data mining applications, a local minimum may still prove to be useful.
=== Exact NMF ===
Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix {{math|'''V'''}}. A polynomial time algorithm for solving nonnegative rank factorization if {{math|'''V'''}} contains a monomial sub matrix of rank equal to its rank was given by Campbell and Poole in 1981.<ref name=CampbellPoole81>{{cite journal|last=Campbell|first=S.L.|author2=G.D. Poole |title=Computing nonnegative rank factorizations.|journal=Linear Algebra Appl.|year=1981|volume=35|pages=175–182|doi=10.1016/0024-3795(81)90272-x}}</ref> Kalofolias and Gallopoulos (2012)<ref name=KalofoliasGallopoulos2012>{{cite journal|last=Kalofolias|first=V.|author2=Gallopoulos, E. |title=Computing symmetric nonnegative rank factorizations|journal=Linear Algebra Appl|year=2012|volume=436|issue=2|pages=421–435|url=http://www.sciencedirect.com/science/article/pii/S0024379511002199#|doi=10.1016/j.laa.2011.03.016}}</ref> solved the symmetric counterpart of this problem, where {{math|'''V'''}} is symmetric and contains a diagonal principal sub matrix of rank r. Their algorithm runs in O(rm^2) time in the dense case. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial time algorithm for exact NMF that works for the case where one of the factors W satisfies the separability condition.<ref name=Arora2013>{{Cite conference
| last1 = Arora | first1 = Sanjeev
| last2 = Ge | first2 = Rong
| last3 = Halpern | first3 = Yoni
| last4 = Mimno | first4 = David
| last5 = Moitra | first5 = Ankur
| last6 = Sontag | first6 = David
| last7 = Wu | first7 = Yichen
| last8 = Zhu | first8 = Michael
| title = A practical algorithm for topic modeling with provable guarantees
| url = http://jmlr.csail.mit.edu/proceedings/papers/v28/arora13.html
| arxiv = 1212.4777
| conference = Proceedings of the 30th International Conference on Machine Learning
| year =2013
}}</ref>
== Relation to other techniques ==
In ''Learning the parts of objects by non-negative matrix factorization'' Lee and Seung<ref>{{Cite journal
| author = Lee, Daniel D and Seung, H Sebastian
| title = Learning the parts of objects by non-negative matrix factorization
| journal = [[Nature]]
| volume = 401
| issue =
| year = 1999
| doi = 10.1038/44565
| url = http://www.columbia.edu/~jwp2128/Teaching/E4903/papers/nmf_nature.pdf
| pages = 788--791
}}</ref> proposed NMF mainly for parts-based decomposition of images. It compares NMF to [[vector quantization]] and [[principal component analysis]], and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results.
[[Image:Restricted Boltzmann machine.svg|thumb|NMF as a probabilistic graphical model: visible units ({{math|'''V'''}}) are connected to hidden units ({{math|'''H'''}}) through weights {{math|'''W'''}}, so that {{math|'''V'''}} is [[Generative model|generated]] from a probability distribution with mean <math>\sum_a W_{ia}h_a</math>.<ref name="lee-seung"/>{{rp|5}}]]
It was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA".<ref>{{Cite conference
| author = Wray Buntine
| url = http://cosco.hiit.fi/Articles/ecml02.pdf
| format=PDF| title = Variational Extensions to EM and Multinomial PCA
| conference = Proc. European Conference on Machine Learning (ECML-02)
| series = LNAI
| volume = 2430
| pages = 23–34
| year = 2002
}}</ref>
When NMF is obtained by minimizing the [[Kullback–Leibler divergence]], it is in fact equivalent to another instance of multinomial PCA, [[probabilistic latent semantic analysis]],<ref>{{Cite conference
|author1=Eric Gaussier |author2=Cyril Goutte
|lastauthoramp=yes | year = 2005
| url = http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf
| format=PDF| title = Relation between PLSA and NMF and Implications
| conference = Proc. 28th international ACM SIGIR conference on Research and development in information retrieval (SIGIR-05)
| pages = 601–602
}}</ref>
trained by [[maximum likelihood]] estimation.
That method is commonly used for analyzing and clustering textual data and is also related to the [[latent class model]].
NMF with the least-squares objective is equivalent to a relaxed form of [[K-means clustering]]: the matrix factor {{math|'''W'''}} contains cluster centroids and {{math|'''H'''}} contains cluster membership indicators.<ref name="DingSDM2005">C. Ding, X. He, H.D. Simon (2005). [http://ranger.uta.edu/~chqding/papers/NMF-SDM2005.pdf "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering"]. Proc. SIAM Int'l Conf. Data Mining, pp. 606-610. May 2005</ref><ref>Ron Zass and [[Amnon Shashua]] (2005). "[http://www.cs.huji.ac.il/~zass/papers/cp-iccv05.pdf A Unifying Approach to Hard and Probabilistic Clustering]". International Conference on Computer Vision (ICCV) Beijing, China, Oct., 2005.</ref> This provides a theoretical foundation for using NMF for data clustering. However, k-means does not enforce non-negativity on its centroids, so the closest analogy is in fact with "semi-NMF".{{r|ding}}
NMF can be seen as a two-layer [[Bayesian network|directed graphical]] model with one layer of observed random variables and one layer of hidden random variables.<ref>{{cite conference |author=Max Welling|title=Exponential Family Harmoniums with an Application to Information Retrieval |conference=NIPS|url=http://papers.nips.cc/paper/2672-exponential-family-harmoniums-with-an-application-to-information-retrieval |year=2004|pages=|display-authors=etal}}</ref>
NMF extends beyond matrices to tensors of arbitrary order.<ref>{{Cite journal
| author = Pentti Paatero
| author-link = Pentti Paatero
| title = The Multilinear Engine: A Table-Driven, Least Squares Program for Solving Multilinear Problems, including the n-Way Parallel Factor Analysis Model
| journal = [[Journal of Computational and Graphical Statistics]]
| volume = 8
| issue = 4
| pages = 854–888
| year = 1999
| doi = 10.2307/1390831
| jstor = 1390831
}}</ref><ref>{{Cite journal
|author1=Max Welling |author2=Markus Weber
|lastauthoramp=yes | year = 2001
| title = Positive Tensor Factorization
| journal = [[Pattern Recognition Letters]]
| volume = 22
| issue = 12
| pages = 1255–1261
| doi = 10.1016/S0167-8655(01)00070-8
}}</ref><ref>{{Cite conference
|author1=Jingu Kim |author2=Haesun Park
|lastauthoramp=yes | title = Fast Nonnegative Tensor Factorization with an Active-set-like Method
| publisher = Springer
| pages = 311–326
| url = http://www.cc.gatech.edu/~hpark/papers/2011_paper_hpscbook_ntf.pdf
| year = 2012
| conference = High-Performance Scientific Computing: Algorithms and Applications }}
</ref> This extension may be viewed as a non-negative counterpart to, e.g., the [[PARAFAC]] model.
Other extensions of NMF include joint factorisation of several data matrices and tensors where some factors are shared. Such models are useful for sensor fusion and relational learning.<ref>{{Cite conference
| author = Kenan Yilmaz
| author2 = A. Taylan Cemgil
| author3 = Umut Simsekli
| last-author-amp = yes
| title = Generalized Coupled Tensor Factorization
| url = http://books.nips.cc/papers/files/nips24/NIPS2011_1189.pdf
| conference = NIPS
| year =2011
}}
</ref>
NMF is an instance of nonnegative [[quadratic programming]] ([[NQP]]), just like the [[support vector machine]] (SVM). However, SVM and NMF are related at a more intimate level than that of NQP, which allows direct application of the solution algorithms developed for either of the two methods to problems in both domains.<ref>{{Cite conference
| author = Vamsi K. Potluru
| author2 = Sergey M. Plis
| author3 = Morten Morup
| author4 = Vince D. Calhoun
| author5 = Terran Lane
| last-author-amp = yes
| title = Efficient Multiplicative updates for Support Vector Machines
| year = 2009
| conference = Proceedings of the 2009 SIAM Conference on Data Mining (SDM)
| pages = 1218–1229
}}</ref>
== Uniqueness ==
The factorization is not unique: A matrix and its [[inverse matrix|inverse]] can be used to transform the two factorization matrices by, e.g.,<ref>{{Cite conference
| author = Wei Xu
| author2 = Xin Liu
| author3 = Yihong Gong
| last-author-amp = yes
| title = Document clustering based on non-negative matrix factorization
| publisher = [[Association for Computing Machinery]]
| ___location = New York
| year = 2003
| conference = Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
| pages = 267–273
| url = http://portal.acm.org/citation.cfm?id=860485
}}</ref>
: <math>\mathbf{WH} = \mathbf{WBB}^{-1}\mathbf{H}</math>
If the two new matrices <math>\mathbf{\tilde{W} = WB}</math> and <math>\mathbf{\tilde{H}}=\mathbf{B}^{-1}\mathbf{H}</math> are [[non-negative matrix|non-negative]] they form another parametrization of the factorization.
The non-negativity of <math>\mathbf{\tilde{W}}</math> and <math>\mathbf{\tilde{H}}</math> applies at least if {{math|'''B'''}} is a non-negative [[monomial matrix]].
In this simple case it will just correspond to a scaling and a [[permutation]].
More control over the non-uniqueness of NMF is obtained with sparsity constraints.<ref>Julian Eggert, Edgar Körner, "[http://dx.doi.org/10.1109/IJCNN.2004.1381036 Sparse coding and NMF]", ''Proceedings. 2004 IEEE International Joint Conference on Neural Networks, 2004, pp. 2529-2533, 2004.</ref>
== Applications ==
=== Text mining ===
NMF can be used for [[text mining]] applications.
In this process, a [[document-term matrix|''document-term'' matrix]] is constructed with the weights of various terms (typically weighted word frequency information) from a set of documents.
This matrix is factored into a ''term-feature'' and a ''feature-document'' matrix.
The features are derived from the contents of the documents, and the feature-document matrix describes [[Data clustering|data clusters]] of related documents.
One specific application used hierarchical NMF on a small subset of scientific abstracts from [[PubMed]].<ref>{{Cite journal
| last1 = Nielsen
| first1 = Finn Årup
| last2 = Balslev
| first2 = Daniela
| last3 = Hansen
| first3 = Lars Kai
| title = Mining the posterior cingulate: segregation between memory and pain components
| journal = [[NeuroImage]]
| volume = 27
| issue = 3
| pages = 520–522
| year = 2005
| doi = 10.1016/j.neuroimage.2005.04.034
| pmid = 15946864
}}</ref>
Another research group clustered parts of the [[Enron]] email dataset<ref>{{Cite web
| last1 = Cohen
| first1 = William
| title = Enron Email Dataset
| url = http://www.cs.cmu.edu/~enron/
| date = 2005-04-04
| accessdate = 2008-08-26
}}</ref>
with 65,033 messages and 91,133 terms into 50 clusters.<ref>{{Cite journal
| last1 = Berry
| first1 = Michael W.
| last2 = Browne
| title = Email Surveillance Using Non-negative Matrix Factorization
| journal = [[Computational and Mathematical Organization Theory]]
| volume = 11
| issue = 3
| pages = 249–264
| year = 2005
| doi = 10.1007/s10588-005-5380-5
| first2 = Murray
}}</ref>
NMF has also been applied to citations data, with one example clustering [[English Wikipedia]] articles and [[scientific journal]]s based on the outbound scientific citations in English Wikipedia.<ref>{{Cite conference
| last1 = Nielsen
| first = Finn Årup
| title = Clustering of scientific citations in Wikipedia
| conference = [[Wikimania]]
| year = 2008
| url = http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=5666
}}</ref>
Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) have given polynomial-time algorithms to learn topic models using NMF. The algorithm assumes that the topic matrix satisfies a separability condition that is often found to hold in these settings.<ref name=Arora2013 />
=== Spectral data analysis ===
NMF is also used to analyze spectral data; one such use is in the classification of space objects and debris.<ref name="BerryM2006Algorithm">{{Cite journal
| author = Michael W. Berry| title = Algorithms and Applications for Approximate Nonnegative Matrix Factorization
| year = 2006
|display-authors=etal}}</ref>
=== Scalable Internet distance prediction ===
NMF is applied in scalable Internet distance (round-trip time) prediction. For a network with <math>N</math> hosts, with the help of NMF, the distances of all the <math>N^2</math> end-to-end links can be predicted after conducting only <math>O(N)</math> measurements. This kind of method was firstly introduced in Internet
Distance Estimation Service (IDES).<ref name="IDES_Mao06">{{Cite journal
|author1=Yun Mao
|author2=Lawrence Saul
|author3=Jonathan M. Smith
|lastauthoramp=yes | title = IDES: An Internet Distance Estimation Service for Large Networks
| journal = [[IEEE Journal on Selected Areas in Communications]]
| volume = 24
| issue = 12
| pages = 2273–2284
| year = 2006
| doi = 10.1109/JSAC.2006.884026
}}</ref> Afterwards, as a fully decentralized approach, Phoenix network coordinate system<ref name="Phoenix_Chen11">{{Cite journal
| author = Yang Chen
| author2 = Xiao Wang
| author3 = Cong Shi
| last-author-amp = yes
| url = http://www.cs.duke.edu/~ychen/Phoenix_TNSM.pdf
| format=PDF
| title = Phoenix: A Weight-based Network Coordinate System Using Matrix Factorization
| journal = [[IEEE Transactions on Network and Service Management]]
| volume = 8
| issue = 4
| pages = 334–347
| year = 2011
| doi=10.1109/tnsm.2011.110911.100079
|display-authors=etal}}</ref>
is proposed. It achieves better overall prediction accuracy by introducing the concept of weight.
=== Non-stationary speech denoising ===
Speech denoising has been a long lasting problem in [[audio signal processing]]. There are lots of algorithms for denoising if the noise is stationary. For example, the [[Wiener filter]] is suitable for additive [[Gaussian noise]]. However, if the noise is non-stationary, the classical denoising algorithms usually have poor performance because the statistical information of the non-stationary noise is difficult to estimate. Schmidt et al.<ref>Schmidt, M.N., J. Larsen, and F.T. Hsiao. (2007). "Wind noise reduction using non-negative sparse coding", ''Machine Learning for Signal Processing, IEEE Workshop on'', 431–436</ref> use NMF to do speech denoising under non-stationary noise, which is completely different from classical statistical approaches. The key idea is that clean speech signal can be sparsely represented by a speech dictionary, but non-stationary noise cannot. Similarly, non-stationary noise can also be sparsely represented by a noise dictionary, but speech cannot.
The algorithm for NMF denoising goes as follows. Two dictionaries, one for speech and one for noise, need to be trained offline. Once a noisy speech is given, we first calculate the magnitude of the Short-Time-Fourier-Transform. Second, separate it into two parts via NMF, one can be sparsely represented by the speech dictionary, and the other part can be sparsely represented by the noise dictionary. Third, the part that is represented by the speech dictionary will be the estimated clean speech.
=== Bioinformatics ===
NMF has been successfully applied in [[bioinformatics]] for clustering [[gene expression]] and [[DNA methylation]] data and finding the genes most representative of the clusters.<ref name="Leo Taslaman and Björn Nilsson 2012 e46331"/><ref>{{Cite journal
| author = Devarajan, K.
| title = Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology
| journal = [[PLoS Computational Biology]]
| volume = 4
| issue = 7
| year = 2008
| doi=10.1371/journal.pcbi.1000029
| pages=e1000029
}}</ref><ref name="kim2007sparse">{{Cite journal
|author1=Hyunsoo Kim |author2=Haesun Park
|lastauthoramp=yes | title = Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis
| journal = [[Bioinformatics (journal)|Bioinformatics]]
| volume = 23
| issue = 12
| pages = 1495–1502
| year = 2007
| doi = 10.1093/bioinformatics/btm134
| url = http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/12/1495
| pmid = 17483501
}}</ref><ref>{{Cite journal
| author = Schwalbe, E.
| title = DNA methylation profiling of medulloblastoma allows robust sub-classification and improved outcome prediction using formalin-fixed biopsies
| journal = [[Acta Neuropathologica]]
| volume = 125
| issue = 3
| year = 2013
| pages = 359–371
| doi =10.1007/s00401-012-1077-2
| pmid = 23291781
| pmc=4313078
}}</ref> In the analysis of cancer mutations it has been used to identify common patterns of mutations that occur in many cancers and that probably have distinct causes.<ref>{{Cite journal|last=Alexandrov|first=Ludmil B.|last2=Nik-Zainal|first2=Serena|last3=Wedge|first3=David C.|last4=Campbell|first4=Peter J.|last5=Stratton|first5=Michael R.|date=2013-01-31|title=Deciphering signatures of mutational processes operative in human cancer|journal=Cell Reports|volume=3|issue=1|pages=246–259|doi=10.1016/j.celrep.2012.12.008|issn=2211-1247|pmc=3588146|pmid=23318258}}</ref>
=== Nuclear imaging ===
NMF, also referred in this field as factor analysis, has been used since the [[80s]]<ref>{{Cite journal|last=DiPaola|first=|last2=Bazin|last3=Aubry|last4=Aurengo|last5=Cavailloles|last6=Herry|last7=Kahn|date=|year=1982|title=Handling of dynamic sequences in nuclear medicine|url=|journal=[[IEEE Trans Nucl Sci]]|volume=NS-29|issue=4|pages=1310–21|bibcode=1982ITNS...29.1310D|doi=10.1109/tns.1982.4332188|pmid=|via=}}</ref> to analyze sequences of images in [[SPECT]] and [[Positron emission tomography|PET]] dynamic medical imaging. Non-uniqueness of NMF was addressed using sparsity constraints.<ref>{{Cite journal
| last1 = Sitek
| last2 = Gullberg
|last3 = Huesman
| title = Correction for ambiguous solutions in factor analysis using a penalized least squares objective
| journal = [[IEEE Trans Med Imaging]]
| volume = 21
| issue = 3
| year = 2002
| pages = 216–25
| doi=10.1109/42.996340
}}</ref>
== Current research ==
Current research (since 2010) in nonnegative matrix factorization includes, but is not limited to,
# Algorithmic: searching for global minima of the factors and factor initialization.<ref>{{Cite journal
|author1=C. Boutsidis |author2=E. Gallopoulos
|lastauthoramp=yes | title = SVD based initialization: A head start for nonnegative matrix factorization
| journal = Pattern Recognition
| volume = 41
| issue = 4
| pages = 1350–1362
| year = 2008
| doi = 10.1016/j.patcog.2007.09.010
}}</ref>
# Scalability: how to factorize million-by-billion matrices, which are commonplace in Web-scale data mining, e.g., see Distributed Nonnegative Matrix Factorization (DNMF)<ref>{{Cite journal
|author1=Chao Liu |author2=Hung-chih Yang |author3=Jinliang Fan |author4=Li-Wei He |author5=Yi-Min Wang |last-author-amp=yes | title = Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce
| journal = Proceedings of the 19th International World Wide Web Conference
| year = 2010
| url = http://research.microsoft.com/pubs/119077/DNMF.pdf
}}</ref> and Scalable Nonnegative Matrix Factorization (ScalableNMF)<ref>{{Cite journal
| author = Jiangtao Yin
| author2 = Lixin Gao
| author3 = Zhongfei (Mark) Zhang
| last-author-amp = yes
| title = Scalable Nonnegative Matrix Factorization with Block-wise Updates
| journal = Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
| year = 2014
| url = http://rio.ecs.umass.edu/mnilpub/papers/ecmlpkdd2014-yin.pdf
}}</ref>
# Online: how to update the factorization when new data comes in without recomputing from scratch, e.g., see online CNSC<ref>{{Cite journal
|author1=Dong Wang |author2=Ravichander Vipperla |author3=Nick Evans |author4=Thomas Fang Zheng | title = Online Non-Negative Convolutive Pattern Learning for Speech Signals
| journal = IEEE Transactions on Signal Processing
| year = 2013
| url = http://cslt.riit.tsinghua.edu.cn:8081/homepages/wangd/public/pdf/cnsc-tsp.pdf
| doi=10.1109/tsp.2012.2222381
| volume=61
| pages=44–56
}}</ref>
# Collective (joint) factorization: factorizing multiple interrelated matrices for multiple-view learning, e.g. mutli-view clustering, see CoNMF<ref>{{Cite journal
| author = Xiangnan He
| author2 = Min-Yen Kan
| author3 = Peichu Xie
| author4 = Xiao Chen
| last-author-amp = yes
| title = Comment-based Multi-View Clustering of Web 2.0 Items
| journal = Proceedings of the 23rd International World Wide Web Conference
| year = 2014
| url = http://www.comp.nus.edu.sg/~xiangnan/files/www2014-he.pdf
}}</ref> and MultiNMF<ref>{{Cite journal
| author = Jialu Liu
| author2 = Chi Wang
| author3 = Jing Gao
| author4 = Jiawei Han
| last-author-amp = yes
| title = Multi-View Clustering via Joint Nonnegative Matrix Factorization
| journal = Proceedings of SIAM Data Mining Conference
| year = 2013
| url = http://jialu.cs.illinois.edu/paper/sdm2013-liu.pdf
| doi=10.1137/1.9781611972832.28
| pages=252–260
| isbn = 978-1-61197-262-7
}}</ref>
# Cohen and Rothblum 1993 problem: whether a rational matrix always has an NMF of minimal inner dimension whose factors are also rational. Recently, this problem has been answered negatively.<ref>{{Cite arXiv|last=Chistikov|first=Dmitry|last2=Kiefer|first2=Stefan|last3=Marušić|first3=Ines|last4=Shirmohammadi|first4=Mahsa|last5=Worrell|first5=James|date=2016-05-22|title=Nonnegative Matrix Factorization Requires Irrationality |eprint=1605.06848|class=cs.CC}}</ref>
==See also==
*[[Multilinear algebra]]
*[[Multilinear subspace learning]]
*[[Tensor]]
*[[Tensor decomposition]]
*[[Tensor software]]
== Sources and external links ==
=== Notes ===
{{Reflist|2}}
=== Others ===
{{refbegin}}
* {{Cite journal
|author1=J. Shen |author2=G. W. Israël | title = A receptor model using a specific non-negative transformation technique for ambient aerosol
| journal = [[Atmospheric Environment (journal)|Atmospheric Environment]]
| volume = 23
| issue = 10
| pages = 2289–2298
| year = 1989
| doi = 10.1016/0004-6981(89)90190-X
|bibcode=1989AtmEn..23.2289S }}
* {{Cite journal
| author = Pentti Paatero
| author-link = Pentti Paatero
| title = Least squares formulation of robust non-negative factor analysis
| journal = [[Chemometrics and Intelligent Laboratory Systems]]
| volume = 37
| issue = 1
| pages = 23–35
| year = 1997
| doi = 10.1016/S0169-7439(96)00044-5
}}
* {{Cite journal
| author = Raul Kompass
| title = A Generalized Divergence Measure for Nonnegative Matrix Factorization
| journal = [[Neural Computation (journal)|Neural Computation]]
| volume = 19
| issue = 3
| year = 2007
| pages = 780–791
| pmid = 17298233
| doi = 10.1162/neco.2007.19.3.780
}}
* {{Cite journal
| title=Nonnegative Matrix Factorization and its applications in pattern recognition
| author=Liu, W.X.
| author2=Zheng, N.N.
| author3=You, Q.B.
| last-author-amp=yes
| journal=[[Chinese Science Bulletin]]
| volume=51
| pages=7–18
| year=2006
| url = http://www.springerlink.com/index/7285V70531634264.pdf
| doi=10.1007/s11434-005-1109-6
| issue=17–18
}}
* {{Cite arXiv
| author = Ngoc-Diep Ho
| author2 = Paul Van Dooren
| author3 = Vincent Blondel
| last-author-amp = yes
| title = Descent Methods for Nonnegative Matrix Factorization
| year = 2008
| eprint = 0801.3199
| class = cs.NA
}}
* {{Cite journal
| author = Andrzej Cichocki
| author-link = Andrzej Cichocki
| author2 = Rafal Zdunek
| author3 = Shun-ichi Amari
| author3-link = Shun-ichi Amari
| last-author-amp = yes
| title = Nonnegative Matrix and Tensor Factorization
| journal = [[IEEE Signal Processing Magazine]]
| volume = 25
| issue = 1
| year = 2008
| pages = 142–145
| doi = 10.1109/MSP.2008.4408452
| bibcode = 2008ISPM...25R.142C
}}
* {{Cite journal
| title = Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis
|author1=Cédric Févotte |author2=Nancy Bertin |author3=Jean-Louis Durrieu |last-author-amp=yes | journal = [[Neural Computation (journal)|Neural Computation]]
| volume = 21
| issue = 3
| year = 2009
| pmid=18785855
| doi=10.1162/neco.2008.04-08-771
| pages=793–830
}}
* {{Cite journal
| author = Ali Taylan Cemgil
| title = Bayesian Inference for Nonnegative Matrix Factorisation Models
| journal = [[Computational Intelligence and Neuroscience]]
| volume = 2009
| issue = 2
| year = 2009
| doi = 10.1155/2009/785152
| url = http://www.hindawi.com/journals/cin/2009/785152.abs.html
| pages = 1–17
| pmid = 19536273
| pmc = 2688815
}}
{{refend}}
[[Category:Linear algebra]]
[[Category:Matrix theory]]
[[Category:Machine learning algorithms]]' |
New page wikitext, after the edit (new_wikitext ) | '{{Redirect|NMF|the convention in contract bridge|new minor forcing}}
[[File:NMF.png|thumb|400px|Illustration of approximate non-negative matrix factorization: the matrix {{math|'''V'''}} is represented by the two smaller matrices {{math|'''W'''}} and {{math|'''H'''}}, which, when multiplied, approximately reconstruct {{math|'''V'''}}.]]
'''Non-negative matrix factorization''' ('''NMF''' or '''NNMF'''), also '''non-negative matrix approximation'''<ref name="dhillon"/><ref>{{cite journal|last1=Tandon|first1=Rashish|author2=Suvrit Sra|title=Sparse nonnegative matrix approximation: new formulations and algorithms|year=2010|series=TR|url=ftp://ftp.kyb.tuebingen.mpg.de/pub/mpi-memos/pdf/nmftr.pdf}}</ref> is a group of [[algorithm]]s in [[multivariate analysis]] and [[linear algebra]] where a [[matrix (mathematics)|matrix]] {{math|'''V'''}} is [[Matrix decomposition|factorized]] into (usually) two matrices {{math|'''W'''}} and {{math|'''H'''}}, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.
NMF finds applications in such fields as [[computer vision]], document [[Cluster analysis|clustering]],<ref name="dhillon"/> [[chemometrics]], [[audio signal processing]]<ref name="wangchapter">{{cite book |last=Wang |first=Wenwu |editor-last=Wang |editor-first=Wenwu |title=Machine Audition: Principles, Algorithms and Systems |publisher=IGI Global |date=2010 |pages=353–370 |chapter=Instantaneous Versus Convolutive Non-Negative Matrix Factorization: Models, Algorithms and Applications to Audio Pattern Separation |doi=10.4018/978-1-61520-919-4.ch015}}</ref> and [[recommender system]]s.<ref name="gemulla">{{cite conference |author=Rainer Gemulla |author2=Erik Nijkamp |author3=Peter J Haas |author4=Yannis Sismanis |title=Large-scale matrix factorization with distributed stochastic gradient descent |conference=Proc. ACM SIGKDD Int'l Conf. on Knowledge discovery and data mining |url=http://www.mpi-inf.mpg.de/~rgemulla/publications/rj10481rev.pdf |year=2011 |pages=69–77}}</ref><ref>{{cite conference |author=Yang Bao|title=TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation |conference=AAAI |url=http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8273 |year=2014 |pages=|display-authors=etal}}</ref>
== History ==
In [[chemometrics]] non-negative matrix factorization has a long history under the name "self modeling curve resolution".<ref>{{Cite journal
| author1 = William H. Lawton
| author-link1 = William H. Lawton
| author2 = Edward A. Sylvestre
| author-link2 = Edward A. Sylvestre
| title= Self modeling curve resolution
| journal = [[Technometrics]]
| volume = 13
| issue = 3
| year = 1971
| page = 617+
| doi=10.2307/1267173
| jstor = 1267173
}}</ref>
In this framework the vectors in the right matrix are continuous curves rather than discrete vectors.
Also early work on non-negative matrix factorizations was performed by a Finnish group of researchers in the middle of the 1990s under the name ''positive matrix factorization''.<ref>{{Cite journal
|author1=P. Paatero |author2=U. Tapper | title = Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values
| journal = [[Environmetrics]]
| volume = 5
| pages = 111–126
| year = 1994
| doi = 10.1002/env.3170050203
| issue = 2
}}</ref><ref>{{Cite journal
| author = Pia Anttila
| author-link = Pia Anttila
| author2 = Pentti Paatero
| author2-link = Pentti Paatero
| author3 = Unto Tapper
| author4 = Olli Järvinen
| title = Source identification of bulk wet deposition in Finland by positive matrix factorization
| journal = [[Atmospheric Environment (journal)|Atmospheric Environment]]
| volume = 29
| issue = 14
| pages = 1705–1718
| year = 1995
| doi = 10.1016/1352-2310(94)00367-T
| bibcode = 1995AtmEn..29.1705A
}}</ref>
It became more widely known as ''non-negative matrix factorization'' after Lee and Seung investigated
the properties of the algorithm and published some simple and useful
algorithms for two types of factorizations.<ref name="lee-seung">{{Cite journal
| author = Daniel D. Lee
| author2 = H. Sebastian Seung
| author2-link = Sebastian Seung
| last-author-amp = yes
| year = 1999
| title = Learning the parts of objects by non-negative matrix factorization
| journal = [[Nature (journal)|Nature]]
| volume = 401
| issue = 6755
| pages = 788–791
| doi = 10.1038/44565
| pmid = 10548103
| bibcode = 1999Natur.401..788L
}}</ref><ref name="lee2001algorithms">{{Cite conference
|author1=Daniel D. Lee |author2=H. Sebastian Seung
|lastauthoramp=yes | year = 2001
| url = http://www.nips.cc/Web/Groups/NIPS/NIPS2000/00papers-pub-on-web/LeeSeung.ps.gz
| title = Algorithms for Non-negative Matrix Factorization
| conference = Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference
| pages = 556–562
| publisher = [[MIT Press]]
}}</ref>
== Background ==
Let matrix {{math|'''V'''}} be the product of the matrices {{math|'''W'''}} and {{math|'''H'''}},
:<math>\mathbf{V} = \mathbf{W} \mathbf{H} \,.</math>
Matrix multiplication can be implemented as computing the column vectors of {{math|'''V'''}} as linear combinations of the column vectors in {{math|'''W'''}} using coefficients supplied by columns of {{math|'''H'''}}. That is, each column of {{math|'''V'''}} can be computed as follows:
:<math>\mathbf{v}_i = \mathbf{W} \mathbf{h}_{i} \,,</math>
where {{math|'''v'''<sub>''i''</sub>}} is the {{math|''i''}}-th column vector of the product matrix {{math|'''V'''}} and {{math|'''h'''<sub>''i''</sub>}} is the {{math|''i''}}-th column vector of the matrix {{math|'''H'''}}.
When multiplying matrices, the dimensions of the factor matrices may be significantly lower than those of the product matrix and it is this property that forms the basis of NMF. NMF generates factors with significantly reduced dimensions compared to the original matrix. For example, if {{math|'''V'''}} is an {{math|''m'' × ''n''}} matrix, {{math|'''W'''}} is an {{math|''m'' × ''p''}} matrix, and {{math|'''H'''}} is a {{math|''p'' × ''n''}} matrix then {{math|''p''}} can be significantly less than both {{math|''m''}} and {{math|''n''}}.
Here's an example based on a text-mining application:
* Let the input matrix (the matrix to be factored) be {{math|'''V'''}} with 10000 rows and 500 columns where words are in rows and documents are in columns. That is, we have 500 documents indexed by 10000 words. It follows that a column vector {{math|'''v'''}} in {{math|'''V'''}} represents a document.
* Assume we ask the algorithm to find 10 features in order to generate a ''features matrix'' {{math|'''W'''}} with 10000 rows and 10 columns and a ''coefficients matrix'' {{math|'''H'''}} with 10 rows and 500 columns.
* The product of {{math|'''W'''}} and {{math|'''H'''}} is a matrix with 10000 rows and 500 columns, the same shape as the input matrix {{math|'''V'''}} and, if the factorization worked, it is a reasonable approximation to the input matrix {{math|'''V'''}}.
* From the treatment of matrix multiplication above it follows that each column in the product matrix {{math|'''WH'''}} is a linear combination of the 10 column vectors in the features matrix {{math|'''W'''}} with coefficients supplied by the coefficients matrix {{math|'''H'''}}.
This last point is the basis of NMF because we can consider each original document in our example as being built from a small set of hidden features. NMF generates these features.
It's useful to think of each feature (column vector) in the features matrix {{math|'''W'''}} as a document archetype comprising a set of words where each word's cell value defines the word's rank in the feature: The higher a word's cell value the higher the word's rank in the feature. A column in the coefficients matrix {{math|'''H'''}} represents an original document with a cell value defining the document's rank for a feature. This follows because each row in {{math|'''H'''}} represents a feature. We can now reconstruct a document (column vector) from our input matrix by a linear combination of our features (column vectors in {{math|'''W'''}}) where each feature is weighted by the feature's cell value from the document's column in {{math|'''H'''}}.
== Clustering property ==
NMF has an inherent clustering property,<ref name="DingSDM2005" /> i.e., it automatically clusters the columns of input data
<math>\mathbf{V} = (v_1, \cdots, v_n) </math>. It is this property that drives most applications of NMF.
More specifically, the approximation of <math>\mathbf{V}</math> by
<math>\mathbf{V} \simeq \mathbf{W}\mathbf{H}</math> is achieved by minimizing the error function
<math> \min_{W,H} || V - WH ||_F,</math> subject to <math>W \geq 0, H \geq 0.</math>
If we add additional orthogonality constraint on <math> H </math>,
i.e., <math> H H^T = I </math>, then the above minimization is mathematically equivalent to the minimization of [[K-means clustering]] ).
Furthermore, the computed <math> H </math> gives the [[cluster indicator]], i.e.,
if <math>\mathbf{H}_{kj} > 0 </math>, that fact indicates
input data <math> v_j </math>
belongs to <math>k^{th}</math> cluster.
And the computed <math>W</math> gives the cluster centroids, i.e.,
the <math>k^{th}</math> column
gives the cluster centroid of
<math>k^{th}</math> cluster. This centroids representation can be significantly enhanced by convex NMF.
When the orthogonality <math> H H^T = I </math> is not explicitly imposed, the orthogonality holds to a large extent, and the clustering property holds too. Clustering is the main objective of most [[data mining]] applications of NMF.{{citation needed|date=April 2015}}
When the error function to be used is [[Kullback–Leibler divergence]], NMF is identical to the [[Probabilistic latent semantic analysis]], a popular document clustering method.<ref>C Ding, T Li, W Peng, [http://users.cis.fiu.edu/~taoli/pub/NMFpLSIequiv.pdf " On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing"] Computational Statistics & Data Analysis 52, 3913-3927</ref>
== Types ==
=== Approximate non-negative matrix factorization ===
Usually the number of columns of {{math|'''W'''}} and the number of rows of {{math|'''H'''}} in NMF are selected so the product {{math|'''WH'''}} will become an approximation to {{math|'''V'''}}. The full decomposition of {{math|'''V'''}} then amounts to the two non-negative matrices {{math|'''W'''}} and {{math|'''H'''}} as well as a residual {{math|'''U'''}}, such that: {{math|1='''V''' = '''WH''' + '''U'''}}. The elements of the residual matrix can either be negative or positive.
When {{math|'''W'''}} and {{math|'''H'''}} are smaller than {{math|'''V'''}} they become easier to store and manipulate. Another reason for factorizing {{math|'''V'''}} into smaller matrices {{math|'''W'''}} and {{math|'''H'''}}, is that if one is able to approximately represent the elements of {{math|'''V'''}} by significantly less data, then one has to infer some latent structure in the data.
=== Convex non-negative matrix factorization ===
In standard NMF, matrix factor {{math|'''W''' ∈ ℝ<sub>+</sub><sup>''m'' × ''k''</sup>}}, i.e., {{math|'''W'''}} can be anything in that space. Convex NMF<ref name="ding">C Ding, T Li, MI Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 45-55, 2010</ref> restricts the columns of {{math|'''W'''}} to convex combinations of the input data vectors <math> (v_1, \cdots, v_n) </math>. This greatly improves the quality of data representation of {{math|'''W'''}}. Furthermore, the resulting matrix factor {{math|'''H'''}} becomes more sparse and orthogonal.
=== Nonnegative rank factorization ===
In case the [[Nonnegative rank (linear algebra)|nonnegative rank]] of {{math|'''V'''}} is equal to its actual rank, {{math|1='''V''' = '''WH'''}} is called a nonnegative rank factorization.<ref name=BermanPlemmons74>{{cite journal|last=Berman|first=A.|author2=R.J. Plemmons |title=Inverses of nonnegative matrices|journal=Linear and Multilinear Algebra|year=1974|volume=2|issue=2|pages=161–172|doi=10.1080/03081087408817055}}</ref><ref name=BermanPlemmons94>{{cite book|author1=A. Berman |author2=R.J. Plemmons |title=Nonnegative matrices in the Mathematical Sciences|year=1994|publisher=SIAM|___location=Philadelphia}}</ref><ref name=Thomas74>{{cite journal|last=Thomas|first=L.B.|title=Problem 73-14, Rank factorization of nonnegative matrices|journal=SIAM rev.|year=1974|volume=16|issue=3|pages=393–394|doi=10.1137/1016064}}</ref> The problem of finding the NRF of {{math|'''V'''}}, if it exists, is known to be NP-hard.<ref name=Vavasis09>{{cite journal|last=Vavasis|first=S.A.|title=On the complexity of nonnegative matrix factorization|journal=SIAM J. Optim.|year=2009|volume=20|issue=3|pages=1364–1377|doi=10.1137/070709967}}</ref>
=== Different cost functions and regularizations ===
There are different types of non-negative matrix factorizations.
The different types arise from using different [[Loss function|cost function]]s for measuring the divergence between {{math|'''V'''}} and {{math|'''WH'''}} and possibly by [[regularization (mathematics)|regularization]] of the {{math|'''W'''}} and/or {{math|'''H'''}} matrices.<ref name="dhillon">{{Cite conference | author = Inderjit S. Dhillon | author-link = Inderjit S. Dhillon | author2 = Suvrit Sra| author2-link = Suvrit Sra | url = http://books.nips.cc/papers/files/nips18/NIPS2005_0203.pdf |format=PDF|title = Generalized Nonnegative Matrix Approximations with Bregman Divergences | conference = [[Conference on Neural Information Processing Systems|NIPS]] | year = 2005}}</ref>
Two simple divergence functions studied by Lee and Seung are the squared error (or [[Frobenius norm]]) and an extension of the Kullback–Leibler divergence to positive matrices (the original [[Kullback–Leibler divergence]] is defined on probability distributions).
Each divergence leads to a different NMF algorithm, usually minimizing the divergence using iterative update rules.
The factorization problem in the squared error version of NMF may be stated as:
Given a matrix <math>\mathbf{V}</math> find nonnegative matrices W and H that minimize the function
: <math>F(\mathbf{W},\mathbf{H}) = \|\mathbf{V} - \mathbf{WH}\|^2_F</math>
Another type of NMF for images is based on the [[total variation norm]].<ref>{{Cite journal | last1 = Zhang | first1 = T. | last2 = Fang | first2 = B. | last3 = Liu | first3 = W. | last4 = Tang | first4 = Y. Y. | last5 = He | first5 = G. | last6 = Wen | first6 = J. | doi = 10.1016/j.neucom.2008.01.022 | title = Total variation norm-based nonnegative matrix factorization for identifying discriminant representation of image patterns | journal = [[Neurocomputing (journal)|Neurocomputing]]| volume = 71 | issue = 10–12 | pages = 1824–1831| year = 2008 | pmid = | pmc = }}</ref>
When [[Tikhnov regularization|L1 regularization]] (akin to [[Lasso (statistics)|Lasso]]) is added to NMF with the mean squared error cost function, the resulting problem may be called '''non-negative sparse coding''' due to the similarity to the [[sparse coding]] problem,<ref name="hoyer02">{{cite conference |last=Hoyer |first=Patrik O. |title=Non-negative sparse coding |conference=Proc. IEEE Workshop on Neural Networks for Signal Processing |year=2002 |url=http://arxiv.org/pdf/cs/0202009}}</ref><ref name="Leo Taslaman and Björn Nilsson 2012 e46331">{{Cite journal
|author1=Leo Taslaman |author2=Björn Nilsson
|lastauthoramp=yes | title = A framework for regularized non-negative matrix factorization, with application to the analysis of gene expression data
| journal = [[PLoS One]]
| volume = 7
| issue = 11
| year = 2012
| pages = e46331
| doi = 10.1371/journal.pone.0046331
| pmid = 23133590
| pmc=3487913
|bibcode=2012PLoSO...746331T
}}</ref>
although it may also still be referred to as NMF.<ref>{{Cite conference | last1 = Hsieh | first1 = C. J. | last2 = Dhillon | first2 = I. S. | doi = 10.1145/2020408.2020577 | title = Fast coordinate descent methods with variable selection for non-negative matrix factorization | conference = Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11 | pages =
1064| year = 2011 | isbn = 9781450308137 | pmid = | pmc = | url = http://www.cs.utexas.edu/~cjhsieh/nmf_kdd11.pdf}}</ref>
===Online NMF===
Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in [[Data stream|streaming]] fashion. One such use is for [[collaborative filtering]] in [[recommendation systems]], where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for optimization in these cases may or may not be the same as for standard NMF, but the algorithms need to be rather different.<ref>http://www.ijcai.org/papers07/Papers/IJCAI07-432.pdf</ref><ref>http://portal.acm.org/citation.cfm?id=1339264.1339709</ref><ref>{{Cite journal|author=Naiyang Guan|author2=Dacheng Tao|author3=Zhigang Luo|author4=Bo Yuan|last-author-amp=yes|date=July 2012|title=Online Nonnegative Matrix Factorization With Robust Stochastic Approximation|url=|journal=IEEE Transactions on Neural Networks and Learning Systems |issue=7 |doi=10.1109/TNNLS.2012.2197827|pmid=24807135|volume=23|pages=1087–1099}}</ref>
== Algorithms ==
There are several ways in which the {{math|'''W'''}} and {{math|'''H'''}} may be found: Lee and Seung's [[Multiplicative Weight Update Method|multiplicative update rule]]<ref name="lee2001algorithms"/> has been a popular method due to the simplicity of implementation. Since then, a few other algorithmic approaches have been developed.
Some successful algorithms are based on alternating [[non-negative least squares]]: in each step of such an algorithm, first {{math|'''H'''}} is fixed and {{math|'''W'''}} found by a non-negative least squares solver, then {{math|'''W'''}} is fixed and {{math|'''H'''}} is found analogously. The procedures used to solve for {{math|'''W'''}} and {{math|'''H'''}} may be the same<ref name="lin07"/> or different, as some NMF variants regularize one of {{math|'''W'''}} and {{math|'''H'''}}.<ref name="hoyer02"/> Specific approaches include the projected [[gradient descent]] methods,<ref name="lin07">{{Cite journal | last1 = Lin | first1 = Chih-Jen| title = Projected Gradient Methods for Nonnegative Matrix Factorization | doi = 10.1162/neco.2007.19.10.2756 | journal = [[Neural Computation (journal)|Neural Computation]]| volume = 19 | issue = 10 | pages = 2756–2779 | year = 2007 | pmid = 17716011| pmc = | url = http://www.csie.ntu.edu.tw/~cjlin/papers/pgradnmf.pdf}}</ref><ref>{{Cite journal | last1 = Lin | first1 = Chih-Jen| doi = 10.1109/TNN.2007.895831 | title = On the Convergence of Multiplicative Update Algorithms for Nonnegative Matrix Factorization | journal = IEEE Transactions on Neural Networks| volume = 18 | issue = 6 | pages = 1589–1596 | year = 2007 | pmid = | pmc = }}</ref> the [[active set]] method,<ref name="gemulla"/><ref name="kim2008nonnegative">{{Cite journal
| author = Hyunsoo Kim
| author2 = Haesun Park
| author2-link = Haesun Park
| last-author-amp = yes
| title = Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method
| journal = [[SIAM Journal on Matrix Analysis and Applications]]
| volume = 30
| issue = 2
| year = 2008
| pages = 713–730
| url = http://www.cc.gatech.edu/~hpark/papers/simax-nmf.pdf
| doi=10.1137/07069239x
}}</ref> the optimal gradient method,<ref>{{Cite journal|author=Naiyang Guan|author2=Dacheng Tao|author3=Zhigang Luo, Bo Yuan|date=June 2012|title=NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization|url=|journal=IEEE Transactions on Signal Processing |issue=6 |doi=10.1109/TSP.2012.2190406|pmid=|volume=60|pages=2882–2898}}</ref> and the block principal pivoting method<ref name="kim2011fast">{{Cite journal
|author1=Jingu Kim |author2=Haesun Park
|lastauthoramp=yes | title = Fast Nonnegative Matrix Factorization: An Active-set-like Method and Comparisons
| journal = [[SIAM Journal on Scientific Computing]]
| volume = 33
| issue = 6
| year = 2011
| pages = 3261–3281
| url = http://www.cc.gatech.edu/~jingu/docs/2011_paper_sisc_nmf.pdf
| doi=10.1137/110821172
}}</ref> among several others.
The currently available algorithms are sub-optimal as they can only guarantee finding a local minimum, rather than a global minimum of the cost function. A provably optimal algorithm is unlikely in the near future as the problem has been shown to generalize the k-means clustering problem which is known to be [[NP-complete]].<ref>{{Cite journal
| title = On the equivalence of nonnegative matrix factorization and spectral clustering
| author = Ding, C.
| author2 = He, X.
| author3 = Simon, H.D.
| last-author-amp = yes
| journal = Proc. SIAM Data Mining Conf
| volume = 4
| pages = 606–610
| year = 2005
| doi=10.1137/1.9781611972757.70
| isbn = 978-0-89871-593-4
}}</ref> However, as in many other data mining applications, a local minimum may still prove to be useful.
=== Exact NMF ===
Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix {{math|'''V'''}}. A polynomial time algorithm for solving nonnegative rank factorization if {{math|'''V'''}} contains a monomial sub matrix of rank equal to its rank was given by Campbell and Poole in 1981.<ref name=CampbellPoole81>{{cite journal|last=Campbell|first=S.L.|author2=G.D. Poole |title=Computing nonnegative rank factorizations.|journal=Linear Algebra Appl.|year=1981|volume=35|pages=175–182|doi=10.1016/0024-3795(81)90272-x}}</ref> Kalofolias and Gallopoulos (2012)<ref name=KalofoliasGallopoulos2012>{{cite journal|last=Kalofolias|first=V.|author2=Gallopoulos, E. |title=Computing symmetric nonnegative rank factorizations|journal=Linear Algebra Appl|year=2012|volume=436|issue=2|pages=421–435|url=http://www.sciencedirect.com/science/article/pii/S0024379511002199#|doi=10.1016/j.laa.2011.03.016}}</ref> solved the symmetric counterpart of this problem, where {{math|'''V'''}} is symmetric and contains a diagonal principal sub matrix of rank r. Their algorithm runs in O(rm^2) time in the dense case. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial time algorithm for exact NMF that works for the case where one of the factors W satisfies the separability condition.<ref name=Arora2013>{{Cite conference
| last1 = Arora | first1 = Sanjeev
| last2 = Ge | first2 = Rong
| last3 = Halpern | first3 = Yoni
| last4 = Mimno | first4 = David
| last5 = Moitra | first5 = Ankur
| last6 = Sontag | first6 = David
| last7 = Wu | first7 = Yichen
| last8 = Zhu | first8 = Michael
| title = A practical algorithm for topic modeling with provable guarantees
| url = http://jmlr.csail.mit.edu/proceedings/papers/v28/arora13.html
| arxiv = 1212.4777
| conference = Proceedings of the 30th International Conference on Machine Learning
| year =2013
}}</ref>
== Relation to other techniques ==
In ''Learning the parts of objects by non-negative matrix factorization'' Lee and Seung<ref>{{Cite journal
| author = Lee, Daniel D and Seung, H Sebastian
| title = Learning the parts of objects by non-negative matrix factorization
| journal = [[Nature]]
| volume = 401
| issue =
| year = 1999
| doi = 10.1038/44565
| url = http://www.columbia.edu/~jwp2128/Teaching/E4903/papers/nmf_nature.pdf
| pages = 788--791
}}</ref> proposed NMF mainly for parts-based decomposition of images. It compares NMF to [[vector quantization]] and [[principal component analysis]], and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results.
[[Image:Restricted Boltzmann machine.svg|thumb|NMF as a probabilistic graphical model: visible units ({{math|'''V'''}}) are connected to hidden units ({{math|'''H'''}}) through weights {{math|'''W'''}}, so that {{math|'''V'''}} is [[Generative model|generated]] from a probability distribution with mean <math>\sum_a W_{ia}h_a</math>.<ref name="lee-seung"/>{{rp|5}}]]
It was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA".<ref>{{Cite conference
| author = Wray Buntine
| url = http://cosco.hiit.fi/Articles/ecml02.pdf
| format=PDF| title = Variational Extensions to EM and Multinomial PCA
| conference = Proc. European Conference on Machine Learning (ECML-02)
| series = LNAI
| volume = 2430
| pages = 23–34
| year = 2002
}}</ref>
When NMF is obtained by minimizing the [[Kullback–Leibler divergence]], it is in fact equivalent to another instance of multinomial PCA, [[probabilistic latent semantic analysis]],<ref>{{Cite conference
|author1=Eric Gaussier |author2=Cyril Goutte
|lastauthoramp=yes | year = 2005
| url = http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf
| format=PDF| title = Relation between PLSA and NMF and Implications
| conference = Proc. 28th international ACM SIGIR conference on Research and development in information retrieval (SIGIR-05)
| pages = 601–602
}}</ref>
trained by [[maximum likelihood]] estimation.
That method is commonly used for analyzing and clustering textual data and is also related to the [[latent class model]].
NMF with the least-squares objective is equivalent to a relaxed form of [[K-means clustering]]: the matrix factor {{math|'''W'''}} contains cluster centroids and {{math|'''H'''}} contains cluster membership indicators.<ref name="DingSDM2005">C. Ding, X. He, H.D. Simon (2005). [http://ranger.uta.edu/~chqding/papers/NMF-SDM2005.pdf "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering"]. Proc. SIAM Int'l Conf. Data Mining, pp. 606-610. May 2005</ref><ref>Ron Zass and [[Amnon Shashua]] (2005). "[http://www.cs.huji.ac.il/~zass/papers/cp-iccv05.pdf A Unifying Approach to Hard and Probabilistic Clustering]". International Conference on Computer Vision (ICCV) Beijing, China, Oct., 2005.</ref> This provides a theoretical foundation for using NMF for data clustering. However, k-means does not enforce non-negativity on its centroids, so the closest analogy is in fact with "semi-NMF".{{r|ding}}
NMF can be seen as a two-layer [[Bayesian network|directed graphical]] model with one layer of observed random variables and one layer of hidden random variables.<ref>{{cite conference |author=Max Welling|title=Exponential Family Harmoniums with an Application to Information Retrieval |conference=NIPS|url=http://papers.nips.cc/paper/2672-exponential-family-harmoniums-with-an-application-to-information-retrieval |year=2004|pages=|display-authors=etal}}</ref>
NMF extends beyond matrices to tensors of arbitrary order.<ref>{{Cite journal
| author = Pentti Paatero
| author-link = Pentti Paatero
| title = The Multilinear Engine: A Table-Driven, Least Squares Program for Solving Multilinear Problems, including the n-Way Parallel Factor Analysis Model
| journal = [[Journal of Computational and Graphical Statistics]]
| volume = 8
| issue = 4
| pages = 854–888
| year = 1999
| doi = 10.2307/1390831
| jstor = 1390831
}}</ref><ref>{{Cite journal
|author1=Max Welling |author2=Markus Weber
|lastauthoramp=yes | year = 2001
| title = Positive Tensor Factorization
| journal = [[Pattern Recognition Letters]]
| volume = 22
| issue = 12
| pages = 1255–1261
| doi = 10.1016/S0167-8655(01)00070-8
}}</ref><ref>{{Cite conference
|author1=Jingu Kim |author2=Haesun Park
|lastauthoramp=yes | title = Fast Nonnegative Tensor Factorization with an Active-set-like Method
| publisher = Springer
| pages = 311–326
| url = http://www.cc.gatech.edu/~hpark/papers/2011_paper_hpscbook_ntf.pdf
| year = 2012
| conference = High-Performance Scientific Computing: Algorithms and Applications }}
</ref> This extension may be viewed as a non-negative counterpart to, e.g., the [[PARAFAC]] model.
Other extensions of NMF include joint factorisation of several data matrices and tensors where some factors are shared. Such models are useful for sensor fusion and relational learning.<ref>{{Cite conference
| author = Kenan Yilmaz
| author2 = A. Taylan Cemgil
| author3 = Umut Simsekli
| last-author-amp = yes
| title = Generalized Coupled Tensor Factorization
| url = http://books.nips.cc/papers/files/nips24/NIPS2011_1189.pdf
| conference = NIPS
| year =2011
}}
</ref>
NMF is an instance of nonnegative [[quadratic programming]] ([[NQP]]), just like the [[support vector machine]] (SVM). However, SVM and NMF are related at a more intimate level than that of NQP, which allows direct application of the solution algorithms developed for either of the two methods to problems in both domains.<ref>{{Cite conference
| author = Vamsi K. Potluru
| author2 = Sergey M. Plis
| author3 = Morten Morup
| author4 = Vince D. Calhoun
| author5 = Terran Lane
| last-author-amp = yes
| title = Efficient Multiplicative updates for Support Vector Machines
| year = 2009
| conference = Proceedings of the 2009 SIAM Conference on Data Mining (SDM)
| pages = 1218–1229
}}</ref>
== Uniqueness ==
The factorization is not unique: A matrix and its [[inverse matrix|inverse]] can be used to transform the two factorization matrices by, e.g.,<ref>{{Cite conference
| author = Wei Xu
| author2 = Xin Liu
| author3 = Yihong Gong
| last-author-amp = yes
| title = Document clustering based on non-negative matrix factorization
| publisher = [[Association for Computing Machinery]]
| ___location = New York
| year = 2003
| conference = Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
| pages = 267–273
| url = http://portal.acm.org/citation.cfm?id=860485
}}</ref>
: <math>\mathbf{WH} = \mathbf{WBB}^{-1}\mathbf{H}</math>
If the two new matrices <math>\mathbf{\tilde{W} = WB}</math> and <math>\mathbf{\tilde{H}}=\mathbf{B}^{-1}\mathbf{H}</math> are [[non-negative matrix|non-negative]] they form another parametrization of the factorization.
The non-negativity of <math>\mathbf{\tilde{W}}</math> and <math>\mathbf{\tilde{H}}</math> applies at least if {{math|'''B'''}} is a non-negative [[monomial matrix]].
In this simple case it will just correspond to a scaling and a [[permutation]].
More control over the non-uniqueness of NMF is obtained with sparsity constraints.<ref>Julian Eggert, Edgar Körner, "[http://dx.doi.org/10.1109/IJCNN.2004.1381036 Sparse coding and NMF]", ''Proceedings. 2004 IEEE International Joint Conference on Neural Networks, 2004, pp. 2529-2533, 2004.</ref>
== Applications ==
=== Text mining ===
NMF can be used for [[text mining]] applications.
In this process, a [[document-term matrix|''document-term'' matrix]] is constructed with the weights of various terms (typically weighted word frequency information) from a set of documents.
This matrix is factored into a ''term-feature'' and a ''feature-document'' matrix.
The features are derived from the contents of the documents, and the feature-document matrix describes [[Data clustering|data clusters]] of related documents.
One specific application used hierarchical NMF on a small subset of scientific abstracts from [[PubMed]].<ref>{{Cite journal
| last1 = Nielsen
| first1 = Finn Årup
| last2 = Balslev
| first2 = Daniela
| last3 = Hansen
| first3 = Lars Kai
| title = Mining the posterior cingulate: segregation between memory and pain components
| journal = [[NeuroImage]]
| volume = 27
| issue = 3
| pages = 520–522
| year = 2005
| doi = 10.1016/j.neuroimage.2005.04.034
| pmid = 15946864
}}</ref>
Another research group clustered parts of the [[Enron]] email dataset<ref>{{Cite web
| last1 = Cohen
| first1 = William
| title = Enron Email Dataset
| url = http://www.cs.cmu.edu/~enron/
| date = 2005-04-04
| accessdate = 2008-08-26
}}</ref>
with 65,033 messages and 91,133 terms into 50 clusters.<ref>{{Cite journal
| last1 = Berry
| first1 = Michael W.
| last2 = Browne
| title = Email Surveillance Using Non-negative Matrix Factorization
| journal = [[Computational and Mathematical Organization Theory]]
| volume = 11
| issue = 3
| pages = 249–264
| year = 2005
| doi = 10.1007/s10588-005-5380-5
| first2 = Murray
}}</ref>
NMF has also been applied to citations data, with one example clustering [[English Wikipedia]] articles and [[scientific journal]]s based on the outbound scientific citations in English Wikipedia.<ref>{{Cite conference
| last1 = Nielsen
| first = Finn Årup
| title = Clustering of scientific citations in Wikipedia
| conference = [[Wikimania]]
| year = 2008
| url = http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=5666
}}</ref>
Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) have given polynomial-time algorithms to learn topic models using NMF. The algorithm assumes that the topic matrix satisfies a separability condition that is often found to hold in these settings.<ref name=Arora2013 />
=== Spectral data analysis ===
NMF is also used to analyze spectral data; one such use is in the classification of space objects and debris.<ref name="BerryM2006Algorithm">{{Cite journal
| author = Michael W. Berry| title = Algorithms and Applications for Approximate Nonnegative Matrix Factorization
| year = 2006
|display-authors=etal}}</ref>
=== Scalable Internet distance prediction ===
NMF is applied in scalable Internet distance (round-trip time) prediction. For a network with <math>N</math> hosts, with the help of NMF, the distances of all the <math>N^2</math> end-to-end links can be predicted after conducting only <math>O(N)</math> measurements. This kind of method was firstly introduced in Internet
Distance Estimation Service (IDES).<ref name="IDES_Mao06">{{Cite journal
|author1=Yun Mao
|author2=Lawrence Saul
|author3=Jonathan M. Smith
|lastauthoramp=yes | title = IDES: An Internet Distance Estimation Service for Large Networks
| journal = [[IEEE Journal on Selected Areas in Communications]]
| volume = 24
| issue = 12
| pages = 2273–2284
| year = 2006
| doi = 10.1109/JSAC.2006.884026
}}</ref> Afterwards, as a fully decentralized approach, Phoenix network coordinate system<ref name="Phoenix_Chen11">{{Cite journal
| author = Yang Chen
| author2 = Xiao Wang
| author3 = Cong Shi
| last-author-amp = yes
| url = http://www.cs.duke.edu/~ychen/Phoenix_TNSM.pdf
| format=PDF
| title = Phoenix: A Weight-based Network Coordinate System Using Matrix Factorization
| journal = [[IEEE Transactions on Network and Service Management]]
| volume = 8
| issue = 4
| pages = 334–347
| year = 2011
| doi=10.1109/tnsm.2011.110911.100079
|display-authors=etal}}</ref>
is proposed. It achieves better overall prediction accuracy by introducing the concept of weight.
=== Non-stationary speech denoising ===
Speech denoising has been a long lasting problem in [[audio signal processing]]. There are lots of algorithms for denoising if the noise is stationary. For example, the [[Wiener filter]] is suitable for additive [[Gaussian noise]]. However, if the noise is non-stationary, the classical denoising algorithms usually have poor performance because the statistical information of the non-stationary noise is difficult to estimate. Schmidt et al.<ref>Schmidt, M.N., J. Larsen, and F.T. Hsiao. (2007). "Wind noise reduction using non-negative sparse coding", ''Machine Learning for Signal Processing, IEEE Workshop on'', 431–436</ref> use NMF to do speech denoising under non-stationary noise, which is completely different from classical statistical approaches. The key idea is that clean speech signal can be sparsely represented by a speech dictionary, but non-stationary noise cannot. Similarly, non-stationary noise can also be sparsely represented by a noise dictionary, but speech cannot.
The algorithm for NMF denoising goes as follows. Two dictionaries, one for speech and one for noise, need to be trained offline. Once a noisy speech is given, we first calculate the magnitude of the Short-Time-Fourier-Transform. Second, separate it into two parts via NMF, one can be sparsely represented by the speech dictionary, and the other part can be sparsely represented by the noise dictionary. Third, the part that is represented by the speech dictionary will be the estimated clean speech.
=== Bioinformatics ===
NMF has been successfully applied in [[bioinformatics]] for clustering [[gene expression]] and [[DNA methylation]] data and finding the genes most representative of the clusters.<ref name="Leo Taslaman and Björn Nilsson 2012 e46331"/><ref>{{Cite journal
| author = Devarajan, K.
| title = Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology
| journal = [[PLoS Computational Biology]]
| volume = 4
| issue = 7
| year = 2008
| doi=10.1371/journal.pcbi.1000029
| pages=e1000029
}}</ref><ref name="kim2007sparse">{{Cite journal
|author1=Hyunsoo Kim |author2=Haesun Park
|lastauthoramp=yes | title = Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis
| journal = [[Bioinformatics (journal)|Bioinformatics]]
| volume = 23
| issue = 12
| pages = 1495–1502
| year = 2007
| doi = 10.1093/bioinformatics/btm134
| url = http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/12/1495
| pmid = 17483501
}}</ref><ref>{{Cite journal
| author = Schwalbe, E.
| title = DNA methylation profiling of medulloblastoma allows robust sub-classification and improved outcome prediction using formalin-fixed biopsies
| journal = [[Acta Neuropathologica]]
| volume = 125
| issue = 3
| year = 2013
| pages = 359–371
| doi =10.1007/s00401-012-1077-2
| pmid = 23291781
| pmc=4313078
}}</ref> In the analysis of cancer mutations it has been used to identify common patterns of mutations that occur in many cancers and that probably have distinct causes.<ref>{{Cite journal|last=Alexandrov|first=Ludmil B.|last2=Nik-Zainal|first2=Serena|last3=Wedge|first3=David C.|last4=Campbell|first4=Peter J.|last5=Stratton|first5=Michael R.|date=2013-01-31|title=Deciphering signatures of mutational processes operative in human cancer|journal=Cell Reports|volume=3|issue=1|pages=246–259|doi=10.1016/j.celrep.2012.12.008|issn=2211-1247|pmc=3588146|pmid=23318258}}</ref>
== Current research ==
Current research (since 2010) in nonnegative matrix factorization includes, but is not limited to,
# Algorithmic: searching for global minima of the factors and factor initialization.<ref>{{Cite journal
|author1=C. Boutsidis |author2=E. Gallopoulos
|lastauthoramp=yes | title = SVD based initialization: A head start for nonnegative matrix factorization
| journal = Pattern Recognition
| volume = 41
| issue = 4
| pages = 1350–1362
| year = 2008
| doi = 10.1016/j.patcog.2007.09.010
}}</ref>
# Scalability: how to factorize million-by-billion matrices, which are commonplace in Web-scale data mining, e.g., see Distributed Nonnegative Matrix Factorization (DNMF)<ref>{{Cite journal
|author1=Chao Liu |author2=Hung-chih Yang |author3=Jinliang Fan |author4=Li-Wei He |author5=Yi-Min Wang |last-author-amp=yes | title = Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce
| journal = Proceedings of the 19th International World Wide Web Conference
| year = 2010
| url = http://research.microsoft.com/pubs/119077/DNMF.pdf
}}</ref> and Scalable Nonnegative Matrix Factorization (ScalableNMF)<ref>{{Cite journal
| author = Jiangtao Yin
| author2 = Lixin Gao
| author3 = Zhongfei (Mark) Zhang
| last-author-amp = yes
| title = Scalable Nonnegative Matrix Factorization with Block-wise Updates
| journal = Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
| year = 2014
| url = http://rio.ecs.umass.edu/mnilpub/papers/ecmlpkdd2014-yin.pdf
}}</ref>
# Online: how to update the factorization when new data comes in without recomputing from scratch, e.g., see online CNSC<ref>{{Cite journal
|author1=Dong Wang |author2=Ravichander Vipperla |author3=Nick Evans |author4=Thomas Fang Zheng | title = Online Non-Negative Convolutive Pattern Learning for Speech Signals
| journal = IEEE Transactions on Signal Processing
| year = 2013
| url = http://cslt.riit.tsinghua.edu.cn:8081/homepages/wangd/public/pdf/cnsc-tsp.pdf
| doi=10.1109/tsp.2012.2222381
| volume=61
| pages=44–56
}}</ref>
# Collective (joint) factorization: factorizing multiple interrelated matrices for multiple-view learning, e.g. mutli-view clustering, see CoNMF<ref>{{Cite journal
| author = Xiangnan He
| author2 = Min-Yen Kan
| author3 = Peichu Xie
| author4 = Xiao Chen
| last-author-amp = yes
| title = Comment-based Multi-View Clustering of Web 2.0 Items
| journal = Proceedings of the 23rd International World Wide Web Conference
| year = 2014
| url = http://www.comp.nus.edu.sg/~xiangnan/files/www2014-he.pdf
}}</ref> and MultiNMF<ref>{{Cite journal
| author = Jialu Liu
| author2 = Chi Wang
| author3 = Jing Gao
| author4 = Jiawei Han
| last-author-amp = yes
| title = Multi-View Clustering via Joint Nonnegative Matrix Factorization
| journal = Proceedings of SIAM Data Mining Conference
| year = 2013
| url = http://jialu.cs.illinois.edu/paper/sdm2013-liu.pdf
| doi=10.1137/1.9781611972832.28
| pages=252–260
| isbn = 978-1-61197-262-7
}}</ref>
# Cohen and Rothblum 1993 problem: whether a rational matrix always has an NMF of minimal inner dimension whose factors are also rational. Recently, this problem has been answered negatively.<ref>{{Cite arXiv|last=Chistikov|first=Dmitry|last2=Kiefer|first2=Stefan|last3=Marušić|first3=Ines|last4=Shirmohammadi|first4=Mahsa|last5=Worrell|first5=James|date=2016-05-22|title=Nonnegative Matrix Factorization Requires Irrationality |eprint=1605.06848|class=cs.CC}}</ref>
==See also==
*[[Multilinear algebra]]
*[[Multilinear subspace learning]]
*[[Tensor]]
*[[Tensor decomposition]]
*[[Tensor software]]
== Sources and external links ==
=== Notes ===
{{Reflist|2}}
=== Others ===
{{refbegin}}
* {{Cite journal
|author1=J. Shen |author2=G. W. Israël | title = A receptor model using a specific non-negative transformation technique for ambient aerosol
| journal = [[Atmospheric Environment (journal)|Atmospheric Environment]]
| volume = 23
| issue = 10
| pages = 2289–2298
| year = 1989
| doi = 10.1016/0004-6981(89)90190-X
|bibcode=1989AtmEn..23.2289S }}
* {{Cite journal
| author = Pentti Paatero
| author-link = Pentti Paatero
| title = Least squares formulation of robust non-negative factor analysis
| journal = [[Chemometrics and Intelligent Laboratory Systems]]
| volume = 37
| issue = 1
| pages = 23–35
| year = 1997
| doi = 10.1016/S0169-7439(96)00044-5
}}
* {{Cite journal
| author = Raul Kompass
| title = A Generalized Divergence Measure for Nonnegative Matrix Factorization
| journal = [[Neural Computation (journal)|Neural Computation]]
| volume = 19
| issue = 3
| year = 2007
| pages = 780–791
| pmid = 17298233
| doi = 10.1162/neco.2007.19.3.780
}}
* {{Cite journal
| title=Nonnegative Matrix Factorization and its applications in pattern recognition
| author=Liu, W.X.
| author2=Zheng, N.N.
| author3=You, Q.B.
| last-author-amp=yes
| journal=[[Chinese Science Bulletin]]
| volume=51
| pages=7–18
| year=2006
| url = http://www.springerlink.com/index/7285V70531634264.pdf
| doi=10.1007/s11434-005-1109-6
| issue=17–18
}}
* {{Cite arXiv
| author = Ngoc-Diep Ho
| author2 = Paul Van Dooren
| author3 = Vincent Blondel
| last-author-amp = yes
| title = Descent Methods for Nonnegative Matrix Factorization
| year = 2008
| eprint = 0801.3199
| class = cs.NA
}}
* {{Cite journal
| author = Andrzej Cichocki
| author-link = Andrzej Cichocki
| author2 = Rafal Zdunek
| author3 = Shun-ichi Amari
| author3-link = Shun-ichi Amari
| last-author-amp = yes
| title = Nonnegative Matrix and Tensor Factorization
| journal = [[IEEE Signal Processing Magazine]]
| volume = 25
| issue = 1
| year = 2008
| pages = 142–145
| doi = 10.1109/MSP.2008.4408452
| bibcode = 2008ISPM...25R.142C
}}
* {{Cite journal
| title = Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis
|author1=Cédric Févotte |author2=Nancy Bertin |author3=Jean-Louis Durrieu |last-author-amp=yes | journal = [[Neural Computation (journal)|Neural Computation]]
| volume = 21
| issue = 3
| year = 2009
| pmid=18785855
| doi=10.1162/neco.2008.04-08-771
| pages=793–830
}}
* {{Cite journal
| author = Ali Taylan Cemgil
| title = Bayesian Inference for Nonnegative Matrix Factorisation Models
| journal = [[Computational Intelligence and Neuroscience]]
| volume = 2009
| issue = 2
| year = 2009
| doi = 10.1155/2009/785152
| url = http://www.hindawi.com/journals/cin/2009/785152.abs.html
| pages = 1–17
| pmid = 19536273
| pmc = 2688815
}}
{{refend}}
[[Category:Linear algebra]]
[[Category:Matrix theory]]
[[Category:Machine learning algorithms]]' |
Unified diff of changes made by edit (edit_diff ) | '@@ -474,18 +474,4 @@
| pmc=4313078
}}</ref> In the analysis of cancer mutations it has been used to identify common patterns of mutations that occur in many cancers and that probably have distinct causes.<ref>{{Cite journal|last=Alexandrov|first=Ludmil B.|last2=Nik-Zainal|first2=Serena|last3=Wedge|first3=David C.|last4=Campbell|first4=Peter J.|last5=Stratton|first5=Michael R.|date=2013-01-31|title=Deciphering signatures of mutational processes operative in human cancer|journal=Cell Reports|volume=3|issue=1|pages=246–259|doi=10.1016/j.celrep.2012.12.008|issn=2211-1247|pmc=3588146|pmid=23318258}}</ref>
-
-=== Nuclear imaging ===
-NMF, also referred in this field as factor analysis, has been used since the [[80s]]<ref>{{Cite journal|last=DiPaola|first=|last2=Bazin|last3=Aubry|last4=Aurengo|last5=Cavailloles|last6=Herry|last7=Kahn|date=|year=1982|title=Handling of dynamic sequences in nuclear medicine|url=|journal=[[IEEE Trans Nucl Sci]]|volume=NS-29|issue=4|pages=1310–21|bibcode=1982ITNS...29.1310D|doi=10.1109/tns.1982.4332188|pmid=|via=}}</ref> to analyze sequences of images in [[SPECT]] and [[Positron emission tomography|PET]] dynamic medical imaging. Non-uniqueness of NMF was addressed using sparsity constraints.<ref>{{Cite journal
- | last1 = Sitek
-| last2 = Gullberg
-|last3 = Huesman
- | title = Correction for ambiguous solutions in factor analysis using a penalized least squares objective
- | journal = [[IEEE Trans Med Imaging]]
-| volume = 21
-| issue = 3
- | year = 2002
-| pages = 216–25
- | doi=10.1109/42.996340
-}}</ref>
== Current research ==
' |
New page size (new_size ) | 45998 |
Old page size (old_size ) | 46937 |
Size change in edit (edit_delta ) | -939 |
Lines added in edit (added_lines ) | [] |
Lines removed in edit (removed_lines ) | [
0 => false,
1 => '=== Nuclear imaging ===',
2 => 'NMF, also referred in this field as factor analysis, has been used since the [[80s]]<ref>{{Cite journal|last=DiPaola|first=|last2=Bazin|last3=Aubry|last4=Aurengo|last5=Cavailloles|last6=Herry|last7=Kahn|date=|year=1982|title=Handling of dynamic sequences in nuclear medicine|url=|journal=[[IEEE Trans Nucl Sci]]|volume=NS-29|issue=4|pages=1310–21|bibcode=1982ITNS...29.1310D|doi=10.1109/tns.1982.4332188|pmid=|via=}}</ref> to analyze sequences of images in [[SPECT]] and [[Positron emission tomography|PET]] dynamic medical imaging. Non-uniqueness of NMF was addressed using sparsity constraints.<ref>{{Cite journal',
3 => ' | last1 = Sitek',
4 => '| last2 = Gullberg ',
5 => '|last3 = Huesman',
6 => ' | title = Correction for ambiguous solutions in factor analysis using a penalized least squares objective ',
7 => ' | journal = [[IEEE Trans Med Imaging]]',
8 => '| volume = 21',
9 => '| issue = 3',
10 => ' | year = 2002',
11 => '| pages = 216–25',
12 => ' | doi=10.1109/42.996340',
13 => '}}</ref>'
] |
New page wikitext, pre-save transformed (new_pst ) | '{{Redirect|NMF|the convention in contract bridge|new minor forcing}}
[[File:NMF.png|thumb|400px|Illustration of approximate non-negative matrix factorization: the matrix {{math|'''V'''}} is represented by the two smaller matrices {{math|'''W'''}} and {{math|'''H'''}}, which, when multiplied, approximately reconstruct {{math|'''V'''}}.]]
'''Non-negative matrix factorization''' ('''NMF''' or '''NNMF'''), also '''non-negative matrix approximation'''<ref name="dhillon"/><ref>{{cite journal|last1=Tandon|first1=Rashish|author2=Suvrit Sra|title=Sparse nonnegative matrix approximation: new formulations and algorithms|year=2010|series=TR|url=ftp://ftp.kyb.tuebingen.mpg.de/pub/mpi-memos/pdf/nmftr.pdf}}</ref> is a group of [[algorithm]]s in [[multivariate analysis]] and [[linear algebra]] where a [[matrix (mathematics)|matrix]] {{math|'''V'''}} is [[Matrix decomposition|factorized]] into (usually) two matrices {{math|'''W'''}} and {{math|'''H'''}}, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.
NMF finds applications in such fields as [[computer vision]], document [[Cluster analysis|clustering]],<ref name="dhillon"/> [[chemometrics]], [[audio signal processing]]<ref name="wangchapter">{{cite book |last=Wang |first=Wenwu |editor-last=Wang |editor-first=Wenwu |title=Machine Audition: Principles, Algorithms and Systems |publisher=IGI Global |date=2010 |pages=353–370 |chapter=Instantaneous Versus Convolutive Non-Negative Matrix Factorization: Models, Algorithms and Applications to Audio Pattern Separation |doi=10.4018/978-1-61520-919-4.ch015}}</ref> and [[recommender system]]s.<ref name="gemulla">{{cite conference |author=Rainer Gemulla |author2=Erik Nijkamp |author3=Peter J Haas |author4=Yannis Sismanis |title=Large-scale matrix factorization with distributed stochastic gradient descent |conference=Proc. ACM SIGKDD Int'l Conf. on Knowledge discovery and data mining |url=http://www.mpi-inf.mpg.de/~rgemulla/publications/rj10481rev.pdf |year=2011 |pages=69–77}}</ref><ref>{{cite conference |author=Yang Bao|title=TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation |conference=AAAI |url=http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8273 |year=2014 |pages=|display-authors=etal}}</ref>
== History ==
In [[chemometrics]] non-negative matrix factorization has a long history under the name "self modeling curve resolution".<ref>{{Cite journal
| author1 = William H. Lawton
| author-link1 = William H. Lawton
| author2 = Edward A. Sylvestre
| author-link2 = Edward A. Sylvestre
| title= Self modeling curve resolution
| journal = [[Technometrics]]
| volume = 13
| issue = 3
| year = 1971
| page = 617+
| doi=10.2307/1267173
| jstor = 1267173
}}</ref>
In this framework the vectors in the right matrix are continuous curves rather than discrete vectors.
Also early work on non-negative matrix factorizations was performed by a Finnish group of researchers in the middle of the 1990s under the name ''positive matrix factorization''.<ref>{{Cite journal
|author1=P. Paatero |author2=U. Tapper | title = Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values
| journal = [[Environmetrics]]
| volume = 5
| pages = 111–126
| year = 1994
| doi = 10.1002/env.3170050203
| issue = 2
}}</ref><ref>{{Cite journal
| author = Pia Anttila
| author-link = Pia Anttila
| author2 = Pentti Paatero
| author2-link = Pentti Paatero
| author3 = Unto Tapper
| author4 = Olli Järvinen
| title = Source identification of bulk wet deposition in Finland by positive matrix factorization
| journal = [[Atmospheric Environment (journal)|Atmospheric Environment]]
| volume = 29
| issue = 14
| pages = 1705–1718
| year = 1995
| doi = 10.1016/1352-2310(94)00367-T
| bibcode = 1995AtmEn..29.1705A
}}</ref>
It became more widely known as ''non-negative matrix factorization'' after Lee and Seung investigated
the properties of the algorithm and published some simple and useful
algorithms for two types of factorizations.<ref name="lee-seung">{{Cite journal
| author = Daniel D. Lee
| author2 = H. Sebastian Seung
| author2-link = Sebastian Seung
| last-author-amp = yes
| year = 1999
| title = Learning the parts of objects by non-negative matrix factorization
| journal = [[Nature (journal)|Nature]]
| volume = 401
| issue = 6755
| pages = 788–791
| doi = 10.1038/44565
| pmid = 10548103
| bibcode = 1999Natur.401..788L
}}</ref><ref name="lee2001algorithms">{{Cite conference
|author1=Daniel D. Lee |author2=H. Sebastian Seung
|lastauthoramp=yes | year = 2001
| url = http://www.nips.cc/Web/Groups/NIPS/NIPS2000/00papers-pub-on-web/LeeSeung.ps.gz
| title = Algorithms for Non-negative Matrix Factorization
| conference = Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference
| pages = 556–562
| publisher = [[MIT Press]]
}}</ref>
== Background ==
Let matrix {{math|'''V'''}} be the product of the matrices {{math|'''W'''}} and {{math|'''H'''}},
:<math>\mathbf{V} = \mathbf{W} \mathbf{H} \,.</math>
Matrix multiplication can be implemented as computing the column vectors of {{math|'''V'''}} as linear combinations of the column vectors in {{math|'''W'''}} using coefficients supplied by columns of {{math|'''H'''}}. That is, each column of {{math|'''V'''}} can be computed as follows:
:<math>\mathbf{v}_i = \mathbf{W} \mathbf{h}_{i} \,,</math>
where {{math|'''v'''<sub>''i''</sub>}} is the {{math|''i''}}-th column vector of the product matrix {{math|'''V'''}} and {{math|'''h'''<sub>''i''</sub>}} is the {{math|''i''}}-th column vector of the matrix {{math|'''H'''}}.
When multiplying matrices, the dimensions of the factor matrices may be significantly lower than those of the product matrix and it is this property that forms the basis of NMF. NMF generates factors with significantly reduced dimensions compared to the original matrix. For example, if {{math|'''V'''}} is an {{math|''m'' × ''n''}} matrix, {{math|'''W'''}} is an {{math|''m'' × ''p''}} matrix, and {{math|'''H'''}} is a {{math|''p'' × ''n''}} matrix then {{math|''p''}} can be significantly less than both {{math|''m''}} and {{math|''n''}}.
Here's an example based on a text-mining application:
* Let the input matrix (the matrix to be factored) be {{math|'''V'''}} with 10000 rows and 500 columns where words are in rows and documents are in columns. That is, we have 500 documents indexed by 10000 words. It follows that a column vector {{math|'''v'''}} in {{math|'''V'''}} represents a document.
* Assume we ask the algorithm to find 10 features in order to generate a ''features matrix'' {{math|'''W'''}} with 10000 rows and 10 columns and a ''coefficients matrix'' {{math|'''H'''}} with 10 rows and 500 columns.
* The product of {{math|'''W'''}} and {{math|'''H'''}} is a matrix with 10000 rows and 500 columns, the same shape as the input matrix {{math|'''V'''}} and, if the factorization worked, it is a reasonable approximation to the input matrix {{math|'''V'''}}.
* From the treatment of matrix multiplication above it follows that each column in the product matrix {{math|'''WH'''}} is a linear combination of the 10 column vectors in the features matrix {{math|'''W'''}} with coefficients supplied by the coefficients matrix {{math|'''H'''}}.
This last point is the basis of NMF because we can consider each original document in our example as being built from a small set of hidden features. NMF generates these features.
It's useful to think of each feature (column vector) in the features matrix {{math|'''W'''}} as a document archetype comprising a set of words where each word's cell value defines the word's rank in the feature: The higher a word's cell value the higher the word's rank in the feature. A column in the coefficients matrix {{math|'''H'''}} represents an original document with a cell value defining the document's rank for a feature. This follows because each row in {{math|'''H'''}} represents a feature. We can now reconstruct a document (column vector) from our input matrix by a linear combination of our features (column vectors in {{math|'''W'''}}) where each feature is weighted by the feature's cell value from the document's column in {{math|'''H'''}}.
== Clustering property ==
NMF has an inherent clustering property,<ref name="DingSDM2005" /> i.e., it automatically clusters the columns of input data
<math>\mathbf{V} = (v_1, \cdots, v_n) </math>. It is this property that drives most applications of NMF.
More specifically, the approximation of <math>\mathbf{V}</math> by
<math>\mathbf{V} \simeq \mathbf{W}\mathbf{H}</math> is achieved by minimizing the error function
<math> \min_{W,H} || V - WH ||_F,</math> subject to <math>W \geq 0, H \geq 0.</math>
If we add additional orthogonality constraint on <math> H </math>,
i.e., <math> H H^T = I </math>, then the above minimization is mathematically equivalent to the minimization of [[K-means clustering]] ).
Furthermore, the computed <math> H </math> gives the [[cluster indicator]], i.e.,
if <math>\mathbf{H}_{kj} > 0 </math>, that fact indicates
input data <math> v_j </math>
belongs to <math>k^{th}</math> cluster.
And the computed <math>W</math> gives the cluster centroids, i.e.,
the <math>k^{th}</math> column
gives the cluster centroid of
<math>k^{th}</math> cluster. This centroids representation can be significantly enhanced by convex NMF.
When the orthogonality <math> H H^T = I </math> is not explicitly imposed, the orthogonality holds to a large extent, and the clustering property holds too. Clustering is the main objective of most [[data mining]] applications of NMF.{{citation needed|date=April 2015}}
When the error function to be used is [[Kullback–Leibler divergence]], NMF is identical to the [[Probabilistic latent semantic analysis]], a popular document clustering method.<ref>C Ding, T Li, W Peng, [http://users.cis.fiu.edu/~taoli/pub/NMFpLSIequiv.pdf " On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing"] Computational Statistics & Data Analysis 52, 3913-3927</ref>
== Types ==
=== Approximate non-negative matrix factorization ===
Usually the number of columns of {{math|'''W'''}} and the number of rows of {{math|'''H'''}} in NMF are selected so the product {{math|'''WH'''}} will become an approximation to {{math|'''V'''}}. The full decomposition of {{math|'''V'''}} then amounts to the two non-negative matrices {{math|'''W'''}} and {{math|'''H'''}} as well as a residual {{math|'''U'''}}, such that: {{math|1='''V''' = '''WH''' + '''U'''}}. The elements of the residual matrix can either be negative or positive.
When {{math|'''W'''}} and {{math|'''H'''}} are smaller than {{math|'''V'''}} they become easier to store and manipulate. Another reason for factorizing {{math|'''V'''}} into smaller matrices {{math|'''W'''}} and {{math|'''H'''}}, is that if one is able to approximately represent the elements of {{math|'''V'''}} by significantly less data, then one has to infer some latent structure in the data.
=== Convex non-negative matrix factorization ===
In standard NMF, matrix factor {{math|'''W''' ∈ ℝ<sub>+</sub><sup>''m'' × ''k''</sup>}}, i.e., {{math|'''W'''}} can be anything in that space. Convex NMF<ref name="ding">C Ding, T Li, MI Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 45-55, 2010</ref> restricts the columns of {{math|'''W'''}} to convex combinations of the input data vectors <math> (v_1, \cdots, v_n) </math>. This greatly improves the quality of data representation of {{math|'''W'''}}. Furthermore, the resulting matrix factor {{math|'''H'''}} becomes more sparse and orthogonal.
=== Nonnegative rank factorization ===
In case the [[Nonnegative rank (linear algebra)|nonnegative rank]] of {{math|'''V'''}} is equal to its actual rank, {{math|1='''V''' = '''WH'''}} is called a nonnegative rank factorization.<ref name=BermanPlemmons74>{{cite journal|last=Berman|first=A.|author2=R.J. Plemmons |title=Inverses of nonnegative matrices|journal=Linear and Multilinear Algebra|year=1974|volume=2|issue=2|pages=161–172|doi=10.1080/03081087408817055}}</ref><ref name=BermanPlemmons94>{{cite book|author1=A. Berman |author2=R.J. Plemmons |title=Nonnegative matrices in the Mathematical Sciences|year=1994|publisher=SIAM|___location=Philadelphia}}</ref><ref name=Thomas74>{{cite journal|last=Thomas|first=L.B.|title=Problem 73-14, Rank factorization of nonnegative matrices|journal=SIAM rev.|year=1974|volume=16|issue=3|pages=393–394|doi=10.1137/1016064}}</ref> The problem of finding the NRF of {{math|'''V'''}}, if it exists, is known to be NP-hard.<ref name=Vavasis09>{{cite journal|last=Vavasis|first=S.A.|title=On the complexity of nonnegative matrix factorization|journal=SIAM J. Optim.|year=2009|volume=20|issue=3|pages=1364–1377|doi=10.1137/070709967}}</ref>
=== Different cost functions and regularizations ===
There are different types of non-negative matrix factorizations.
The different types arise from using different [[Loss function|cost function]]s for measuring the divergence between {{math|'''V'''}} and {{math|'''WH'''}} and possibly by [[regularization (mathematics)|regularization]] of the {{math|'''W'''}} and/or {{math|'''H'''}} matrices.<ref name="dhillon">{{Cite conference | author = Inderjit S. Dhillon | author-link = Inderjit S. Dhillon | author2 = Suvrit Sra| author2-link = Suvrit Sra | url = http://books.nips.cc/papers/files/nips18/NIPS2005_0203.pdf |format=PDF|title = Generalized Nonnegative Matrix Approximations with Bregman Divergences | conference = [[Conference on Neural Information Processing Systems|NIPS]] | year = 2005}}</ref>
Two simple divergence functions studied by Lee and Seung are the squared error (or [[Frobenius norm]]) and an extension of the Kullback–Leibler divergence to positive matrices (the original [[Kullback–Leibler divergence]] is defined on probability distributions).
Each divergence leads to a different NMF algorithm, usually minimizing the divergence using iterative update rules.
The factorization problem in the squared error version of NMF may be stated as:
Given a matrix <math>\mathbf{V}</math> find nonnegative matrices W and H that minimize the function
: <math>F(\mathbf{W},\mathbf{H}) = \|\mathbf{V} - \mathbf{WH}\|^2_F</math>
Another type of NMF for images is based on the [[total variation norm]].<ref>{{Cite journal | last1 = Zhang | first1 = T. | last2 = Fang | first2 = B. | last3 = Liu | first3 = W. | last4 = Tang | first4 = Y. Y. | last5 = He | first5 = G. | last6 = Wen | first6 = J. | doi = 10.1016/j.neucom.2008.01.022 | title = Total variation norm-based nonnegative matrix factorization for identifying discriminant representation of image patterns | journal = [[Neurocomputing (journal)|Neurocomputing]]| volume = 71 | issue = 10–12 | pages = 1824–1831| year = 2008 | pmid = | pmc = }}</ref>
When [[Tikhnov regularization|L1 regularization]] (akin to [[Lasso (statistics)|Lasso]]) is added to NMF with the mean squared error cost function, the resulting problem may be called '''non-negative sparse coding''' due to the similarity to the [[sparse coding]] problem,<ref name="hoyer02">{{cite conference |last=Hoyer |first=Patrik O. |title=Non-negative sparse coding |conference=Proc. IEEE Workshop on Neural Networks for Signal Processing |year=2002 |url=http://arxiv.org/pdf/cs/0202009}}</ref><ref name="Leo Taslaman and Björn Nilsson 2012 e46331">{{Cite journal
|author1=Leo Taslaman |author2=Björn Nilsson
|lastauthoramp=yes | title = A framework for regularized non-negative matrix factorization, with application to the analysis of gene expression data
| journal = [[PLoS One]]
| volume = 7
| issue = 11
| year = 2012
| pages = e46331
| doi = 10.1371/journal.pone.0046331
| pmid = 23133590
| pmc=3487913
|bibcode=2012PLoSO...746331T
}}</ref>
although it may also still be referred to as NMF.<ref>{{Cite conference | last1 = Hsieh | first1 = C. J. | last2 = Dhillon | first2 = I. S. | doi = 10.1145/2020408.2020577 | title = Fast coordinate descent methods with variable selection for non-negative matrix factorization | conference = Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11 | pages =
1064| year = 2011 | isbn = 9781450308137 | pmid = | pmc = | url = http://www.cs.utexas.edu/~cjhsieh/nmf_kdd11.pdf}}</ref>
===Online NMF===
Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in [[Data stream|streaming]] fashion. One such use is for [[collaborative filtering]] in [[recommendation systems]], where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for optimization in these cases may or may not be the same as for standard NMF, but the algorithms need to be rather different.<ref>http://www.ijcai.org/papers07/Papers/IJCAI07-432.pdf</ref><ref>http://portal.acm.org/citation.cfm?id=1339264.1339709</ref><ref>{{Cite journal|author=Naiyang Guan|author2=Dacheng Tao|author3=Zhigang Luo|author4=Bo Yuan|last-author-amp=yes|date=July 2012|title=Online Nonnegative Matrix Factorization With Robust Stochastic Approximation|url=|journal=IEEE Transactions on Neural Networks and Learning Systems |issue=7 |doi=10.1109/TNNLS.2012.2197827|pmid=24807135|volume=23|pages=1087–1099}}</ref>
== Algorithms ==
There are several ways in which the {{math|'''W'''}} and {{math|'''H'''}} may be found: Lee and Seung's [[Multiplicative Weight Update Method|multiplicative update rule]]<ref name="lee2001algorithms"/> has been a popular method due to the simplicity of implementation. Since then, a few other algorithmic approaches have been developed.
Some successful algorithms are based on alternating [[non-negative least squares]]: in each step of such an algorithm, first {{math|'''H'''}} is fixed and {{math|'''W'''}} found by a non-negative least squares solver, then {{math|'''W'''}} is fixed and {{math|'''H'''}} is found analogously. The procedures used to solve for {{math|'''W'''}} and {{math|'''H'''}} may be the same<ref name="lin07"/> or different, as some NMF variants regularize one of {{math|'''W'''}} and {{math|'''H'''}}.<ref name="hoyer02"/> Specific approaches include the projected [[gradient descent]] methods,<ref name="lin07">{{Cite journal | last1 = Lin | first1 = Chih-Jen| title = Projected Gradient Methods for Nonnegative Matrix Factorization | doi = 10.1162/neco.2007.19.10.2756 | journal = [[Neural Computation (journal)|Neural Computation]]| volume = 19 | issue = 10 | pages = 2756–2779 | year = 2007 | pmid = 17716011| pmc = | url = http://www.csie.ntu.edu.tw/~cjlin/papers/pgradnmf.pdf}}</ref><ref>{{Cite journal | last1 = Lin | first1 = Chih-Jen| doi = 10.1109/TNN.2007.895831 | title = On the Convergence of Multiplicative Update Algorithms for Nonnegative Matrix Factorization | journal = IEEE Transactions on Neural Networks| volume = 18 | issue = 6 | pages = 1589–1596 | year = 2007 | pmid = | pmc = }}</ref> the [[active set]] method,<ref name="gemulla"/><ref name="kim2008nonnegative">{{Cite journal
| author = Hyunsoo Kim
| author2 = Haesun Park
| author2-link = Haesun Park
| last-author-amp = yes
| title = Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method
| journal = [[SIAM Journal on Matrix Analysis and Applications]]
| volume = 30
| issue = 2
| year = 2008
| pages = 713–730
| url = http://www.cc.gatech.edu/~hpark/papers/simax-nmf.pdf
| doi=10.1137/07069239x
}}</ref> the optimal gradient method,<ref>{{Cite journal|author=Naiyang Guan|author2=Dacheng Tao|author3=Zhigang Luo, Bo Yuan|date=June 2012|title=NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization|url=|journal=IEEE Transactions on Signal Processing |issue=6 |doi=10.1109/TSP.2012.2190406|pmid=|volume=60|pages=2882–2898}}</ref> and the block principal pivoting method<ref name="kim2011fast">{{Cite journal
|author1=Jingu Kim |author2=Haesun Park
|lastauthoramp=yes | title = Fast Nonnegative Matrix Factorization: An Active-set-like Method and Comparisons
| journal = [[SIAM Journal on Scientific Computing]]
| volume = 33
| issue = 6
| year = 2011
| pages = 3261–3281
| url = http://www.cc.gatech.edu/~jingu/docs/2011_paper_sisc_nmf.pdf
| doi=10.1137/110821172
}}</ref> among several others.
The currently available algorithms are sub-optimal as they can only guarantee finding a local minimum, rather than a global minimum of the cost function. A provably optimal algorithm is unlikely in the near future as the problem has been shown to generalize the k-means clustering problem which is known to be [[NP-complete]].<ref>{{Cite journal
| title = On the equivalence of nonnegative matrix factorization and spectral clustering
| author = Ding, C.
| author2 = He, X.
| author3 = Simon, H.D.
| last-author-amp = yes
| journal = Proc. SIAM Data Mining Conf
| volume = 4
| pages = 606–610
| year = 2005
| doi=10.1137/1.9781611972757.70
| isbn = 978-0-89871-593-4
}}</ref> However, as in many other data mining applications, a local minimum may still prove to be useful.
=== Exact NMF ===
Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix {{math|'''V'''}}. A polynomial time algorithm for solving nonnegative rank factorization if {{math|'''V'''}} contains a monomial sub matrix of rank equal to its rank was given by Campbell and Poole in 1981.<ref name=CampbellPoole81>{{cite journal|last=Campbell|first=S.L.|author2=G.D. Poole |title=Computing nonnegative rank factorizations.|journal=Linear Algebra Appl.|year=1981|volume=35|pages=175–182|doi=10.1016/0024-3795(81)90272-x}}</ref> Kalofolias and Gallopoulos (2012)<ref name=KalofoliasGallopoulos2012>{{cite journal|last=Kalofolias|first=V.|author2=Gallopoulos, E. |title=Computing symmetric nonnegative rank factorizations|journal=Linear Algebra Appl|year=2012|volume=436|issue=2|pages=421–435|url=http://www.sciencedirect.com/science/article/pii/S0024379511002199#|doi=10.1016/j.laa.2011.03.016}}</ref> solved the symmetric counterpart of this problem, where {{math|'''V'''}} is symmetric and contains a diagonal principal sub matrix of rank r. Their algorithm runs in O(rm^2) time in the dense case. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial time algorithm for exact NMF that works for the case where one of the factors W satisfies the separability condition.<ref name=Arora2013>{{Cite conference
| last1 = Arora | first1 = Sanjeev
| last2 = Ge | first2 = Rong
| last3 = Halpern | first3 = Yoni
| last4 = Mimno | first4 = David
| last5 = Moitra | first5 = Ankur
| last6 = Sontag | first6 = David
| last7 = Wu | first7 = Yichen
| last8 = Zhu | first8 = Michael
| title = A practical algorithm for topic modeling with provable guarantees
| url = http://jmlr.csail.mit.edu/proceedings/papers/v28/arora13.html
| arxiv = 1212.4777
| conference = Proceedings of the 30th International Conference on Machine Learning
| year =2013
}}</ref>
== Relation to other techniques ==
In ''Learning the parts of objects by non-negative matrix factorization'' Lee and Seung<ref>{{Cite journal
| author = Lee, Daniel D and Seung, H Sebastian
| title = Learning the parts of objects by non-negative matrix factorization
| journal = [[Nature]]
| volume = 401
| issue =
| year = 1999
| doi = 10.1038/44565
| url = http://www.columbia.edu/~jwp2128/Teaching/E4903/papers/nmf_nature.pdf
| pages = 788--791
}}</ref> proposed NMF mainly for parts-based decomposition of images. It compares NMF to [[vector quantization]] and [[principal component analysis]], and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results.
[[Image:Restricted Boltzmann machine.svg|thumb|NMF as a probabilistic graphical model: visible units ({{math|'''V'''}}) are connected to hidden units ({{math|'''H'''}}) through weights {{math|'''W'''}}, so that {{math|'''V'''}} is [[Generative model|generated]] from a probability distribution with mean <math>\sum_a W_{ia}h_a</math>.<ref name="lee-seung"/>{{rp|5}}]]
It was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA".<ref>{{Cite conference
| author = Wray Buntine
| url = http://cosco.hiit.fi/Articles/ecml02.pdf
| format=PDF| title = Variational Extensions to EM and Multinomial PCA
| conference = Proc. European Conference on Machine Learning (ECML-02)
| series = LNAI
| volume = 2430
| pages = 23–34
| year = 2002
}}</ref>
When NMF is obtained by minimizing the [[Kullback–Leibler divergence]], it is in fact equivalent to another instance of multinomial PCA, [[probabilistic latent semantic analysis]],<ref>{{Cite conference
|author1=Eric Gaussier |author2=Cyril Goutte
|lastauthoramp=yes | year = 2005
| url = http://eprints.pascal-network.org/archive/00000971/01/39-gaussier.pdf
| format=PDF| title = Relation between PLSA and NMF and Implications
| conference = Proc. 28th international ACM SIGIR conference on Research and development in information retrieval (SIGIR-05)
| pages = 601–602
}}</ref>
trained by [[maximum likelihood]] estimation.
That method is commonly used for analyzing and clustering textual data and is also related to the [[latent class model]].
NMF with the least-squares objective is equivalent to a relaxed form of [[K-means clustering]]: the matrix factor {{math|'''W'''}} contains cluster centroids and {{math|'''H'''}} contains cluster membership indicators.<ref name="DingSDM2005">C. Ding, X. He, H.D. Simon (2005). [http://ranger.uta.edu/~chqding/papers/NMF-SDM2005.pdf "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering"]. Proc. SIAM Int'l Conf. Data Mining, pp. 606-610. May 2005</ref><ref>Ron Zass and [[Amnon Shashua]] (2005). "[http://www.cs.huji.ac.il/~zass/papers/cp-iccv05.pdf A Unifying Approach to Hard and Probabilistic Clustering]". International Conference on Computer Vision (ICCV) Beijing, China, Oct., 2005.</ref> This provides a theoretical foundation for using NMF for data clustering. However, k-means does not enforce non-negativity on its centroids, so the closest analogy is in fact with "semi-NMF".{{r|ding}}
NMF can be seen as a two-layer [[Bayesian network|directed graphical]] model with one layer of observed random variables and one layer of hidden random variables.<ref>{{cite conference |author=Max Welling|title=Exponential Family Harmoniums with an Application to Information Retrieval |conference=NIPS|url=http://papers.nips.cc/paper/2672-exponential-family-harmoniums-with-an-application-to-information-retrieval |year=2004|pages=|display-authors=etal}}</ref>
NMF extends beyond matrices to tensors of arbitrary order.<ref>{{Cite journal
| author = Pentti Paatero
| author-link = Pentti Paatero
| title = The Multilinear Engine: A Table-Driven, Least Squares Program for Solving Multilinear Problems, including the n-Way Parallel Factor Analysis Model
| journal = [[Journal of Computational and Graphical Statistics]]
| volume = 8
| issue = 4
| pages = 854–888
| year = 1999
| doi = 10.2307/1390831
| jstor = 1390831
}}</ref><ref>{{Cite journal
|author1=Max Welling |author2=Markus Weber
|lastauthoramp=yes | year = 2001
| title = Positive Tensor Factorization
| journal = [[Pattern Recognition Letters]]
| volume = 22
| issue = 12
| pages = 1255–1261
| doi = 10.1016/S0167-8655(01)00070-8
}}</ref><ref>{{Cite conference
|author1=Jingu Kim |author2=Haesun Park
|lastauthoramp=yes | title = Fast Nonnegative Tensor Factorization with an Active-set-like Method
| publisher = Springer
| pages = 311–326
| url = http://www.cc.gatech.edu/~hpark/papers/2011_paper_hpscbook_ntf.pdf
| year = 2012
| conference = High-Performance Scientific Computing: Algorithms and Applications }}
</ref> This extension may be viewed as a non-negative counterpart to, e.g., the [[PARAFAC]] model.
Other extensions of NMF include joint factorisation of several data matrices and tensors where some factors are shared. Such models are useful for sensor fusion and relational learning.<ref>{{Cite conference
| author = Kenan Yilmaz
| author2 = A. Taylan Cemgil
| author3 = Umut Simsekli
| last-author-amp = yes
| title = Generalized Coupled Tensor Factorization
| url = http://books.nips.cc/papers/files/nips24/NIPS2011_1189.pdf
| conference = NIPS
| year =2011
}}
</ref>
NMF is an instance of nonnegative [[quadratic programming]] ([[NQP]]), just like the [[support vector machine]] (SVM). However, SVM and NMF are related at a more intimate level than that of NQP, which allows direct application of the solution algorithms developed for either of the two methods to problems in both domains.<ref>{{Cite conference
| author = Vamsi K. Potluru
| author2 = Sergey M. Plis
| author3 = Morten Morup
| author4 = Vince D. Calhoun
| author5 = Terran Lane
| last-author-amp = yes
| title = Efficient Multiplicative updates for Support Vector Machines
| year = 2009
| conference = Proceedings of the 2009 SIAM Conference on Data Mining (SDM)
| pages = 1218–1229
}}</ref>
== Uniqueness ==
The factorization is not unique: A matrix and its [[inverse matrix|inverse]] can be used to transform the two factorization matrices by, e.g.,<ref>{{Cite conference
| author = Wei Xu
| author2 = Xin Liu
| author3 = Yihong Gong
| last-author-amp = yes
| title = Document clustering based on non-negative matrix factorization
| publisher = [[Association for Computing Machinery]]
| ___location = New York
| year = 2003
| conference = Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
| pages = 267–273
| url = http://portal.acm.org/citation.cfm?id=860485
}}</ref>
: <math>\mathbf{WH} = \mathbf{WBB}^{-1}\mathbf{H}</math>
If the two new matrices <math>\mathbf{\tilde{W} = WB}</math> and <math>\mathbf{\tilde{H}}=\mathbf{B}^{-1}\mathbf{H}</math> are [[non-negative matrix|non-negative]] they form another parametrization of the factorization.
The non-negativity of <math>\mathbf{\tilde{W}}</math> and <math>\mathbf{\tilde{H}}</math> applies at least if {{math|'''B'''}} is a non-negative [[monomial matrix]].
In this simple case it will just correspond to a scaling and a [[permutation]].
More control over the non-uniqueness of NMF is obtained with sparsity constraints.<ref>Julian Eggert, Edgar Körner, "[http://dx.doi.org/10.1109/IJCNN.2004.1381036 Sparse coding and NMF]", ''Proceedings. 2004 IEEE International Joint Conference on Neural Networks, 2004, pp. 2529-2533, 2004.</ref>
== Applications ==
=== Text mining ===
NMF can be used for [[text mining]] applications.
In this process, a [[document-term matrix|''document-term'' matrix]] is constructed with the weights of various terms (typically weighted word frequency information) from a set of documents.
This matrix is factored into a ''term-feature'' and a ''feature-document'' matrix.
The features are derived from the contents of the documents, and the feature-document matrix describes [[Data clustering|data clusters]] of related documents.
One specific application used hierarchical NMF on a small subset of scientific abstracts from [[PubMed]].<ref>{{Cite journal
| last1 = Nielsen
| first1 = Finn Årup
| last2 = Balslev
| first2 = Daniela
| last3 = Hansen
| first3 = Lars Kai
| title = Mining the posterior cingulate: segregation between memory and pain components
| journal = [[NeuroImage]]
| volume = 27
| issue = 3
| pages = 520–522
| year = 2005
| doi = 10.1016/j.neuroimage.2005.04.034
| pmid = 15946864
}}</ref>
Another research group clustered parts of the [[Enron]] email dataset<ref>{{Cite web
| last1 = Cohen
| first1 = William
| title = Enron Email Dataset
| url = http://www.cs.cmu.edu/~enron/
| date = 2005-04-04
| accessdate = 2008-08-26
}}</ref>
with 65,033 messages and 91,133 terms into 50 clusters.<ref>{{Cite journal
| last1 = Berry
| first1 = Michael W.
| last2 = Browne
| title = Email Surveillance Using Non-negative Matrix Factorization
| journal = [[Computational and Mathematical Organization Theory]]
| volume = 11
| issue = 3
| pages = 249–264
| year = 2005
| doi = 10.1007/s10588-005-5380-5
| first2 = Murray
}}</ref>
NMF has also been applied to citations data, with one example clustering [[English Wikipedia]] articles and [[scientific journal]]s based on the outbound scientific citations in English Wikipedia.<ref>{{Cite conference
| last1 = Nielsen
| first = Finn Årup
| title = Clustering of scientific citations in Wikipedia
| conference = [[Wikimania]]
| year = 2008
| url = http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=5666
}}</ref>
Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) have given polynomial-time algorithms to learn topic models using NMF. The algorithm assumes that the topic matrix satisfies a separability condition that is often found to hold in these settings.<ref name=Arora2013 />
=== Spectral data analysis ===
NMF is also used to analyze spectral data; one such use is in the classification of space objects and debris.<ref name="BerryM2006Algorithm">{{Cite journal
| author = Michael W. Berry| title = Algorithms and Applications for Approximate Nonnegative Matrix Factorization
| year = 2006
|display-authors=etal}}</ref>
=== Scalable Internet distance prediction ===
NMF is applied in scalable Internet distance (round-trip time) prediction. For a network with <math>N</math> hosts, with the help of NMF, the distances of all the <math>N^2</math> end-to-end links can be predicted after conducting only <math>O(N)</math> measurements. This kind of method was firstly introduced in Internet
Distance Estimation Service (IDES).<ref name="IDES_Mao06">{{Cite journal
|author1=Yun Mao
|author2=Lawrence Saul
|author3=Jonathan M. Smith
|lastauthoramp=yes | title = IDES: An Internet Distance Estimation Service for Large Networks
| journal = [[IEEE Journal on Selected Areas in Communications]]
| volume = 24
| issue = 12
| pages = 2273–2284
| year = 2006
| doi = 10.1109/JSAC.2006.884026
}}</ref> Afterwards, as a fully decentralized approach, Phoenix network coordinate system<ref name="Phoenix_Chen11">{{Cite journal
| author = Yang Chen
| author2 = Xiao Wang
| author3 = Cong Shi
| last-author-amp = yes
| url = http://www.cs.duke.edu/~ychen/Phoenix_TNSM.pdf
| format=PDF
| title = Phoenix: A Weight-based Network Coordinate System Using Matrix Factorization
| journal = [[IEEE Transactions on Network and Service Management]]
| volume = 8
| issue = 4
| pages = 334–347
| year = 2011
| doi=10.1109/tnsm.2011.110911.100079
|display-authors=etal}}</ref>
is proposed. It achieves better overall prediction accuracy by introducing the concept of weight.
=== Non-stationary speech denoising ===
Speech denoising has been a long lasting problem in [[audio signal processing]]. There are lots of algorithms for denoising if the noise is stationary. For example, the [[Wiener filter]] is suitable for additive [[Gaussian noise]]. However, if the noise is non-stationary, the classical denoising algorithms usually have poor performance because the statistical information of the non-stationary noise is difficult to estimate. Schmidt et al.<ref>Schmidt, M.N., J. Larsen, and F.T. Hsiao. (2007). "Wind noise reduction using non-negative sparse coding", ''Machine Learning for Signal Processing, IEEE Workshop on'', 431–436</ref> use NMF to do speech denoising under non-stationary noise, which is completely different from classical statistical approaches. The key idea is that clean speech signal can be sparsely represented by a speech dictionary, but non-stationary noise cannot. Similarly, non-stationary noise can also be sparsely represented by a noise dictionary, but speech cannot.
The algorithm for NMF denoising goes as follows. Two dictionaries, one for speech and one for noise, need to be trained offline. Once a noisy speech is given, we first calculate the magnitude of the Short-Time-Fourier-Transform. Second, separate it into two parts via NMF, one can be sparsely represented by the speech dictionary, and the other part can be sparsely represented by the noise dictionary. Third, the part that is represented by the speech dictionary will be the estimated clean speech.
=== Bioinformatics ===
NMF has been successfully applied in [[bioinformatics]] for clustering [[gene expression]] and [[DNA methylation]] data and finding the genes most representative of the clusters.<ref name="Leo Taslaman and Björn Nilsson 2012 e46331"/><ref>{{Cite journal
| author = Devarajan, K.
| title = Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology
| journal = [[PLoS Computational Biology]]
| volume = 4
| issue = 7
| year = 2008
| doi=10.1371/journal.pcbi.1000029
| pages=e1000029
}}</ref><ref name="kim2007sparse">{{Cite journal
|author1=Hyunsoo Kim |author2=Haesun Park
|lastauthoramp=yes | title = Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis
| journal = [[Bioinformatics (journal)|Bioinformatics]]
| volume = 23
| issue = 12
| pages = 1495–1502
| year = 2007
| doi = 10.1093/bioinformatics/btm134
| url = http://bioinformatics.oxfordjournals.org/cgi/content/abstract/23/12/1495
| pmid = 17483501
}}</ref><ref>{{Cite journal
| author = Schwalbe, E.
| title = DNA methylation profiling of medulloblastoma allows robust sub-classification and improved outcome prediction using formalin-fixed biopsies
| journal = [[Acta Neuropathologica]]
| volume = 125
| issue = 3
| year = 2013
| pages = 359–371
| doi =10.1007/s00401-012-1077-2
| pmid = 23291781
| pmc=4313078
}}</ref> In the analysis of cancer mutations it has been used to identify common patterns of mutations that occur in many cancers and that probably have distinct causes.<ref>{{Cite journal|last=Alexandrov|first=Ludmil B.|last2=Nik-Zainal|first2=Serena|last3=Wedge|first3=David C.|last4=Campbell|first4=Peter J.|last5=Stratton|first5=Michael R.|date=2013-01-31|title=Deciphering signatures of mutational processes operative in human cancer|journal=Cell Reports|volume=3|issue=1|pages=246–259|doi=10.1016/j.celrep.2012.12.008|issn=2211-1247|pmc=3588146|pmid=23318258}}</ref>
== Current research ==
Current research (since 2010) in nonnegative matrix factorization includes, but is not limited to,
# Algorithmic: searching for global minima of the factors and factor initialization.<ref>{{Cite journal
|author1=C. Boutsidis |author2=E. Gallopoulos
|lastauthoramp=yes | title = SVD based initialization: A head start for nonnegative matrix factorization
| journal = Pattern Recognition
| volume = 41
| issue = 4
| pages = 1350–1362
| year = 2008
| doi = 10.1016/j.patcog.2007.09.010
}}</ref>
# Scalability: how to factorize million-by-billion matrices, which are commonplace in Web-scale data mining, e.g., see Distributed Nonnegative Matrix Factorization (DNMF)<ref>{{Cite journal
|author1=Chao Liu |author2=Hung-chih Yang |author3=Jinliang Fan |author4=Li-Wei He |author5=Yi-Min Wang |last-author-amp=yes | title = Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce
| journal = Proceedings of the 19th International World Wide Web Conference
| year = 2010
| url = http://research.microsoft.com/pubs/119077/DNMF.pdf
}}</ref> and Scalable Nonnegative Matrix Factorization (ScalableNMF)<ref>{{Cite journal
| author = Jiangtao Yin
| author2 = Lixin Gao
| author3 = Zhongfei (Mark) Zhang
| last-author-amp = yes
| title = Scalable Nonnegative Matrix Factorization with Block-wise Updates
| journal = Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
| year = 2014
| url = http://rio.ecs.umass.edu/mnilpub/papers/ecmlpkdd2014-yin.pdf
}}</ref>
# Online: how to update the factorization when new data comes in without recomputing from scratch, e.g., see online CNSC<ref>{{Cite journal
|author1=Dong Wang |author2=Ravichander Vipperla |author3=Nick Evans |author4=Thomas Fang Zheng | title = Online Non-Negative Convolutive Pattern Learning for Speech Signals
| journal = IEEE Transactions on Signal Processing
| year = 2013
| url = http://cslt.riit.tsinghua.edu.cn:8081/homepages/wangd/public/pdf/cnsc-tsp.pdf
| doi=10.1109/tsp.2012.2222381
| volume=61
| pages=44–56
}}</ref>
# Collective (joint) factorization: factorizing multiple interrelated matrices for multiple-view learning, e.g. mutli-view clustering, see CoNMF<ref>{{Cite journal
| author = Xiangnan He
| author2 = Min-Yen Kan
| author3 = Peichu Xie
| author4 = Xiao Chen
| last-author-amp = yes
| title = Comment-based Multi-View Clustering of Web 2.0 Items
| journal = Proceedings of the 23rd International World Wide Web Conference
| year = 2014
| url = http://www.comp.nus.edu.sg/~xiangnan/files/www2014-he.pdf
}}</ref> and MultiNMF<ref>{{Cite journal
| author = Jialu Liu
| author2 = Chi Wang
| author3 = Jing Gao
| author4 = Jiawei Han
| last-author-amp = yes
| title = Multi-View Clustering via Joint Nonnegative Matrix Factorization
| journal = Proceedings of SIAM Data Mining Conference
| year = 2013
| url = http://jialu.cs.illinois.edu/paper/sdm2013-liu.pdf
| doi=10.1137/1.9781611972832.28
| pages=252–260
| isbn = 978-1-61197-262-7
}}</ref>
# Cohen and Rothblum 1993 problem: whether a rational matrix always has an NMF of minimal inner dimension whose factors are also rational. Recently, this problem has been answered negatively.<ref>{{Cite arXiv|last=Chistikov|first=Dmitry|last2=Kiefer|first2=Stefan|last3=Marušić|first3=Ines|last4=Shirmohammadi|first4=Mahsa|last5=Worrell|first5=James|date=2016-05-22|title=Nonnegative Matrix Factorization Requires Irrationality |eprint=1605.06848|class=cs.CC}}</ref>
==See also==
*[[Multilinear algebra]]
*[[Multilinear subspace learning]]
*[[Tensor]]
*[[Tensor decomposition]]
*[[Tensor software]]
== Sources and external links ==
=== Notes ===
{{Reflist|2}}
=== Others ===
{{refbegin}}
* {{Cite journal
|author1=J. Shen |author2=G. W. Israël | title = A receptor model using a specific non-negative transformation technique for ambient aerosol
| journal = [[Atmospheric Environment (journal)|Atmospheric Environment]]
| volume = 23
| issue = 10
| pages = 2289–2298
| year = 1989
| doi = 10.1016/0004-6981(89)90190-X
|bibcode=1989AtmEn..23.2289S }}
* {{Cite journal
| author = Pentti Paatero
| author-link = Pentti Paatero
| title = Least squares formulation of robust non-negative factor analysis
| journal = [[Chemometrics and Intelligent Laboratory Systems]]
| volume = 37
| issue = 1
| pages = 23–35
| year = 1997
| doi = 10.1016/S0169-7439(96)00044-5
}}
* {{Cite journal
| author = Raul Kompass
| title = A Generalized Divergence Measure for Nonnegative Matrix Factorization
| journal = [[Neural Computation (journal)|Neural Computation]]
| volume = 19
| issue = 3
| year = 2007
| pages = 780–791
| pmid = 17298233
| doi = 10.1162/neco.2007.19.3.780
}}
* {{Cite journal
| title=Nonnegative Matrix Factorization and its applications in pattern recognition
| author=Liu, W.X.
| author2=Zheng, N.N.
| author3=You, Q.B.
| last-author-amp=yes
| journal=[[Chinese Science Bulletin]]
| volume=51
| pages=7–18
| year=2006
| url = http://www.springerlink.com/index/7285V70531634264.pdf
| doi=10.1007/s11434-005-1109-6
| issue=17–18
}}
* {{Cite arXiv
| author = Ngoc-Diep Ho
| author2 = Paul Van Dooren
| author3 = Vincent Blondel
| last-author-amp = yes
| title = Descent Methods for Nonnegative Matrix Factorization
| year = 2008
| eprint = 0801.3199
| class = cs.NA
}}
* {{Cite journal
| author = Andrzej Cichocki
| author-link = Andrzej Cichocki
| author2 = Rafal Zdunek
| author3 = Shun-ichi Amari
| author3-link = Shun-ichi Amari
| last-author-amp = yes
| title = Nonnegative Matrix and Tensor Factorization
| journal = [[IEEE Signal Processing Magazine]]
| volume = 25
| issue = 1
| year = 2008
| pages = 142–145
| doi = 10.1109/MSP.2008.4408452
| bibcode = 2008ISPM...25R.142C
}}
* {{Cite journal
| title = Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis
|author1=Cédric Févotte |author2=Nancy Bertin |author3=Jean-Louis Durrieu |last-author-amp=yes | journal = [[Neural Computation (journal)|Neural Computation]]
| volume = 21
| issue = 3
| year = 2009
| pmid=18785855
| doi=10.1162/neco.2008.04-08-771
| pages=793–830
}}
* {{Cite journal
| author = Ali Taylan Cemgil
| title = Bayesian Inference for Nonnegative Matrix Factorisation Models
| journal = [[Computational Intelligence and Neuroscience]]
| volume = 2009
| issue = 2
| year = 2009
| doi = 10.1155/2009/785152
| url = http://www.hindawi.com/journals/cin/2009/785152.abs.html
| pages = 1–17
| pmid = 19536273
| pmc = 2688815
}}
{{refend}}
[[Category:Linear algebra]]
[[Category:Matrix theory]]
[[Category:Machine learning algorithms]]' |
Whether or not the change was made through a Tor exit node (tor_exit_node ) | 0 |
Unix timestamp of change (timestamp ) | 1494239624 |