Document-term matrix

{{MI|
{{More citations needed|date=January 2021}}
{{Cleanup rewrite|it is very longwinded. The lead does not explain why this matrix would be needed|date=June 2025}}
}}
A '''document-term matrix''' is a mathematical [[Matrix (mathematics)|matrix]] that describes the frequency of terms that occur in each document in a collection. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms, so the entry in row ''i'' and column ''j'' records how often term ''j'' occurs in document ''i''. This matrix is a specific instance of a '''document-feature matrix''', in which "features" may refer to properties of a document other than terms.<ref>{{Cite web|title=Document-feature matrix :: Tutorials for quanteda|url=https://tutorials.quanteda.io/basic-operations/dfm/|access-date=2021-01-02|website=tutorials.quanteda.io}}</ref> It is also common to encounter its transpose, the '''term-document matrix''', in which terms are the rows and documents are the columns. Both forms are commonly used in [[natural language processing]] and [[computational text analysis]].<ref>{{Cite web|title=15 Ways to Create a Document-Term Matrix in R|url=https://www.dustinstoltz.com/blog/2020/12/1/creating-document-term-matrix-comparison-in-r|access-date=2021-01-02|website=Dustin S. Stoltz|language=en-US}}</ref>
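As a minimal illustration, the following Python sketch builds a document-term matrix of raw term counts for a small collection and then transposes it into a term-document matrix. It assumes a deliberately simple tokenizer (lowercasing and splitting on whitespace), and the function names <code>build_dtm</code> and <code>transpose</code> and the two sample documents are hypothetical, chosen only for this example; it is not a standard library implementation.

```python
from collections import Counter


def build_dtm(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in document i.

    Hypothetical helper: assumes lowercase whitespace tokenization only.
    """
    # Tokenize each document with a simple lowercase + whitespace split
    tokenized = [doc.lower().split() for doc in documents]
    # A sorted vocabulary fixes a reproducible column order
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    column = {term: j for j, term in enumerate(vocabulary)}

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)          # term -> occurrences in this document
        row = [0] * len(vocabulary)       # one column per vocabulary term
        for term, n in counts.items():
            row[column[term]] = n
        matrix.append(row)                # one row per document
    return vocabulary, matrix


def transpose(matrix):
    """Turn a document-term matrix into a term-document matrix (hypothetical helper)."""
    return [list(col) for col in zip(*matrix)]


docs = ["I like databases", "I dislike databases"]   # hypothetical sample collection
vocab, dtm = build_dtm(docs)
tdm = transpose(dtm)

print(vocab)  # → ['databases', 'dislike', 'i', 'like']
print(dtm)    # → [[1, 0, 1, 1], [1, 1, 1, 0]]
print(tdm)    # → [[1, 1], [0, 1], [1, 1], [1, 0]]
```

Rows of <code>dtm</code> follow the order of the input documents and columns follow the sorted vocabulary; transposing swaps these roles, so each row of <code>tdm</code> corresponds to one term.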