{{More citations needed|date=January 2021}}
A '''document-term matrix''' is a mathematical [[Matrix (mathematics)|matrix]] that describes the frequency of terms that occur in each document in a collection. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. This matrix is a specific instance of a '''document-feature matrix''', where "features" may refer to properties of a document other than terms.<ref>{{Cite web|title=Document-feature matrix :: Tutorials for quanteda|url=https://tutorials.quanteda.io/basic-operations/dfm/|access-date=2021-01-02|website=tutorials.quanteda.io}}</ref> It is also common to encounter the transpose, or '''term-document matrix''', where documents are the columns and terms are the rows. These matrices are useful in the fields of [[natural language processing]] and [[computational text analysis]].<ref>{{Cite web|title=15 Ways to Create a Document-Term Matrix in R|url=https://www.dustinstoltz.com/blog/2020/12/1/creating-document-term-matrix-comparison-in-r|access-date=2021-01-02|website=Dustin S. Stoltz|language=en-US}}</ref> While the value of each cell is commonly the raw count of a given term, there are various schemes for weighting the raw counts, such as row normalization (i.e. relative frequencies or proportions) and [[tf-idf]].
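
As a minimal illustrative sketch, the construction and weighting schemes described above can be written in plain Python. The two-document corpus, the whitespace tokenizer, and every function name here are assumptions made for this example and do not come from any cited source. The tf-idf variant shown (raw count multiplied by the natural logarithm of the number of documents divided by the document frequency) is only one of several in common use.

```python
import math
from collections import Counter

# Hypothetical toy corpus, chosen only for illustration.
docs = [
    "I like databases",
    "I dislike databases",
]

def tokenize(text):
    # Assumption: lowercase and split on whitespace; real pipelines differ.
    return text.lower().split()

def document_term_matrix(docs):
    """Return (vocab, matrix): rows are documents, columns are terms."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

def transpose(matrix):
    """Term-document matrix: rows are terms, columns are documents."""
    return [list(col) for col in zip(*matrix)]

def row_normalize(matrix):
    """Relative frequencies: each row is divided by its total count."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def tf_idf(matrix):
    """One tf-idf variant: raw count * ln(N / document frequency)."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in matrix]

vocab, dtm = document_term_matrix(docs)
print(vocab)               # column labels (terms)
print(dtm)                 # raw counts, one row per document
print(transpose(dtm))      # term-document form
print(row_normalize(dtm))  # proportions within each document
print(tf_idf(dtm))         # terms found in every document get weight 0
```

In this sketch, a term that appears in every document receives a tf-idf weight of zero, because its inverse document frequency is ln(1) = 0. This shows how the weighting reduces the influence of terms that do not distinguish documents.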

Document-term matrix: Difference between revisions