Document-term matrix

{{Unreferenced stub|auto=yes|date=December 2009}}
A '''document-term matrix''' or '''term-document matrix''' is a mathematical [[Matrix (mathematics)|matrix]] that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. This matrix is a specific instance of a '''document-feature matrix''', where "features" may refer to properties of a document other than terms. It is also common to encounter the transpose, or '''term-document matrix''', in which documents are the columns and terms are the rows. Such matrices are used in the fields of [[natural language processing]] and [[computational text analysis]].<ref>{{Cite web|title=15 Ways to Create a Document-Term Matrix in R|url=https://www.dustinstoltz.com/blog/2020/12/1/creating-document-term-matrix-comparison-in-r|access-date=2021-01-02|website=Dustin S. Stoltz|language=en-US}}</ref> While the value of each cell is commonly the raw count of a given term in a given document, there are various schemes for weighting the raw counts, such as relative frequencies (proportions) and [[tf-idf]].
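As an illustration of the layouts and weightings described above, the following is a minimal sketch in plain Python, not a reference implementation. It assumes a naive lowercase whitespace tokenizer, and the function names (<code>document_term_matrix</code>, <code>transpose</code>, <code>relative_frequency</code>, <code>tf_idf</code>) are hypothetical. The tf-idf variant shown (raw count multiplied by log(N / df)) is only one of several in common use.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace.
    # Real pipelines usually also strip punctuation, stop words, etc.
    return text.lower().split()


def document_term_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] is the raw count of
    vocab[j] in docs[i]: rows are documents, columns are terms."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))  # union of all terms, sorted
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


def transpose(matrix):
    # Term-document matrix: rows are terms, columns are documents.
    return [list(col) for col in zip(*matrix)]


def relative_frequency(matrix):
    # Divide each document's counts by its total number of tokens,
    # so each non-empty row sums to 1.
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total if total else 0.0 for x in row])
    return result


def tf_idf(matrix):
    # One common variant (an assumption, not the only definition):
    # weight = raw count * log(N / df), where N is the number of documents
    # and df is the number of documents containing the term.
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [[x * math.log(n_docs / df[j]) if df[j] else 0.0
             for j, x in enumerate(row)]
            for row in matrix]


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the dog barked"]
    vocab, dtm = document_term_matrix(docs)
    print(vocab)  # ['barked', 'cat', 'dog', 'sat', 'the']
    print(dtm)    # [[0, 1, 0, 1, 1], [0, 0, 1, 1, 1], [1, 0, 1, 0, 1]]
    print(transpose(dtm))
    print(relative_frequency(dtm))
    print(tf_idf(dtm))
```

In this example the term "the" occurs in every document, so under this tf-idf variant its weight is zero in every row, while terms that occur in only one document (such as "cat") receive the largest weights.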
 
==General Concept==