StefanoTrv (talk | contribs) m Spaces |
m Bot: link syntax |
Line 18:
which shows which documents contain which terms and how many times each term appears. Note that, unlike a representation of a document as just a token-count list, the document-term matrix includes every term in the corpus (i.e. the corpus vocabulary), which is why it contains zero counts for terms that occur elsewhere in the corpus but not in a given document.
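
The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name <code>document_term_matrix</code>, the two sample documents, and the tokenization by lowercasing and splitting on whitespace are all assumptions made for the example, not part of any particular library.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] is the count of term j in document i.

    Tokenization here is a simplifying assumption: lowercase, split on whitespace.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # The vocabulary is built from the whole corpus, not from a single document.
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # A Counter returns 0 for missing keys, which yields the zero counts
        # for corpus terms that do not occur in this document.
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical two-document corpus used only for illustration.
docs = ["I like databases", "I dislike databases"]
vocabulary, matrix = document_term_matrix(docs)

for term_row in [vocabulary] + matrix:
    print(term_row)
```

Each row has one entry per vocabulary term, so the row for the first sample document still carries a zero in the "dislike" column even though that term never occurs in it.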
As a result of the power-law distribution of tokens in nearly every corpus (see [[Zipf's law
==Choice of Terms==