==Choice of Terms==
One way of viewing the matrix is that each row represents a document. In the [[Vector space model|vectorial semantic model]], which is normally the one used to compute a document-term matrix, the goal is to represent the topic of a document by the frequency of semantically significant terms. The terms are the semantic units of the documents. For [[Indo-European languages]], it is often assumed that nouns, verbs and adjectives are the most significant [[syntactic category|categories]], so words from these categories are the usual candidates for terms.
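
A minimal sketch of this term choice, assuming whitespace tokenization, raw counts (no weighting such as tf-idf) and a small hand-made part-of-speech lexicon. The lexicon <code>POS</code> and the helper names are hypothetical illustrations; a real system would use a part-of-speech tagger. Each row of the result is one document, and only tokens tagged as nouns, verbs or adjectives become columns.

```python
from collections import Counter

# Hypothetical toy part-of-speech lexicon; a real system would use a tagger.
POS = {
    "the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "chases": "VERB", "sleeps": "VERB", "black": "ADJ",
    "quickly": "ADV", "on": "ADP",
}
# Categories assumed to carry the topic of a document.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ"}


def content_terms(text):
    """Return the tokens whose (assumed) tag is a noun, verb or adjective."""
    return [tok for tok in text.lower().split() if POS.get(tok) in CONTENT_TAGS]


def document_term_matrix(docs):
    """Build a matrix with one row per document and one column per content term."""
    counts = [Counter(content_terms(d)) for d in docs]
    vocab = sorted(set().union(*counts))           # column labels
    rows = [[c[t] for t in vocab] for c in counts]  # raw term frequencies
    return vocab, rows


docs = ["the black cat sleeps on the mat", "a dog quickly chases the black cat"]
vocab, rows = document_term_matrix(docs)
print(vocab)  # ['black', 'cat', 'chases', 'dog', 'mat', 'sleeps']
for row in rows:
    print(row)
# [1, 1, 0, 0, 1, 1]
# [1, 1, 1, 1, 0, 0]
```

Determiners, adverbs and prepositions such as "the", "quickly" and "on" are dropped, so the two rows differ only in the terms that describe what each document is about.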
Adding [[collocation]]s as terms, so that a frequent word combination is counted as a single unit, improves the quality of the vectors, especially when computing similarities between documents.
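
The effect on document similarity can be sketched as follows, assuming a hypothetical, hand-picked collocation list <code>COLLOCATIONS</code> (in practice such pairs might be extracted from a corpus with an association measure), whitespace tokenization, greedy left-to-right merging of adjacent pairs, and [[cosine similarity]] over raw counts. The two example documents share the words "hot" and "dog" but not the collocation "hot dog", so merging the pair into one term removes the spurious overlap.

```python
import math
from collections import Counter

# Hypothetical list of known collocations (pairs of adjacent words).
COLLOCATIONS = {("hot", "dog"), ("new", "york")}


def terms(text, merge_collocations):
    """Tokenize on whitespace; optionally join known collocations into one term."""
    tokens = text.lower().split()
    if not merge_collocations:
        return tokens
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in COLLOCATIONS:
            out.append(tokens[i] + "_" + tokens[i + 1])  # e.g. "hot_dog"
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def cosine(a, b):
    """Cosine similarity of two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a)  # missing keys in b count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


d1 = "hot dog vendor"
d2 = "my dog is hot"
for merge in (False, True):
    a, b = Counter(terms(d1, merge)), Counter(terms(d2, merge))
    label = "with collocations" if merge else "unigrams only"
    print(label, round(cosine(a, b), 3))
# unigrams only 0.577
# with collocations 0.0
```

Without collocations the two unrelated documents look fairly similar; with "hot dog" treated as a single term they share no terms at all.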