==Definitions==
In this section we consider a particular vector space model based on the [[Bag-of-words model|bag-of-words]] representation. Documents and queries are represented as vectors.
:<math>d_j = ( w_{1,j} ,w_{2,j} , \dotsc ,w_{n,j} )</math>
The definition of ''term'' depends on the application. Typically terms are single words, [[keyword (linguistics)|keyword]]s, or longer phrases. If words are chosen to be the terms, the dimensionality of the vector is the number of words in the vocabulary (the number of distinct words occurring in the [[text corpus|corpus]]).
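The following is a minimal sketch of how single-word terms define the vector dimensions. It assumes lowercase whitespace tokenization and uses raw term counts as the weights <math>w_{i,j}</math>. Both are illustrative assumptions, and weighting schemes such as tf-idf are common alternatives. The helper names <code>tokenize</code>, <code>build_vocabulary</code> and <code>to_vector</code> are hypothetical.

<syntaxhighlight lang="python">
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real systems
    # typically use richer tokenization, stemming, stop-word removal, etc.
    return text.lower().split()


def build_vocabulary(corpus):
    # The terms are the distinct words occurring in the corpus.
    # Sorting fixes one dimension order shared by every vector.
    return sorted({term for doc in corpus for term in tokenize(doc)})


def to_vector(text, vocabulary):
    # Hypothetical weighting: w_{i,j} = raw count of term i in text j.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
vocab = build_vocabulary(corpus)
vectors = [to_vector(doc, vocab) for doc in corpus]
print(vocab)    # 7 distinct words, so each vector has 7 dimensions
print(vectors)
</syntaxhighlight>

Here the vocabulary is <code>['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']</code>, so the dimensionality is 7. The first document maps to <code>[1, 0, 0, 1, 1, 1, 2]</code>.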
Vector operations, such as computing the cosine of the angle between two vectors, can be used to compare documents with queries.<ref>{{Cite book |last=Büttcher |first=Stefan |title=Information Retrieval: Implementing and Evaluating Search Engines |last2=Clarke |first2=Charles L. A. |last3=Cormack |first3=Gordon V. |date=2016 |publisher=The MIT Press |isbn=978-0-262-52887-0 |edition=First MIT Press paperback edition |location=Cambridge, Massachusetts; London, England}}</ref>
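As an illustration of comparing a query with documents, the sketch below ranks documents by cosine similarity. This is one common choice of vector operation, assumed here rather than prescribed. It again assumes whitespace tokenization and raw term-count weights. The query text, the documents and the helper names are hypothetical.

<syntaxhighlight lang="python">
import math
from collections import Counter


def to_vector(text, vocabulary):
    # Raw term counts as weights (an assumption, not the only option).
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|).
    # Returns 0.0 when either vector is all zeros, to avoid dividing by zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


documents = ["the cat sat on the mat", "the dog sat on the log"]
vocabulary = sorted({t for d in documents for t in d.lower().split()})
query = "cat on mat"

q = to_vector(query, vocabulary)
scores = [cosine_similarity(q, to_vector(d, vocabulary)) for d in documents]
# Document indices ordered from most to least similar to the query.
ranking = sorted(range(len(documents)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
</syntaxhighlight>

The query shares three terms with the first document but only "on" with the second. The scores are therefore <math>3/\sqrt{24} \approx 0.61</math> and <math>1/\sqrt{24} \approx 0.20</math>, and the first document ranks higher.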
==Applications==