Vector space model: Difference between revisions

Further reading: Reference 1962 paper using term-document matrix & fix link
Line 45:
* <math>\mathrm{tf}_{t,d}</math> is the term frequency of term ''t'' in document ''d'' (a local parameter)
* <math>\log{\frac{|D|}{|\{d' \in D \, | \, t \in d'\}|}}</math> is the inverse document frequency (a global parameter). <math>|D|</math> is the total number of documents in the document set; <math>|\{d' \in D \, | \, t \in d'\}|</math> is the number of documents containing the term ''t''.
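
As an illustration, the product of the two factors defined above can be computed as in the following minimal Python sketch. It assumes raw term counts for the term frequency and the natural logarithm (the formula does not fix a base); the function name <code>tf_idf_weights</code> and the sample documents are hypothetical and serve only as an example.

```python
import math
from collections import Counter


def tf_idf_weights(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumptions (not fixed by the formula): tf is the raw count of the
    term in the document, and the logarithm is the natural logarithm.
    """
    n_docs = len(documents)  # |D|

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    weights = []
    for doc in documents:
        term_freq = Counter(doc)  # tf_{t,d} as raw counts
        weights.append(
            {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}
        )
    return weights


# Hypothetical three-document collection, already tokenized.
docs = [
    ["vector", "space", "model"],
    ["vector", "retrieval"],
    ["space", "space", "probe"],
]
weights = tf_idf_weights(docs)
```

Under these assumptions a term that occurs in every document receives weight zero, because its inverse document frequency is <math>\log 1 = 0</math>.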
 
Using the cosine, the similarity between document ''d<sub>j</sub>'' and query ''q'' can be calculated as:
 
:<math>\mathrm{cos}(d_j,q) = \frac{\mathbf{d_j} \cdot \mathbf{q}}{\left\| \mathbf{d_j} \right\| \left \| \mathbf{q} \right\|} = \frac{\sum _{i=1}^N w_{i,j}w_{i,q}}{\sqrt{\sum _{i=1}^N w_{i,j}^2}\sqrt{\sum _{i=1}^N w_{i,q}^2}}</math>
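
The formula above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes the document and the query are given as equal-length lists of term weights, and it returns 0.0 when either vector has zero length, a convention chosen here because the formula is undefined in that case. The name <code>cosine_similarity</code> and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(d, q):
    """Cosine of the angle between weight vectors d and q."""
    if len(d) != len(q):
        raise ValueError("vectors must have the same number of terms")

    # Numerator: sum over i of w_{i,j} * w_{i,q}
    dot = sum(w_d * w_q for w_d, w_q in zip(d, q))

    # Denominator: product of the Euclidean norms of both vectors.
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))

    if norm_d == 0 or norm_q == 0:
        # Assumption: treat an all-zero vector as having no similarity.
        return 0.0
    return dot / (norm_d * norm_q)


# Hypothetical weight vectors over a three-term vocabulary.
d_j = [1.0, 2.0, 0.0]
q_parallel = [2.0, 4.0, 0.0]    # same direction as d_j
q_orthogonal = [0.0, 0.0, 5.0]  # shares no terms with d_j

sim_parallel = cosine_similarity(d_j, q_parallel)
sim_orthogonal = cosine_similarity(d_j, q_orthogonal)
```

Because the cosine depends only on the angle between the vectors, <code>q_parallel</code> scores approximately 1 even though its weights are twice those of <code>d_j</code>, while <code>q_orthogonal</code>, which shares no terms with the document, scores 0.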
 
==Advantages==