The '''generalized vector space model''' is a generalization of the [[vector space model]] used in [[information retrieval]]. Wong ''et al.''<ref name="wong">{{cite conference | title=Generalized vector spaces model in information retrieval | url=http://doi.acm.org/10.1145/253495.253506 | first=S. K. M. | last=Wong | coauthors=Wojciech Ziarko, Patrick C. N. Wong | publisher=ACM SIGIR | date=1985}}</ref> presented an analysis of the problems created by the pairwise orthogonality assumption of the [[vector space model]] (VSM), and from there extended the VSM to the generalized vector space model (GVSM).
==Definitions==
GVSM introduces term-to-term correlations, which replace the pairwise orthogonality assumption. More specifically, terms are considered in a new space, where each term vector ''t<sub>i</sub>'' is expressed as a linear combination of ''2<sup>n</sup>'' vectors ''m<sub>r</sub>'', where ''r = 1...2<sup>n</sup>''.
For a document ''d<sub>k</sub>'' and a query ''q'', the similarity function now becomes:
:<math>sim(d_k,q) = \frac{\sum_{j=1}^n \sum_{i=1}^n w_{i,k} \, w_{j,q} \, (t_i \cdot t_j)}{\sqrt{\sum_{i=1}^n w_{i,k}^2} \, \sqrt{\sum_{i=1}^n w_{i,q}^2}}</math>
where ''t<sub>i</sub>'' and ''t<sub>j</sub>'' are now vectors of a ''2<sup>n</sup>'' dimensional space.
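Once the pairwise inner products <math>t_i \cdot t_j</math> are collected in a term-correlation matrix, the similarity can be computed directly. The following is a minimal sketch of this computation (the function and variable names are illustrative, not from Wong et al.); note that when the correlation matrix is the identity, the formula reduces to the classic VSM cosine similarity:

<syntaxhighlight lang="python">
import numpy as np

def gvsm_similarity(d_weights, q_weights, term_corr):
    """GVSM similarity between a document and a query.

    d_weights, q_weights -- length-n arrays of term weights w_{i,k} and w_{i,q}
    term_corr            -- n x n matrix whose (i, j) entry is t_i . t_j
    """
    # Numerator: sum_i sum_j w_{i,k} * w_{j,q} * (t_i . t_j)
    numerator = d_weights @ term_corr @ q_weights
    # Denominator: product of the Euclidean norms of the two weight vectors
    denominator = np.linalg.norm(d_weights) * np.linalg.norm(q_weights)
    return numerator / denominator

d = np.array([1.0, 0.5, 0.0])  # document term weights
q = np.array([0.0, 1.0, 1.0])  # query term weights
print(gvsm_similarity(d, q, np.eye(3)))  # identity correlations: classic cosine similarity
</syntaxhighlight>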
The term correlation <math>t_i \cdot t_j</math> can be implemented in several ways. For example, Wong et al. take as input the term-occurrence frequency matrix obtained from automatic indexing, and produce as output the correlation between any pair of index terms.
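Wong et al.'s original construction expresses each term over the ''2<sup>n</sup>'' minterm vectors; as an illustration only, the sketch below instead uses a common simplification, deriving correlations from a term-document frequency matrix by taking inner products of unit-normalized term frequency profiles:

<syntaxhighlight lang="python">
import numpy as np

def term_correlations(freq_matrix):
    """Approximate term-to-term correlations from a term-document
    frequency matrix (rows: terms, columns: documents).

    Simplification, not Wong et al.'s original algorithm: t_i . t_j is
    taken as the inner product of the unit-normalized frequency profiles
    of terms i and j across the corpus.
    """
    norms = np.linalg.norm(freq_matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0      # avoid division by zero for unused terms
    unit = freq_matrix / norms   # scale each term's row to unit length
    return unit @ unit.T         # (i, j) entry approximates t_i . t_j
</syntaxhighlight>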
==Semantic information on GVSM==
There are at least two basic directions for embedding term-to-term relatedness, other than exact keyword matching, into a retrieval model:
# compute semantic correlations between terms
# compute frequency co-occurrence statistics from large corpora
Tsatsaronis and Panagiotopoulou<ref>{{cite conference | title=A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness | url=http://www.aclweb.org/anthology/E/E09/E09-3009.pdf | last=Tsatsaronis | first=George | coauthors=Vicky Panagiotopoulou | date=2009}}</ref> focused on the first approach.
They measure semantic relatedness (''SR'') using a thesaurus (''O'') such as [[WordNet]]. The measure considers the path length between two senses, captured by semantic compactness (''SCM''), and the depth of the path, captured by semantic path elaboration (''SPE'').
They estimate the <math>t_i \cdot t_j</math> inner product by:
:<math>t_i \cdot t_j = SR((t_i, t_j), (s_i, s_j), O)</math>
where ''s<sub>i</sub>'' and ''s<sub>j</sub>'' are the senses of terms ''t<sub>i</sub>'' and ''t<sub>j</sub>'' respectively that maximize <math>SCM \cdot SPE</math>.
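As an illustration only, the sketch below substitutes NLTK's WordNet path similarity for the <math>SCM \cdot SPE</math> measure (which is specific to Tsatsaronis and Panagiotopoulou), while keeping the maximization over sense pairs from the definition above:

<syntaxhighlight lang="python">
from itertools import product

from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def semantic_term_correlation(term_i, term_j):
    """Estimate t_i . t_j from WordNet sense pairs.

    Stand-in for the SR measure: path_similarity replaces SCM * SPE,
    and the sense pair (s_i, s_j) maximizing the score is kept.
    """
    best = 0.0
    for s_i, s_j in product(wn.synsets(term_i), wn.synsets(term_j)):
        score = s_i.path_similarity(s_j)  # None when no connecting path exists
        if score is not None and score > best:
            best = score
    return best

print(semantic_term_correlation('car', 'automobile'))  # 1.0: the terms share a synset
</syntaxhighlight>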
== References ==
{{reflist}}
[[Category:Vector space model]]