{{short description|Generalization of the vector space model used in information retrieval}}
{{Confusing|date=January 2010}}
The '''Generalized vector space model''' is a generalization of the [[vector space model]] used in [[information retrieval]]. Wong et al.<ref name="wong">{{citation | chapter=Generalized vector spaces model in information retrieval | title=Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '85 | pages=18–25 | first1=S. K. M. | last1=Wong | first2=Wojciech | last2=Ziarko | first3=Patrick C. N. | last3=Wong | publisher=[[Association for Computing Machinery|ACM]] | date=1985-06-05 | doi=10.1145/253495.253506 | isbn=0897911598 | doi-access=free }}</ref> presented an analysis of the problems that the pairwise orthogonality assumption of the [[vector space model]] (VSM) creates. From this analysis they extended the VSM to the generalized vector space model (GVSM).
 
==Definitions==
 
GVSM introduces term-to-term correlations, which relax the pairwise orthogonality assumption of the VSM. More specifically, Wong et al. considered a new space in which each term vector ''t<sub>i</sub>'' is expressed as a linear combination of ''2<sup>n</sup>'' vectors ''m<sub>r</sub>'', where ''r = 1...2<sup>n</sup>''.
 
For a document ''d<sub>k</sub>'' and a query ''q'' the similarity function now becomes:
:<math>sim(d_k,q) = \frac{\sum_{j=1}^{n} \sum_{i=1}^{n} w_{i,k}\, w_{j,q}\, t_i \cdot t_j}{\sqrt{\sum_{i=1}^{n} w_{i,k}^2}\, \sqrt{\sum_{i=1}^{n} w_{i,q}^2}}</math>

where ''t<sub>i</sub>'' and ''t<sub>j</sub>'' are now vectors of a ''2<sup>n</sup>'' dimensional space, and ''w<sub>i,k</sub>'' and ''w<sub>j,q</sub>'' are the usual term weights of term ''i'' in document ''d<sub>k</sub>'' and of term ''j'' in the query ''q''.
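A minimal sketch of this computation is shown below; it assumes the term correlations <math>t_i \cdot t_j</math> are already collected in a matrix, and the variable names (<code>corr</code>, <code>d_weights</code>, <code>q_weights</code>) are illustrative rather than taken from the original paper.

<syntaxhighlight lang="python">
import numpy as np

def gvsm_similarity(d_weights, q_weights, corr):
    """GVSM similarity of a document d_k and a query q.

    d_weights, q_weights : term weight vectors of length n for d_k and q.
    corr : n x n matrix whose entry (i, j) holds the term correlation
           t_i . t_j; the identity matrix recovers the classic VSM.
    """
    d = np.asarray(d_weights, dtype=float)
    q = np.asarray(q_weights, dtype=float)
    numerator = d @ corr @ q                      # sum_i sum_j w_{i,k} w_{j,q} (t_i . t_j)
    denominator = np.linalg.norm(d) * np.linalg.norm(q)
    return numerator / denominator if denominator else 0.0
</syntaxhighlight>

With <code>corr</code> set to the identity matrix, the function reduces to the ordinary cosine similarity of the VSM.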
 
Term correlation <math>t_i \cdot t_j</math> can be implemented in several ways. As an example, Wong et al. use the term occurrence frequency matrix obtained from automatic indexing as input to their algorithm; the output is the term correlation between any pair of index terms.
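As a rough illustration (a sketch under simplified assumptions, not the exact algorithm of Wong et al.), such a correlation matrix can be derived from the rows of the term–document frequency matrix:

<syntaxhighlight lang="python">
import numpy as np

def term_correlations(freq):
    """Derive a term-term correlation matrix from a term occurrence
    frequency matrix.

    freq : n x m array; freq[i, j] is the frequency of term i in document j
           (e.g. the output of automatic indexing).
    Returns an n x n matrix whose entry (i, j) serves as t_i . t_j.
    """
    f = np.asarray(freq, dtype=float)
    corr = f @ f.T                              # terms occurring in similar documents correlate
    norms = np.sqrt(np.diag(corr))
    norms[norms == 0] = 1.0                     # guard against all-zero term rows
    return corr / np.outer(norms, norms)        # normalized so that t_i . t_i = 1
</syntaxhighlight>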
 
==Semantic information on GVSM==
There are at least two basic directions for embedding term-to-term relatedness, other than exact keyword matching, into a retrieval model:
# compute semantic correlations between terms
# compute frequency co-occurrence statistics from large corpora
 
Tsatsaronis and Panagiotopoulou<ref>{{citation | title=A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness | url=http://www.aclweb.org/anthology/E/E09/E09-3009.pdf | last1=Tsatsaronis | first1=George | first2=Vicky | last2=Panagiotopoulou | publisher=EACL | date=2009-04-02}}</ref> focused on the first approach.
 
They measure semantic relatedness (''SR'') using a thesaurus (''O''), such as [[WordNet]]. Their measure considers the path length between two senses, captured by compactness (''SCM''), and the path depth, captured by semantic path elaboration (''SPE''). The semantic relatedness of two terms ''t<sub>i</sub>'' and ''t<sub>j</sub>'' is then defined as:

:<math>SR((t_i,t_j),(s_i,s_j),O) = SCM(s_i,s_j,O) \cdot SPE(s_i,s_j,O)</math>
 
where ''s<sub>i</sub>'' and ''s<sub>j</sub>'' are senses of terms ''t<sub>i</sub>'' and ''t<sub>j</sub>'' respectively, maximizing <math>SCM \cdot SPE</math>.
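The following sketch approximates this idea with the [[WordNet]] interface of the NLTK library; <code>path_similarity</code> and <code>max_depth</code> are only rough stand-ins for the SCM and SPE factors, not the definitions used by Tsatsaronis and Panagiotopoulou.

<syntaxhighlight lang="python">
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet data

def semantic_relatedness(term_i, term_j):
    """Approximate SR(t_i, t_j) by the best-scoring pair of senses.

    path_similarity stands in for the path-length factor (SCM) and the
    normalized depth of the deeper sense for the depth factor (SPE).
    """
    best = 0.0
    for s_i in wn.synsets(term_i):
        for s_j in wn.synsets(term_j):
            compactness = s_i.path_similarity(s_j) or 0.0
            depth = min(max(s_i.max_depth(), s_j.max_depth()) / 20.0, 1.0)  # crude normalization
            best = max(best, compactness * depth)
    return best
</syntaxhighlight>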
 
Also building on the first approach, Waitelonis et al.<ref>{{citation | title=Linked Data enabled Generalized Vector Space Model to improve document retrieval | url=http://ceur-ws.org/Vol-1581/paper4.pdf | last1= Waitelonis | first1=Jörg | first2=Claudia | last2=Exeler | last3=Sack| first3=Harald | publisher=ISWC 2015, CEUR-WS 1581 |date=2015-09-11}}</ref> have computed semantic relatedness from [[Linked data|Linked Open Data]] resources, including [[DBpedia]] as well as the [[YAGO (database)|YAGO taxonomy]].
In doing so, they exploit taxonomic relationships among the semantic entities found in documents and queries after [[Entity linking|named entity linking]].
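A toy illustration of this idea is given below; the mini-taxonomy and the scoring by the nearest shared ancestor are purely hypothetical and only hint at how taxonomic relationships over resources such as [[DBpedia]] could be scored.

<syntaxhighlight lang="python">
# Parent links of a tiny, made-up taxonomy (child -> parent); real systems
# would use taxonomies such as the DBpedia ontology or YAGO instead.
PARENTS = {
    "Berlin": "City", "Hamburg": "City",
    "City": "PopulatedPlace", "PopulatedPlace": "Place", "Place": None,
}

def ancestors(entity):
    """List the ancestors of an entity, nearest ancestor first."""
    chain, node = [], PARENTS.get(entity)
    while node is not None:
        chain.append(node)
        node = PARENTS.get(node)
    return chain

def taxonomic_relatedness(e1, e2):
    """Score two linked entities by how deep their nearest shared ancestor
    sits relative to the longer ancestor chain (0 if they share none)."""
    a1, a2 = ancestors(e1), ancestors(e2)
    shared = [a for a in a1 if a in a2]
    if not shared:
        return 0.0
    depth = len(a1) - a1.index(shared[0])   # more specific shared ancestors score higher
    return depth / max(len(a1), len(a2))
</syntaxhighlight>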
 
== References ==