{{short description|Generalization of the vector space model used in information retrieval}}
{{Confusing|date=January 2010}}
The '''generalized vector space model''' is a generalization of the [[vector space model]] used in [[information retrieval]]. Many classifiers, especially those related to document or text classification, use the TFIDF basis of the VSM. However, this is where the similarity between the models ends: the generalized model uses the results of the TFIDF dictionary to generate similarity metrics based on distance or angle difference, rather than centroid-based classification. Wong et al.<ref name="wong">{{citation | chapter=Generalized vector spaces model in information retrieval | title=Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '85 | pages=18–25 | last1=Wong | first1=S. K. M. | last2=Ziarko | first2=Wojciech | last3=Wong | first3=Patrick C. N. | publisher=[[Association for Computing Machinery|ACM]] | date=1985-06-05 | doi=10.1145/253495.253506 | isbn=0897911598 | doi-access=free }}</ref> presented an analysis of the problems that the pairwise orthogonality assumption of the vector space model (VSM) creates. From there they extended the VSM to the generalized vector space model (GVSM).
 
==Definitions==
 
GVSM introduces term-to-term correlations, which deprecate the pairwise orthogonality assumption. More specifically, the term vectors are considered in a new space, where each term vector ''t<sub>i</sub>'' is expressed as a linear combination of ''2<sup>n</sup>'' vectors ''m<sub>r</sub>'', where ''r = 1...2<sup>n</sup>''.
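This construction can be illustrated with a short sketch (an interpretation for illustration, not code from Wong et al.): each document's term-incidence pattern selects one of the ''2<sup>n</sup>'' minterm basis vectors ''m<sub>r</sub>'', and each term vector is a weighted, normalized combination of the patterns in which the term occurs. The function names and the occurrence-count weighting are assumptions made for the example.

```python
from itertools import product
import math

def minterm_of(doc_terms, vocab):
    # A minterm is the on/off incidence pattern of vocabulary terms in a document.
    return tuple(1 if t in doc_terms else 0 for t in vocab)

def term_vectors(docs, vocab):
    """Map each term to a vector over the 2^n minterm space.

    Each term vector t_i is a normalized linear combination of the
    minterm basis vectors m_r in which the term is 'on'.
    (Enumerating all 2^n minterms is only feasible for tiny vocabularies.)
    """
    minterms = list(product([0, 1], repeat=len(vocab)))  # all 2^n patterns
    index = {m: r for r, m in enumerate(minterms)}
    vecs = {t: [0.0] * len(minterms) for t in vocab}
    for doc in docs:
        r = index[minterm_of(doc, vocab)]
        for t in vocab:
            if t in doc:
                vecs[t][r] += 1.0  # weight by how often the pattern occurs
    for t in vocab:
        norm = math.sqrt(sum(x * x for x in vecs[t]))
        if norm:
            vecs[t] = [x / norm for x in vecs[t]]
    return vecs
```

With this basis, the dot product of two term vectors is nonzero exactly when the terms co-occur in some document pattern, which is what removes the pairwise orthogonality assumption.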
 
For a document ''d<sub>k</sub>'' and a query ''q'' the similarity function now becomes:
:<math>sim(d_k,q) = \frac{\sum_{j=1}^n \sum_{i=1}^n d_{k,j}\, q_{i}\, \vec{t_i} \cdot \vec{t_j}}{\sqrt{\sum_{i=1}^n d_{k,i}^2} \sqrt{\sum_{i=1}^n q_{i}^2}}</math>

where ''t<sub>i</sub>'' and ''t<sub>j</sub>'' are now vectors of a ''2<sup>n</sup>''-dimensional space.

==Semantic information on GVSM==

There are at least two basic directions for embedding term-to-term relatedness, other than exact keyword matching, into a retrieval model:
# compute semantic correlations between terms
# compute frequency co-occurrence statistics from large corpora
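The second direction can be sketched as follows, assuming tf-idf-style weight vectors and a term correlation matrix estimated from document co-occurrence counts. The function names, the cosine-style normalization of the correlations, and the weighting scheme are illustrative assumptions, not the method of Wong et al.

```python
import math

def gvsm_similarity(d, q, corr):
    """GVSM similarity of a document and a query given term correlations.

    d, q : term weight vectors (e.g. tf-idf) of equal length n
    corr : n x n matrix, corr[i][j] standing in for t_i . t_j
           (1.0 on the diagonal)
    """
    def quad(a, b):
        return sum(a[i] * corr[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    denom = math.sqrt(quad(d, d)) * math.sqrt(quad(q, q))
    return quad(d, q) / denom if denom else 0.0

def cooccurrence_correlations(docs, n):
    """Estimate term correlations from co-occurrence counts in a corpus.

    docs: binary term-incidence vectors of length n, one per document.
    """
    counts = [[0.0] * n for _ in range(n)]
    for doc in docs:
        for i in range(n):
            for j in range(n):
                if doc[i] and doc[j]:
                    counts[i][j] += 1.0
    # normalize counts to cosine-like correlations in [0, 1]
    return [[counts[i][j] / math.sqrt(counts[i][i] * counts[j][j])
             if counts[i][i] and counts[j][j] else (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
```

Unlike the plain VSM, a query and a document sharing no terms can still score above zero here, provided their terms co-occur elsewhere in the corpus.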
 
Recently, Tsatsaronis and Panagiotopoulou<ref>{{citation | title=A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness | url=http://www.aclweb.org/anthology/E/E09/E09-3009.pdf | last1=Tsatsaronis | first1=George | last2=Panagiotopoulou | first2=Vicky | publisher=EACL | date=2009-04-02}}</ref> focused on the first approach.
 
They measure semantic relatedness (''SR'') using a thesaurus (''O''), such as [[WordNet]]. The measure considers the path length, captured by semantic compactness (''SCM''), and the path depth, captured by semantic path elaboration (''SPE'').
Given two terms ''t<sub>i</sub>'' and ''t<sub>j</sub>'', their semantic relatedness is computed as:

:<math>SR((t_i,t_j),(s_i,s_j),O) = SCM(s_i,s_j,O) \cdot SPE(s_i,s_j,O)</math>
 
where ''s<sub>i</sub>'' and ''s<sub>j</sub>'' are senses of terms ''t<sub>i</sub>'' and ''t<sub>j</sub>'' respectively, maximizing <math>SCM \cdot SPE</math>.
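The sense-maximization step can be sketched as follows. The helper name and the toy WordNet-style scores are hypothetical stand-ins; real ''SCM'' and ''SPE'' values would be derived from path length and depth in the thesaurus.

```python
def semantic_relatedness(senses_i, senses_j, scm, spe):
    # SR for a term pair: take the best product of compactness (SCM)
    # and path elaboration (SPE) over all candidate sense pairs.
    return max(scm(si, sj) * spe(si, sj)
               for si in senses_i for sj in senses_j)

# Toy stand-ins for thesaurus-derived scores (illustrative values only):
scm_scores = {("bank#1", "money#1"): 0.8, ("bank#2", "money#1"): 0.1}
spe_scores = {("bank#1", "money#1"): 0.5, ("bank#2", "money#1"): 0.9}

sr = semantic_relatedness(
    ["bank#1", "bank#2"], ["money#1"],
    lambda a, b: scm_scores[(a, b)],
    lambda a, b: spe_scores[(a, b)])
```

In this toy example the financial sense of "bank" wins (0.8 × 0.5 = 0.4 beats 0.1 × 0.9 = 0.09), so sense disambiguation falls out of the maximization itself.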
 
Building also on the first approach, Waitelonis et al.<ref>{{citation | title=Linked Data enabled Generalized Vector Space Model to improve document retrieval | url=http://ceur-ws.org/Vol-1581/paper4.pdf | last1= Waitelonis | first1=Jörg | first2=Claudia | last2=Exeler | last3=Sack| first3=Harald | publisher=ISWC 2015, CEUR-WS 1581 |date=2015-09-11}}</ref> have computed semantic relatedness from [[Linked data|Linked Open Data]] resources including [[DBpedia]] as well as the [[YAGO (database)|YAGO taxonomy]].
Thereby they exploit taxonomic relationships among semantic entities in documents and queries after [[Entity linking|named entity linking]].
 
== References ==
{{reflist}}