{{short description|Generalization of the vector space model used in information retrieval}}
{{Confusing|date=January 2010}}
The '''generalized vector space model''' is a generalization of the [[vector space model]] used in [[information retrieval]]. Many classifiers, especially those related to document or text classification, use the TFIDF basis of the VSM. However, this is where the similarity between the models ends: the generalized model uses the results of the TFIDF dictionary to generate similarity metrics based on distance or angle difference, rather than centroid-based classification. Wong et al.<ref name="wong">{{citation | chapter=Generalized vector spaces model in information retrieval | title=Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '85 | pages=18–25 | last1=Wong | first1=S. K. M. | last2=Ziarko | first2=Wojciech | last3=Wong | first3=Patrick C. N. | publisher=[[Association for Computing Machinery|ACM]] | date=1985-06-05 | doi=10.1145/253495.253506 | isbn=0897911598 | doi-access=free }}</ref> presented an analysis of the problems that the pairwise orthogonality assumption of the vector space model (VSM) creates. From there they extended the VSM to the generalized vector space model (GVSM).
 
==Definitions==
 
GVSM introduces term-to-term correlations, which deprecate the pairwise orthogonality assumption. More specifically, the term vectors are considered in a new space, where each term vector ''t<sub>i</sub>'' is expressed as a linear combination of ''2<sup>n</sup>'' vectors ''m<sub>r</sub>'', where ''r = 1...2<sup>n</sup>''.
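This construction can be illustrated with a short sketch (an interpretation for illustration, not code from Wong et al.): each document's term-incidence pattern selects one of the ''2<sup>n</sup>'' minterm basis vectors ''m<sub>r</sub>'', and each term vector is a weighted, normalized combination of the patterns in which the term occurs. The function names and the occurrence-count weighting are assumptions made for the example.

```python
from itertools import product
import math

def minterm_of(doc_terms, vocab):
    # A minterm is the on/off incidence pattern of vocabulary terms in a document.
    return tuple(1 if t in doc_terms else 0 for t in vocab)

def term_vectors(docs, vocab):
    """Map each term to a vector over the 2^n minterm space.

    Each term vector t_i is a normalized linear combination of the
    minterm basis vectors m_r in which the term is 'on'.
    (Enumerating all 2^n minterms is only feasible for tiny vocabularies.)
    """
    minterms = list(product([0, 1], repeat=len(vocab)))  # all 2^n patterns
    index = {m: r for r, m in enumerate(minterms)}
    vecs = {t: [0.0] * len(minterms) for t in vocab}
    for doc in docs:
        r = index[minterm_of(doc, vocab)]
        for t in vocab:
            if t in doc:
                vecs[t][r] += 1.0  # weight by how often the pattern occurs
    for t in vocab:
        norm = math.sqrt(sum(x * x for x in vecs[t]))
        if norm:
            vecs[t] = [x / norm for x in vecs[t]]
    return vecs
```

With this basis, the dot product of two term vectors is nonzero exactly when the terms co-occur in some document pattern, which is what removes the pairwise orthogonality assumption.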
 
For a document ''d<sub>k</sub>'' and a query ''q'' the similarity function now becomes:
:<math>sim(d_k,q) = \frac{\sum_{j=1}^n \sum_{i=1}^n d_{k,j}\, q_{i}\, \vec{t_i} \cdot \vec{t_j}}{\sqrt{\sum_{i=1}^n d_{k,i}^2} \sqrt{\sum_{i=1}^n q_{i}^2}}</math>

where ''t<sub>i</sub>'' and ''t<sub>j</sub>'' are now vectors of a ''2<sup>n</sup>''-dimensional space.

==Semantic information on GVSM==

There are at least two basic directions for embedding term-to-term relatedness, other than exact keyword matching, into a retrieval model:
# compute semantic correlations between terms
# compute frequency co-occurrence statistics from large corpora
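The second direction can be sketched as follows, assuming tf-idf-style weight vectors and a term correlation matrix estimated from document co-occurrence counts. The function names, the cosine-style normalization of the correlations, and the weighting scheme are illustrative assumptions, not the method of Wong et al.

```python
import math

def gvsm_similarity(d, q, corr):
    """GVSM similarity of a document and a query given term correlations.

    d, q : term weight vectors (e.g. tf-idf) of equal length n
    corr : n x n matrix, corr[i][j] standing in for t_i . t_j
           (1.0 on the diagonal)
    """
    def quad(a, b):
        return sum(a[i] * corr[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    denom = math.sqrt(quad(d, d)) * math.sqrt(quad(q, q))
    return quad(d, q) / denom if denom else 0.0

def cooccurrence_correlations(docs, n):
    """Estimate term correlations from co-occurrence counts in a corpus.

    docs: binary term-incidence vectors of length n, one per document.
    """
    counts = [[0.0] * n for _ in range(n)]
    for doc in docs:
        for i in range(n):
            for j in range(n):
                if doc[i] and doc[j]:
                    counts[i][j] += 1.0
    # normalize counts to cosine-like correlations in [0, 1]
    return [[counts[i][j] / math.sqrt(counts[i][i] * counts[j][j])
             if counts[i][i] and counts[j][j] else (1.0 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
```

Unlike the plain VSM, a query and a document sharing no terms can still score above zero here, provided their terms co-occur elsewhere in the corpus.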
 
Recently, Tsatsaronis and Panagiotopoulou<ref>{{citation | title=A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness | url=http://www.aclweb.org/anthology/E/E09/E09-3009.pdf | last1=Tsatsaronis | first1=George | last2=Panagiotopoulou | first2=Vicky | publisher=EACL | date=2009-04-02}}</ref> focused on the first approach.
 
They measure semantic relatedness (''SR'') using a thesaurus (''O''), such as [[WordNet]]. The measure considers the path length, captured by semantic compactness (''SCM''), and the path depth, captured by semantic path elaboration (''SPE'').
Given two terms ''t<sub>i</sub>'' and ''t<sub>j</sub>'', their semantic relatedness is computed as:

:<math>SR((t_i,t_j),(s_i,s_j),O) = SCM(s_i,s_j,O) \cdot SPE(s_i,s_j,O)</math>
 
where ''s<sub>i</sub>'' and ''s<sub>j</sub>'' are senses of terms ''t<sub>i</sub>'' and ''t<sub>j</sub>'' respectively, maximizing <math>SCM \cdot SPE</math>.
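The sense-maximization step can be sketched as follows. The helper name and the toy WordNet-style scores are hypothetical stand-ins; real ''SCM'' and ''SPE'' values would be derived from path length and depth in the thesaurus.

```python
def semantic_relatedness(senses_i, senses_j, scm, spe):
    # SR for a term pair: take the best product of compactness (SCM)
    # and path elaboration (SPE) over all candidate sense pairs.
    return max(scm(si, sj) * spe(si, sj)
               for si in senses_i for sj in senses_j)

# Toy stand-ins for thesaurus-derived scores (illustrative values only):
scm_scores = {("bank#1", "money#1"): 0.8, ("bank#2", "money#1"): 0.1}
spe_scores = {("bank#1", "money#1"): 0.5, ("bank#2", "money#1"): 0.9}

sr = semantic_relatedness(
    ["bank#1", "bank#2"], ["money#1"],
    lambda a, b: scm_scores[(a, b)],
    lambda a, b: spe_scores[(a, b)])
```

In this toy example the financial sense of "bank" wins (0.8 × 0.5 = 0.4 beats 0.1 × 0.9 = 0.09), so sense disambiguation falls out of the maximization itself.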
 
Building also on the first approach, Waitelonis et al.<ref>{{citation | title=Linked Data enabled Generalized Vector Space Model to improve document retrieval | url=http://ceur-ws.org/Vol-1581/paper4.pdf | last1= Waitelonis | first1=Jörg | first2=Claudia | last2=Exeler | last3=Sack| first3=Harald | publisher=ISWC 2015, CEUR-WS 1581 |date=2015-09-11}}</ref> have computed semantic relatedness from [[Linked data|Linked Open Data]] resources including [[DBpedia]] as well as the [[YAGO (database)|YAGO taxonomy]].
Thereby they exploit taxonomic relationships among semantic entities in documents and queries after [[Entity linking|named entity linking]].
 
== References ==
{{reflist}}