The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The original Lesk algorithm defines the context in a more complex way.
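One possible implementation of COMPUTEOVERLAP in Python is sketched below; the stop list shown is illustrative, and a real implementation would use a fuller list of function words:

<syntaxhighlight lang="python">
def compute_overlap(signature, context):
    """Count the words common to two bags of words, ignoring stop words."""
    stop_words = {"a", "an", "the", "of", "in", "on", "and", "or", "to", "is"}
    return len((set(signature) & set(context)) - stop_words)

# compute_overlap(["pine", "cone", "of", "evergreen"],
#                 ["the", "cone", "of", "a", "pine"])
# returns 2, counting "pine" and "cone" but not the stop word "of"
</syntaxhighlight>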
 
==Criticisms and other Lesk-based methods==
Lesk's approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Furthermore, the algorithm determines overlaps only among the glosses of the senses being considered. This is a significant limitation: dictionary glosses tend to be fairly short and do not provide sufficient vocabulary to relate fine-grained sense distinctions.
 
Much subsequent work has proposed modifications of this algorithm. These approaches draw on additional resources (thesauri, synonym dictionaries, or morphological and syntactic models): for instance, they may use information such as synonyms, derivatives, or words drawn from the definitions of the words that themselves occur in definitions.<ref>Alexander Gelbukh, Grigori Sidorov. [https://www.gelbukh.com/CV/Publications/2004/NTI-2004-senses.htm Automatic resolution of ambiguity of word senses in dictionary definitions] (in Russian). J. Nauchno-Tehnicheskaya Informaciya (NTI), ISSN 0548-0027, ser. 2, N 3, 2004, pp. 10–15.</ref>
 
Many studies have examined Lesk and its extensions:<ref>Roberto Navigli. [http://www.dsi.uniroma1.it/~navigli/pubs/ACM_Survey_2009_Navigli.pdf ''Word Sense Disambiguation: A Survey''], ACM Computing Surveys, 41(2), 2009, pp. 1–69.</ref>
* Nastase and Szpakowicz, 2001;
* Gelbukh and Sidorov, 2004.
 
==Lesk variants==
* Original Lesk (Lesk, 1986)
* Adapted/Extended Lesk (Banerjee and Pedersen, 2002/2003): In the adapted Lesk algorithm, a word vector is created for every content word in the WordNet gloss of a sense. Concatenating the glosses of related concepts in WordNet can be used to augment this vector. The vector for a word ''w'' contains the co-occurrence counts of the words that co-occur with ''w'' in a large corpus. Summing the word vectors of all the content words in its gloss yields the gloss vector ''g'' for a concept. Relatedness between two senses is then determined by comparing their gloss vectors with the [[cosine similarity]] measure, as sketched below.<ref>{{Cite book|last1=Banerjee|first1=Satanjeev|last2=Pedersen|first2=Ted|title=Computational Linguistics and Intelligent Text Processing |chapter=An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet |date=2002-02-17|series=Lecture Notes in Computer Science|volume=2276 |language=en|publisher=Springer, Berlin, Heidelberg|pages=136–145|doi=10.1007/3-540-45715-1_11|isbn=978-3540457152|citeseerx=10.1.1.118.8359}}</ref>
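
The following Python sketch gives a minimal illustration of the gloss-vector idea, not Banerjee and Pedersen's actual implementation. It assumes a precomputed co-occurrence table (here called <code>cooccurrence</code>) mapping each word to its corpus co-occurrence counts; all names are illustrative.

<syntaxhighlight lang="python">
import math
from collections import Counter

def gloss_vector(gloss_words, cooccurrence):
    """Sum the corpus co-occurrence vectors of a gloss's content words.

    `cooccurrence` is assumed to map each word to a Counter of the
    words it co-occurs with in a large corpus.
    """
    g = Counter()
    for word in gloss_words:
        g.update(cooccurrence.get(word, {}))
    return g

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Relatedness of two senses = cosine similarity of their gloss vectors:
# relatedness = cosine_similarity(gloss_vector(gloss1, cooccurrence),
#                                 gloss_vector(gloss2, cooccurrence))
</syntaxhighlight>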
 
==See also==