{{Short description|Natural language processing algorithm}}
Lesk, M. (1986). [http://portal.acm.org/citation.cfm?id=318728&dl=GUIDE,ACM&coll=GUIDE&CFID=103485667&CFTOKEN=64768709 Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone]. In SIGDOC '86: Proceedings of the 5th annual international conference on Systems documentation, pages 24-26, New York, NY, USA. ACM.
</ref> It operates on the premise that words within a given context are likely to share a common meaning. This algorithm compares the dictionary definitions of an ambiguous word with the words in its surrounding context to determine the most appropriate sense. Variations, such as the Simplified Lesk algorithm, have demonstrated improved precision and efficiency. However, the Lesk algorithm has faced criticism for its sensitivity to definition wording and its reliance on brief glosses. Researchers have sought to enhance its accuracy by incorporating additional resources like thesauruses and syntactic models.
==Overview==
The Lesk algorithm is based on the assumption that words in a given
</ref>
# for every sense of the word being disambiguated one should count the
# the sense that is to be chosen is the sense
PINE
1. kinds of evergreen tree with needle-shaped leaves
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees
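The gloss comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Lesk's original implementation: it uses only the glosses quoted above, the sense labels and the small stop list are hypothetical, and tokenization is plain whitespace splitting with no stemming (so "tree" and "trees" do not match).

```python
from itertools import product

# Hypothetical stop list; a real system would use a fuller one.
STOP_WORDS = {"of", "with", "this", "or", "whether", "the", "a", "an"}

# Only the glosses quoted in the text; the sense labels are illustrative.
PINE = {"pine#1": "kinds of evergreen tree with needle-shaped leaves"}
CONE = {
    "cone#2": "something of this shape whether solid or hollow",
    "cone#3": "fruit of certain evergreen trees",
}


def content_words(gloss):
    """Lowercase a gloss, split on whitespace, and drop stop words."""
    return {w for w in gloss.lower().split() if w not in STOP_WORDS}


def gloss_overlap(gloss_a, gloss_b):
    """Count the content words shared by two glosses."""
    return len(content_words(gloss_a) & content_words(gloss_b))


def best_sense_pair(senses_a, senses_b):
    """Return ((sense_a, sense_b), overlap) for the pair with the largest overlap."""
    scored = [
        ((a, b), gloss_overlap(senses_a[a], senses_b[b]))
        for a, b in product(senses_a, senses_b)
    ]
    return max(scored, key=lambda item: item[1])


pair, score = best_sense_pair(PINE, CONE)
print(pair, score)
```

With these glosses the only shared content word is "evergreen", so the tree sense of pine is paired with the fruit sense of cone.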
As
==Simplified Lesk algorithm==
In the Simplified Lesk algorithm,<ref>A. Kilgarriff and J. Rosenzweig. 2000. [http://www.lrec-conf.org/proceedings/lrec2000/pdf/8.pdf English SENSEVAL: Report and Results]. In Proceedings of the 2nd International Conference on Language Resources and Evaluation, LREC, Athens, Greece.</ref> the correct meaning of each word in a given
"A comparative evaluation
2004. [http://www.lrec-conf.org/proceedings/lrec2004/pdf/219.pdf Evaluating Variants of the Lesk Approach for Disambiguating Words]. LREC, Portugal.</ref> has shown that the simplified Lesk algorithm can significantly outperform the original definition of the algorithm, both in terms of precision and efficiency. By evaluating the disambiguation algorithms on the Senseval-2 English all words
Note:
'''Simplified LESK Algorithm with smart default word sense (Vasilescu et al., 2004)'''<ref>Florentina Vasilescu, Philippe Langlais, and Guy Lapalme.
2004. [http://www.lrec-conf.org/proceedings/lrec2004/pdf/219.pdf Evaluating Variants of the Lesk Approach for Disambiguating Words]. LREC, Portugal.</ref>
{| class="wikitable"
|-
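|
'''function''' SIMPLIFIED LESK(''word, sentence'') '''returns''' ''best sense of word''
:''best-sense <- most frequent sense for word''
:''max-overlap <- 0''
:''context <- set of words in sentence''
:'''for each''' ''sense'' '''in''' ''senses of word'' '''do'''
::''signature <- set of words in the gloss and examples of sense''
::''overlap <- COMPUTEOVERLAP(signature, context)''
::'''if''' ''overlap > max-overlap'' '''then'''
:::''max-overlap <- overlap''
:::''best-sense <- sense''
'''return''' ''best-sense''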
|}
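The pseudocode above can be turned into a short Python sketch. Everything specific here is an assumption made for illustration: the <code>inventory</code> dictionary is a hypothetical sense inventory with invented glosses, listed with the default (most frequent) sense first, and the stop list and regular-expression tokenizer are simple stand-ins rather than part of the published algorithm.

```python
import re

# Hypothetical stop list of function words to ignore.
STOP_WORDS = {"a", "an", "the", "that", "and", "of", "in", "on", "at", "to",
              "is", "it", "will", "can"}


def tokenize(text):
    """Return the set of lowercase alphabetic tokens, minus stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def compute_overlap(signature, context):
    """Number of words the signature and the context have in common."""
    return len(signature & context)


def simplified_lesk(word, sentence, inventory):
    """Pick the sense of `word` whose gloss overlaps most with `sentence`.

    `inventory[word]` is a list of (sense_id, gloss_and_examples) pairs,
    ordered so that the first entry is the default (most frequent) sense.
    """
    senses = inventory[word]
    best_sense = senses[0][0]  # smart default: most frequent sense
    max_overlap = 0
    context = tokenize(sentence)
    for sense_id, text in senses:
        signature = tokenize(text)
        overlap = compute_overlap(signature, context)
        if overlap > max_overlap:
            max_overlap = overlap
            best_sense = sense_id
    return best_sense


# Toy inventory with invented glosses (not taken from any real dictionary).
inventory = {
    "bank": [
        ("bank#river", "sloping land beside a body of water"),
        ("bank#finance", "a financial institution that accepts deposits and lends money"),
    ]
}

print(simplified_lesk("bank", "The bank can guarantee deposits will cover tuition costs", inventory))
print(simplified_lesk("bank", "I like the bank", inventory))
```

In the first call the context shares "deposits" with the financial gloss, so that sense is chosen; in the second call no gloss overlaps with the context, so the default first sense is returned.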
The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The
==Criticisms==
Unfortunately, Lesk’s approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Further, the algorithm determines overlaps only among the glosses of the senses being considered. This is a
==Lesk variants==
There are many studies concerning Lesk and its extensions:<ref>Roberto Navigli. [http://www.dsi.uniroma1.it/~navigli/pubs/ACM_Survey_2009_Navigli.pdf ''Word Sense Disambiguation: A Survey''], ACM Computing Surveys, 41(2), 2009, pp. 1–69.</ref>
* Original Lesk (Lesk, 1986)
* Adapted/Extended Lesk (Banerjee and Pedersen, 2002/2003): In the adapted Lesk algorithm, a word vector is created for every content word in the WordNet gloss. Concatenating the glosses of related concepts in WordNet can be used to augment this vector. The vector for a word ''w'' contains the co-occurrence counts of words that co-occur with ''w'' in a large corpus. Adding the word vectors of all the content words in a concept's gloss creates the gloss vector ''g'' for that concept. Relatedness is determined by comparing gloss vectors using the [[cosine similarity]] measure.<ref>{{Cite book|last1=Banerjee|first1=Satanjeev|last2=Pedersen|first2=Ted|title=Computational Linguistics and Intelligent Text Processing |chapter=An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet |date=2002-02-17|series=Lecture Notes in Computer Science|volume=2276 |language=en|publisher=Springer, Berlin, Heidelberg|pages=136–145|doi=10.1007/3-540-45715-1_11|isbn=978-3540457152|citeseerx=10.1.1.118.8359}}</ref>
* Kwong, 2001;
* Nastase and Szpakowicz, 2001;
* Wilks and Stevenson, 1998, 1999;
* Mahesh et al., 1997;
* Yarowsky, 1992;
* Pook and Catlett, 1988;
* Kilgarriff
* Alexander Gelbukh, Grigori Sidorov, 2004.
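The gloss-vector comparison described under Adapted/Extended Lesk can be sketched as follows. This is only an illustration of the idea: the co-occurrence counts in <code>WORD_VECTORS</code> are invented toy numbers (a real system would derive them from a large corpus), and the glosses are given as hand-picked content words rather than looked up in WordNet.

```python
import math
from collections import Counter

# Toy co-occurrence counts, invented for illustration only.
WORD_VECTORS = {
    "evergreen": Counter({"forest": 4, "needle": 2}),
    "tree": Counter({"forest": 5, "leaf": 3}),
    "fruit": Counter({"seed": 4, "leaf": 1}),
    "money": Counter({"loan": 6, "deposit": 3}),
}


def gloss_vector(gloss_words):
    """Sum the co-occurrence vectors of the content words in a gloss."""
    total = Counter()
    for word in gloss_words:
        total.update(WORD_VECTORS.get(word, {}))
    return total


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


pine_tree = gloss_vector(["evergreen", "tree"])
cone_fruit = gloss_vector(["fruit", "evergreen", "tree"])
bank_money = gloss_vector(["money"])

print(cosine(pine_tree, cone_fruit))
print(cosine(pine_tree, bank_money))
```

Because the comparison is made between co-occurrence vectors rather than between the gloss words themselves, two glosses can be scored as related even when they share few or no words.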
==See also==
{{Commons}}
{{Portal|Linguistics}}
* [[Word-sense disambiguation]]
==References==
{{reflist|30em}}
[[Category:Natural language processing]]
[[Category:Semantics]]
[[Category:Computational linguistics]]
[[Category:Word