Lesk algorithm

==Overview==
The Lesk algorithm is based on the assumption that words in a given neighbourhood will tend to share a common topic. A simplified version of the Lesk algorithm compares the dictionary definition of an ambiguous word with the terms contained in its neighbourhood. Versions have been adapted to [[WordNet]].<ref>''[http://www.cs.cmu.edu/~banerjee/Publications/cicling2002.ps.gz An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet]'', [[Satanjeev Banerjee]] and [[Ted Pedersen]], Lecture Notes in Computer Science, Vol. 2276, pp. 136–145, 2002. ISBN 3540432191</ref>

A naive implementation of the original Lesk algorithm proceeds by:
# choosing pairs of ambiguous words within a neighbourhood
# checking their definitions in a dictionary
# choosing the senses so as to maximise the number of common terms in the definitions of the chosen words

In the simplified version, for every sense of the word being disambiguated one counts the number of words that occur both in the neighbourhood of that word and in the dictionary definition of that sense; the sense with the highest count is chosen.
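The simplified variant can be sketched in a few lines. The dictionary structure used here (a mapping from a word to numbered sense definitions) is a hypothetical stand-in for illustration, not a fixed interface of the algorithm:

```python
# Sketch of the simplified Lesk algorithm. The dictionary format
# (word -> {sense number: definition string}) is a hypothetical
# stand-in for a real lexical resource such as WordNet.

def simplified_lesk(word, context, dictionary):
    """Pick the sense of `word` whose definition shares the most
    words with `context` (the set of neighbourhood words)."""
    best_sense, best_overlap = None, -1
    for sense, definition in dictionary[word].items():
        overlap = len(set(definition.lower().split()) & set(context))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

toy = {"pine": {1: "kinds of evergreen tree with needle-shaped leaves",
                2: "waste away through sorrow or illness"}}
sense = simplified_lesk("pine", {"evergreen", "cone", "fruit"}, toy)
# sense == 1: only sense 1 shares a word ("evergreen") with the context
```

Note that this naive overlap counts only exact word matches; real systems normalise case and morphology and remove stop words before comparing.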
 
A commonly cited '''example''' is disambiguating the context "pine cone":

PINE
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness

CONE
1. solid body which narrows to a point
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees

The best intersection is Pine#1 ⋂ Cone#3 = 2, on the shared terms "evergreen" and "tree"/"trees".
 
Many works have since appeared offering modifications of this algorithm. They use other resources for analysis (thesauri, synonym dictionaries, or morphological and syntactic models): for instance, they may use such information as synonyms, different derivatives, or words from the definitions of the words that themselves appear in definitions.<ref>Alexander Gelbukh, Grigori Sidorov. Automatic resolution of ambiguity of word senses in dictionary definitions (in Russian). J. Nauchno-Tehnicheskaya Informaciya (NTI), ISSN 0548-0027, ser. 2, N 3, 2004, pp. 10–15.</ref>
 
The method has been the subject of numerous studies, including:
* Kwong, 2001;
* Nastase and Szpakowicz, 2001;
* Wilks and Stevenson, 1998, 1999;
* Mahesh et al., 1997;
* Cowie et al., 1992;
* Yarowsky, 1992;
* Pook and Catlett, 1988;
* Kilgarriff and Rosensweig, 2000;
* Alexander Gelbukh, Grigori Sidorov, 2004.
 
 
==Accuracy==
Accuracy on ''[[Pride and Prejudice]]'' and selected papers of the [[Associated Press]] was found to be in the 50% to 70% range.
 
Results have also been reported on the [[Senseval]] evaluation exercises.
 
== References ==