Lesk algorithm

The '''Lesk algorithm''' is a classical algorithm for [[word sense disambiguation]] introduced by [[Mike Lesk|Michael E. Lesk]] in 1986.
<ref>
Lesk, M. (1986). [http://portal.acm.org/citation.cfm?id=318728 Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone]. In SIGDOC '86: Proceedings of the 5th annual international conference on Systems documentation, pages 24–26, New York, NY, USA. ACM. ISBN 0897912241.
</ref>
 
3. fruit of certain evergreen trees
As can be seen, the best intersection is Pine#1 ⋂ Cone#3 = 2.
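This overlap can be reproduced in a few lines of Python. The gloss given here for Pine#1 ("kinds of evergreen tree with needle-shaped leaves") is the wording of the classic pine-cone example and is an assumption, since only the Cone#3 gloss appears above; the stop list and the naive plural stripping are likewise illustrative choices, needed so that ''tree'' and ''trees'' count as the same word:

<syntaxhighlight lang="python">
STOP_WORDS = {'of', 'with', 'a', 'the'}  # function words ignored in the count

def gloss_words(gloss):
    # Lowercase, split on whitespace, strip a plural "s" so that
    # "tree" matches "trees", and drop stop words.
    return {w.lower().rstrip('s') for w in gloss.split()} - STOP_WORDS

pine_1 = gloss_words("kinds of evergreen tree with needle-shaped leaves")
cone_3 = gloss_words("fruit of certain evergreen trees")

shared = pine_1 & cone_3
print(shared, len(shared))  # shared words "evergreen" and "tree" -> overlap of 2
</syntaxhighlight>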
 
==Simplified Lesk Algorithm==
 
In the simplified Lesk algorithm, the correct meaning of each word in a text is determined individually, by finding the sense that leads to the highest overlap between its dictionary definition and the current context. Rather than seeking to simultaneously determine the meanings of all words in a given text, this approach tackles each word individually, regardless of the meaning of the other words occurring in the same context.
 
A comparative evaluation performed by Vasilescu et al. (2004)<ref>Florentina Vasilescu, Philippe Langlais, and Guy Lapalme. 2004. Evaluating Variants of the Lesk Approach for Disambiguating Words. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC), Lisbon, Portugal.</ref> has shown that the simplified Lesk algorithm can significantly outperform the original definition of the algorithm, both in terms of precision and efficiency. Evaluating the disambiguation algorithms on the Senseval-2 English all-words data, they measured 58% precision using the simplified Lesk algorithm, compared with only 42% under the original algorithm.
 
Note that their implementation uses a back-off strategy for words not covered by the algorithm, consisting of the most frequent sense defined in WordNet. This means that words for which all possible meanings lead to zero overlap with the current context or with other word definitions are by default assigned sense number one in WordNet.
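This back-off can be sketched in a few lines, assuming NLTK's WordNet interface (<code>nltk.corpus.wordnet</code>), which lists the synsets of a word in decreasing order of frequency, so that the first synset corresponds to WordNet sense number one:

<syntaxhighlight lang="python">
from nltk.corpus import wordnet as wn

def most_frequent_sense(word):
    # NLTK returns synsets in decreasing order of frequency, so the
    # first synset is WordNet sense number one: the back-off sense
    # assigned when every sense has zero overlap with the context.
    synsets = wn.synsets(word)
    return synsets[0] if synsets else None
</syntaxhighlight>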
 
<b>Simplified Lesk Algorithm (Kilgarriff and Rosenzweig, 2000)</b><ref>A. Kilgarriff and J. Rosenzweig. 2000. English SENSEVAL: Report and Results. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC), Athens, Greece.</ref>
{| class="wikitable"
|-
| <b>function</b> SIMPLIFIED LESK(''word, sentence'') <b>returns</b> ''best sense of word''
:''best-sense <- most frequent sense for word''
:''max-overlap <- 0''
:''context <- set of words in sentence''
:<b>for each</b> ''sense'' <b>in</b> ''senses of word'' <b>do</b>
::''signature <- set of words in the gloss and examples of sense''
::''overlap'' <- COMPUTEOVERLAP(''signature, context'')
::<b>if</b> ''overlap > max-overlap'' <b>then</b>
:::''max-overlap <- overlap''
:::''best-sense <- sense''
:<b>end</b>
:<b>return</b> (''best-sense'')
|}
The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The original Lesk algorithm defines the context in a more complex way.
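For concreteness, the pseudocode above can be realized as a short Python sketch. The version below assumes NLTK's WordNet interface (<code>nltk.corpus.wordnet</code>) as the dictionary; the tokenizer and the English stop list are illustrative choices rather than part of the algorithm's definition:

<syntaxhighlight lang="python">
from nltk.corpus import stopwords
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

STOP_WORDS = set(stopwords.words('english'))

def compute_overlap(signature, context):
    # Number of words shared by the two sets, ignoring stop words.
    return len((signature & context) - STOP_WORDS)

def simplified_lesk(word, sentence):
    senses = wn.synsets(word)
    if not senses:
        return None
    # senses[0] is the most frequent sense in NLTK's ordering; it is
    # kept unchanged when every sense has zero overlap (the back-off).
    best_sense = senses[0]
    max_overlap = 0
    context = set(word_tokenize(sentence.lower()))
    for sense in senses:
        # Signature: the words in the gloss and the examples of the sense.
        signature = set(word_tokenize(sense.definition().lower()))
        for example in sense.examples():
            signature |= set(word_tokenize(example.lower()))
        overlap = compute_overlap(signature, context)
        if overlap > max_overlap:
            max_overlap = overlap
            best_sense = sense
    return best_sense
</syntaxhighlight>

For example, <code>simplified_lesk('cone', 'a pine cone fell from the evergreen tree')</code> scores every WordNet sense of ''cone'' against the sentence and returns the synset whose gloss and examples share the most non-stop words with the context.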
 
==Criticisms and other Lesk-based methods==