The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk in 1986. [1]
The algorithm is based on the assumption that words in a given neighbourhood tend to share a common topic. A naive implementation of the Lesk algorithm would proceed as follows:
- choose pairs of ambiguous words within a neighbourhood;
- look up their definitions in a dictionary;
- choose the senses that maximise the number of terms shared by the definitions of the chosen words.
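The steps above can be sketched in Python for a single pair of words. This is a minimal illustration, not Lesk's original program: the toy dictionary, its glosses, the stopword list, and the function names (`content_words`, `overlap`, `naive_lesk_pair`) are all assumptions made up for the example.

```python
from itertools import product

# Hypothetical toy dictionary: word -> {sense label: definition}.
# The glosses are invented for illustration, not quoted from a real dictionary.
TOY_DICTIONARY = {
    "pine": {
        "tree": "kind of evergreen tree with needle shaped leaves",
        "grieve": "waste away through sorrow or illness",
    },
    "cone": {
        "shape": "solid body which narrows to a point",
        "fruit": "fruit of certain evergreen tree",
        "wafer": "crisp wafer for holding ice cream",
    },
}

# Assumed stopword list: function words that should not count as overlap.
STOPWORDS = {"a", "of", "or", "to", "the", "with", "which", "for", "through"}


def content_words(definition):
    """Return the set of non-stopword terms in a definition."""
    return {w for w in definition.lower().split() if w not in STOPWORDS}


def overlap(def_a, def_b):
    """Count the terms that two definitions have in common."""
    return len(content_words(def_a) & content_words(def_b))


def naive_lesk_pair(word_a, word_b, dictionary):
    """Pick the sense pair whose definitions share the most terms."""
    senses_a = dictionary[word_a]
    senses_b = dictionary[word_b]
    best_a, best_b = max(
        product(senses_a, senses_b),
        key=lambda pair: overlap(senses_a[pair[0]], senses_b[pair[1]]),
    )
    score = overlap(senses_a[best_a], senses_b[best_b])
    return best_a, best_b, score


print(naive_lesk_pair("pine", "cone", TOY_DICTIONARY))
# -> ('tree', 'fruit', 2)
```

With these toy glosses, the "tree" sense of pine and the "fruit" sense of cone share the terms "evergreen" and "tree", so that pair is selected. Every combination of senses is scored, so the cost grows with the product of the number of senses of each word.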
On Pride and Prejudice and a selection of Associated Press texts, accuracy was found to be in the 50% to 70% range.
A simplified version of the Lesk algorithm compares the dictionary definition of each sense of an ambiguous word with the terms contained in its neighbourhood, and selects the sense whose definition overlaps most with those terms.
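A minimal Python sketch of this simplified variant follows. The sense inventory, the stopword list, and the function name `simplified_lesk` are hypothetical, and breaking ties in favour of the first listed sense is an assumption of this sketch rather than part of the description above.

```python
# Hypothetical sense inventory for one ambiguous word; glosses are invented.
CONE_SENSES = {
    "shape": "solid body which narrows to a point",
    "fruit": "fruit of certain evergreen tree",
    "wafer": "crisp wafer for holding ice cream",
}

# Assumed stopword list: function words that should not count as overlap.
STOPWORDS = frozenset({"a", "of", "to", "the", "which", "for", "from"})


def simplified_lesk(word, context, senses, stopwords=STOPWORDS):
    """Return the sense whose definition shares the most terms with the context.

    `context` is the neighbourhood of `word` as a plain string. Ties go to
    the sense listed first (an assumption of this sketch).
    """
    context_words = {
        w for w in context.lower().split()
        if w not in stopwords and w != word
    }
    best_sense, best_score = None, -1
    for sense, definition in senses.items():
        definition_words = set(definition.lower().split()) - stopwords
        score = len(definition_words & context_words)
        if score > best_score:  # strict comparison keeps the earliest sense on ties
            best_sense, best_score = sense, score
    return best_sense


print(simplified_lesk("cone", "the pine cone fell from the evergreen tree", CONE_SENSES))
# -> fruit
print(simplified_lesk("cone", "she ate ice cream from a cone", CONE_SENSES))
# -> wafer
```

Only the senses of the target word are enumerated here, so the work is linear in the number of senses rather than in the number of sense combinations across the neighbourhood.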
References
- [1] Michael Lesk, Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone, ACM Special Interest Group for Design of Communication, Proceedings of the 5th annual international conference on Systems documentation, pp. 24-26, 1986. ISBN 0897912241
- Satanjeev Banerjee and Ted Pedersen, An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet, Lecture Notes in Computer Science, Vol. 2276, pp. 136-145, 2002. ISBN 3540432191