Lesk algorithm

The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk in 1986.[1]

Overview

The Lesk algorithm is based on the assumption that words in a given neighbourhood will tend to share a common topic. A simplified version of the Lesk algorithm compares the dictionary definition of an ambiguous word with the terms contained in its neighbourhood. Versions have also been adapted to WordNet[2]. The procedure is as follows (a code sketch appears after the list):

  1. for every sense of the word being disambiguated, count the number of words that occur both in the neighbourhood of that word and in the dictionary definition of that sense
  2. choose the sense with the highest count
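A minimal sketch of this procedure in Python is given below, assuming the dictionary is represented as a mapping from each word to a list of sense glosses (plain strings); the function and variable names are illustrative, not taken from Lesk's paper.

  def simplified_lesk(word, context, dictionary):
      """Return the index of the sense of `word` whose gloss shares the
      most words with `context`, an iterable of neighbouring words."""
      context_words = {w.lower() for w in context}
      best_sense, best_overlap = 0, -1
      for i, gloss in enumerate(dictionary[word]):
          gloss_words = {w.lower() for w in gloss.split()}
          # step 1: count words shared by the gloss and the neighbourhood
          overlap = len(gloss_words & context_words)
          # step 2: keep the sense with the largest count
          if overlap > best_overlap:
              best_sense, best_overlap = i, overlap
      return best_sense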

A commonly used example of this algorithm at work is the context "pine cone".

PINE 
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
CONE 
1. solid body which narrows to a point
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees

The best intersection is Pine#1 ⋂ Cone#3 = 2: the glosses of these two senses share the words "evergreen" and "tree" (ignoring function words such as "of" and treating "trees" as "tree").
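Running the sketch above on this example, with the glosses exactly as given, selects the expected sense of "cone" when the gloss of Pine#1 is used as the context:

  dictionary = {
      "pine": [
          "kinds of evergreen tree with needle-shaped leaves",
          "waste away through sorrow or illness",
      ],
      "cone": [
          "solid body which narrows to a point",
          "something of this shape whether solid or hollow",
          "fruit of certain evergreen trees",
      ],
  }
  # Prints 2, i.e. Cone#3. With this naive tokenisation the shared words
  # are "evergreen" and "of"; removing stop words and stemming plurals
  # would instead give "evergreen" and "tree", matching the count above.
  print(simplified_lesk("cone", dictionary["pine"][0].split(), dictionary))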

Criticisms and other Lesk-based methods

Unfortunately, Lesk’s approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Further, the algorithm determines overlaps only among the glosses of the senses being considered. This is a significant limitation in that dictionary glosses tend to be fairly short and do not provide sufficient vocabulary to relate fine-grained sense distinctions.

Many subsequent works have proposed modifications of this algorithm. These works use other resources for analysis (thesauri, synonym dictionaries, or morphological and syntactic models): for instance, they may use information such as synonyms, different derivatives, or words from the definitions of the words occurring in definitions[3].
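One such modification, sketched below in the spirit of the adapted Lesk algorithm of Banerjee and Pedersen[2], enlarges each gloss with the glosses of related WordNet synsets before counting overlaps. The sketch assumes NLTK with the WordNet corpus installed, and the choice of relations (hypernyms and hyponyms) is illustrative rather than prescribed by any particular paper.

  from nltk.corpus import wordnet as wn

  def expanded_gloss_words(synset):
      """Words from a synset's gloss plus the glosses of its
      hypernyms and hyponyms."""
      related = [synset] + synset.hypernyms() + synset.hyponyms()
      words = set()
      for s in related:
          words |= {w.lower() for w in s.definition().split()}
      return words

  def extended_lesk(word, context):
      """Pick the synset of `word` whose expanded gloss best
      overlaps the context words."""
      context_words = {w.lower() for w in context}
      return max(wn.synsets(word),
                 key=lambda s: len(expanded_gloss_words(s) & context_words))

  # e.g. extended_lesk("pine", "I found a pine cone under the tree".split())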

There are many studies concerning the Lesk algorithm and its extensions[4]:

  • Kwong, 2001;
  • Nastase and Szpakowicz, 2001;
  • Wilks and Stevenson, 1998, 1999;
  • Mahesh et al., 1997;
  • Cowie et al., 1992;
  • Yarowsky, 1992;
  • Pook and Catlett, 1988;
  • Kilgarriff and Rosenzweig, 2000;
  • Gelbukh and Sidorov, 2004.

Accuracy

The original method achieved 50–70% accuracy (depending on the word) on Pride and Prejudice and selected papers of the Associated Press.

References

  1. ^ Michael Lesk. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In SIGDOC '86: Proceedings of the 5th Annual International Conference on Systems Documentation, ACM Special Interest Group for Design of Communication, pp. 24–26, 1986. ISBN 0897912241
  2. ^ Satanjeev Banerjee and Ted Pedersen. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet. Lecture Notes in Computer Science, Vol. 2276, pp. 136–145, 2002. ISBN 3540432191
  3. ^ Alexander Gelbukh, Grigori Sidorov. Automatic resolution of ambiguity of word senses in dictionary definitions (in Russian). J. Nauchno-Tehnicheskaya Informaciya (NTI), ISSN 0548-0027, ser. 2, N 3, 2004, pp. 10–15.
  4. ^ Roberto Navigli. Word Sense Disambiguation: A Survey, ACM Computing Surveys, 41(2), 2009, pp. 1–69.