==Overview==
The Lesk algorithm is based on the assumption that words in a given neighbourhood will tend to share a common topic. A simplified version of the Lesk algorithm is to compare the dictionary definition of an ambiguous word with the terms contained in its neighbourhood. Versions have been adapted to [[WordNet]].<ref>Satanjeev Banerjee and Ted Pedersen. ''[http://www.cs.cmu.edu/~banerjee/Publications/cicling2002.ps.gz An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet]'', Lecture Notes in Computer Science, Vol. 2276, pp. 136–145, 2002. ISBN 3-540-43219-1.</ref> It proceeds as follows:
# for every sense of the word being disambiguated, count the number of words that occur both in the neighbourhood of that word and in the dictionary definition of that sense;
# choose the sense with the highest count (a sketch of this procedure is given below).
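A minimal sketch of this counting procedure in Python follows. The <code>senses</code> mapping, the whitespace tokenisation, and the example glosses are assumptions made for illustration; the algorithm itself does not prescribe a particular dictionary interface or tokeniser.

<syntaxhighlight lang="python">
def lesk(context_words, senses):
    """Return the sense whose gloss shares the most words with the
    context (the neighbourhood of the ambiguous word).

    senses -- a mapping from sense label to gloss text (an assumed
    interface; the algorithm does not prescribe a dictionary format).
    """
    context = set(context_words)
    best_sense, max_overlap = None, -1
    for label, gloss in senses.items():
        overlap = len(context & set(gloss.split()))
        if overlap > max_overlap:
            best_sense, max_overlap = label, overlap
    return best_sense

# Hypothetical glosses, for illustration only.
senses = {
    "bank#1": "a financial institution that accepts deposits of money",
    "bank#2": "sloping land beside a body of water",
}
print(lesk("they deposit money at the bank".split(), senses))
# -> "bank#1" (the gloss and the context share the word "money")
</syntaxhighlight>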
==Simplified Lesk algorithm==
In the simplified Lesk algorithm,<ref>Kilgarriff, A., and J. Rosenzweig. 2000. English SENSEVAL: Report and Results. In Proceedings of the 2nd International Conference on Language Resources and Evaluation, LREC, Athens, Greece.</ref> the correct meaning of each word in a given context is determined individually, by locating the sense that overlaps the most between its dictionary definition and the given context. Rather than simultaneously determining the meanings of all words in a context, this approach tackles each word individually, regardless of the meaning of the other words occurring in the same context.
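A sketch of this procedure in Python follows. The variant evaluated by Vasilescu et al. (2004) backs off to the most frequent sense when no definition overlaps the context at all, and the sketch includes that back-off. The <code>Sense</code> structure and the convention that senses are listed most-frequent-first are assumptions made for the example.

<syntaxhighlight lang="python">
from dataclasses import dataclass

@dataclass
class Sense:
    label: str
    gloss: str
    examples: str = ""   # example sentences from the dictionary entry

def simplified_lesk(senses, sentence):
    """senses -- an assumed list of Sense objects, most frequent first."""
    context = set(sentence.split())
    best_sense = senses[0]   # default: back off to the most frequent sense
    max_overlap = 0
    for sense in senses:
        signature = set(sense.gloss.split()) | set(sense.examples.split())
        overlap = len(signature & context)
        if overlap > max_overlap:
            max_overlap, best_sense = overlap, sense
    return best_sense
</syntaxhighlight>

Because <code>max_overlap</code> starts at zero and only a strictly larger overlap replaces the current best, the first-listed (most frequent) sense is returned whenever every overlap is zero.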
"A comparative evaluation performed by Vasileseu et al. (2004)<ref>Florentina Vasilescu, Philippe Langlais, and Guy Lapalme.
Unfortunately, Lesk’s approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Further, the algorithm determines overlaps only among the glosses of the senses being considered. This is a significant limitation in that dictionary glosses tend to be fairly short and do not provide sufficient vocabulary to relate fine-grained sense distinctions.
Many works have since appeared offering different modifications of this algorithm. These works use other resources for the analysis (thesauri, synonym dictionaries, or morphological and syntactic models): for instance, they may use such information as synonyms, different derivatives, or words from the definitions of the words that occur in definitions.<ref>Alexander Gelbukh, Grigori Sidorov. Automatic resolution of ambiguity of word senses in dictionary definitions (in Russian). J. Nauchno-Tehnicheskaya Informaciya (NTI), ISSN 0548-0027, ser. 2, N 3, 2004, pp. 10–15.</ref> One such extension is sketched below.
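The following sketch, assuming the WordNet interface of the NLTK library, enlarges the signature of a sense with its synonyms and, recursively, with the gloss words of related senses ("words from definitions of words from definitions"). The choice of hypernyms and hyponyms as the related senses is an illustrative assumption in the spirit of Banerjee and Pedersen's adapted Lesk algorithm, not a prescription of any single cited work.

<syntaxhighlight lang="python">
from nltk.corpus import wordnet as wn   # assumes the NLTK WordNet data is installed

def expanded_signature(synset, depth=1):
    """Gloss words plus synonyms, plus (recursively) the gloss words
    of related synsets -- one realisation of 'words from definitions
    of words from definitions'."""
    words = set(synset.definition().split())
    for name in synset.lemma_names():   # synonyms in the synset
        words.update(name.split("_"))   # split multiword lemmas into words
    if depth > 0:
        # Illustrative choice: treat hypernyms and hyponyms as 'related'.
        for related in synset.hypernyms() + synset.hyponyms():
            words |= expanded_signature(related, depth - 1)
    return words

def extended_lesk(word, context_words):
    """Pick the WordNet sense of `word` whose expanded signature
    overlaps the context the most."""
    context = set(context_words)
    return max(wn.synsets(word),
               key=lambda s: len(expanded_signature(s) & context),
               default=None)
</syntaxhighlight>

With <code>depth=0</code> the signature reduces to the gloss plus synonyms; larger depths pull in the glosses of related senses, trading signature precision for coverage.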
There are many studies concerning Lesk and its extensions:<ref>Roberto Navigli. ''[http://www.dsi.uniroma1.it/~navigli/pubs/ACM_Survey_2009_Navigli.pdf Word Sense Disambiguation: A Survey]'', ACM Computing Surveys, 41(2), 2009, pp. 1–69.</ref>
[[Category:Computational linguistics]]
[[Category:Word-sense disambiguation]]
[[hy:Լեսկի ալգորիթմ]]