Semantic relatedness

This is an old revision of this page, as edited by Vdannyv at 16:16, 6 December 2006.

Computational measures of semantic relatedness include:

  • Latent semantic analysis (+) vector-based, adds vectors to measure multi-word terms; (-) non-incremental vocabulary, long pre-processing times
  • Pointwise Mutual Information (PMI): (+) large vocabulary, because it can use hit counts from any search engine (such as Google); (-) cannot measure relatedness between whole sentences or documents
  • Generalized Latent Semantic Analysis (GLSA): (+) vector-based, adds vectors to measure multi-word terms; (-) non-incremental vocabulary, long pre-processing times
  • ICAN: (+) incremental, network-based measure, good for spreading activation, accounts for second-order relatedness; (-) cannot measure relatedness between multi-word terms, long pre-processing times
  • Normalized Google Distance (NGD): (+) large vocabulary, because it uses hit counts from any search engine (such as Google); (-) cannot measure relatedness between whole sentences or documents
  • WordNet: (+) humanly constructed; (-) humanly constructed (not automatically learned), cannot measure relatedness between multi-word terms, non-incremental vocabulary
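To illustrate the search-engine-based measures above, the following is a minimal sketch of the Normalized Google Distance, which scores two terms from page-hit counts: the count for each term alone, the count for pages containing both, and the total index size. The hit counts and index size below are assumed placeholder values, not real search-engine results.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance between two terms.

    fx, fy  -- number of pages containing each term alone
    fxy     -- number of pages containing both terms
    n       -- total number of indexed pages
    Returns 0 for maximally related terms; larger values mean less related.
    """
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))

# Hypothetical hit counts (assumed values for illustration only).
N = 1e11        # assumed size of the search index
f_horse = 2.0e7
f_rider = 1.2e7
f_both = 1.5e6

print(round(ngd(f_horse, f_rider, f_both, N), 3))
```

Because the formula needs only hit counts, any term the search engine has indexed can be scored, which is the source of the "large vocabulary" advantage noted above; the corresponding limitation is that whole sentences or documents rarely occur verbatim as queries.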

References

  • Miller, George A.; Fellbaum, Christiane; Miller, Katherine J. (August 1993). Five Papers on WordNet. Retrieved May 4, 2005.
  • "The Latent Semantic Indexing home page".
  • Landauer, Thomas; Foltz, P. W.; Laham, D. (1998). "Introduction to Latent Semantic Analysis" (PDF). Discourse Processes. 25: 259–284.
