Outline of natural language processing: Difference between revisions

Chatterbots: linking to the main list of chatbots
m No edit summary
Line 238:
* [[Truecasing]] –
* [[Word segmentation]] – separates a chunk of continuous text into separate words. For a language like [[English language|English]], this is fairly trivial, since words are usually separated by spaces. However, some written languages like [[Chinese language|Chinese]], [[Japanese language|Japanese]] and [[Thai language|Thai]] do not mark word boundaries in such a fashion, and in those languages text segmentation is a significant task requiring knowledge of the [[vocabulary]] and [[morphology (linguistics)|morphology]] of words in the language.
* [[Word-sense disambiguation]] (WSD) – because many words have more than one [[Meaning (linguistics)|meaning]], word-sense disambiguation is used to select the meaning that makes the most sense in context. For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary or from an online resource such as [[WordNet]].
** [[Word-sense induction]] – an open problem in natural language processing that concerns the automatic identification of the senses (i.e. meanings) of a word. Because the output of word-sense induction is a set of senses for the target word (a sense inventory), this task is closely related to word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to resolve the ambiguity of words in context.
** [[Automatic acquisition of sense-tagged corpora]] –
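
The word segmentation entry above notes that languages written without spaces require knowledge of the vocabulary. One simple dictionary-based approach is greedy forward maximum matching, sketched below. The function name <code>max_match</code> and the toy vocabulary are assumptions made for illustration; practical segmenters typically rely on statistical or neural models and much larger lexicons.

```python
def max_match(text, vocabulary, max_word_len=None):
    """Greedy forward maximum matching (illustrative sketch).

    At each position, take the longest substring found in `vocabulary`;
    if none matches, emit a single character and move on.
    """
    if max_word_len is None:
        max_word_len = max((len(w) for w in vocabulary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary word is found.
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in vocabulary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary word starts here: fall back to one character.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy vocabulary, not taken from any real lexicon.
toy_vocabulary = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}
segmented = max_match("我喜欢自然语言处理", toy_vocabulary)
print(segmented)
```

Because the method is greedy, it prefers <code>自然语言</code> over the shorter <code>自然</code> at the same position; this preference can produce wrong segmentations when a shorter word is the correct reading.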
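
The word-sense disambiguation entry above describes choosing, from a given list of senses, the one that best fits the context. A minimal sketch of one classic dictionary-based method, the simplified Lesk algorithm, follows. It scores each sense by how many words its gloss shares with the context. The sense labels, glosses and stopword list here are made up for illustration and are not taken from WordNet or any real dictionary.

```python
def simplified_lesk(word, context, sense_inventory, stopwords=frozenset()):
    """Pick the sense whose gloss shares the most words with the context.

    `sense_inventory` maps a word to a dict of {sense_label: gloss}.
    Ties are resolved in favour of the sense listed first.
    """
    context_words = {w.lower() for w in context.split()} - stopwords - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_inventory[word].items():
        gloss_words = {w.lower() for w in gloss.split()} - stopwords
        overlap = len(context_words & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical sense inventory with invented glosses.
TOY_INVENTORY = {
    "bank": {
        "bank.financial": "an institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a body of water such as a river",
    }
}
STOPWORDS = frozenset(
    {"a", "an", "the", "of", "and", "that", "such", "as", "to", "in", "on", "she", "he"}
)

river_sense = simplified_lesk(
    "bank", "she sat on the bank of the river and watched the water",
    TOY_INVENTORY, STOPWORDS,
)
money_sense = simplified_lesk(
    "bank", "he went to the bank to deposit money", TOY_INVENTORY, STOPWORDS
)
print(river_sense, money_sense)
```

Plain word overlap is a weak signal: the second example matches only on <code>money</code>, since <code>deposit</code> and <code>deposits</code> are treated as different words without stemming or lemmatization.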