Revision as of 19:38, 9 December 2021 by Jarble (→Chatterbots: linking to the main list of chatbots)
Revision as of 19:08, 2 May 2022 by Comp.arch (minor edit, no edit summary; tag: 2017 wikitext editor)
Line 238:
* [[Truecasing]] –
* [[Word segmentation]] – separates a chunk of continuous text into separate words. For a language like [[English language|English]], this is fairly trivial, since words are usually separated by spaces. However, some written languages like [[Chinese language|Chinese]], [[Japanese language|Japanese]] and [[Thai language|Thai]] do not mark word boundaries in such a fashion, and in those languages text segmentation is a significant task requiring knowledge of the [[vocabulary]] and [[morphology (linguistics)|morphology]] of words in the language.
* [[Word-sense disambiguation]] (WSD) – because many words have more than one [[Meaning (linguistics)|meaning]], word-sense disambiguation is used to select the meaning which makes the most sense in context. For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary or from an online resource such as [[WordNet]].
** [[Word-sense induction]] – open problem of natural language processing, which concerns the automatic identification of the senses of a word (i.e. meanings). Given that the output of word-sense induction is a set of senses for the target word (sense inventory), this task is strictly related to that of word-sense disambiguation (WSD), which relies on a predefined sense inventory and aims to solve the ambiguity of words in context.
*** [[Automatic acquisition of sense-tagged corpora]] –
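The two tasks described above can be illustrated with a minimal sketch. It assumes greedy forward maximum matching for word segmentation and a simplified Lesk-style gloss overlap for word-sense disambiguation; both are stand-in techniques chosen for brevity, not methods the outline prescribes. The function names, the toy vocabulary, and the toy sense inventory are hypothetical; a real system would draw on a full lexicon or a resource such as WordNet.

```python
from typing import Dict, List, Optional, Set


def max_match_segment(text: str, vocabulary: Set[str]) -> List[str]:
    """Segment unspaced text by greedy forward maximum matching.

    At each position the longest vocabulary entry that matches is taken;
    if nothing matches, a single character is emitted as its own token.
    """
    longest = max([1] + [len(word) for word in vocabulary])
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def simplified_lesk(
    word: str, context: str, sense_inventory: Dict[str, Dict[str, str]]
) -> Optional[str]:
    """Return the sense whose gloss shares the most words with the context.

    The sense inventory is predefined, as in the WSD setting above.
    Ties are resolved in favour of the sense listed first.
    """
    context_words = set(context.lower().split())
    context_words.discard(word)
    best_sense: Optional[str] = None
    best_overlap = -1
    for sense, gloss in sense_inventory[word].items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


# Hypothetical toy data, for illustration only.
TOY_VOCABULARY = {"自然", "语言", "处理", "自然语言"}
TOY_SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits and lends money",
        "bank.river": "sloping land beside a river or lake",
    }
}

if __name__ == "__main__":
    print(max_match_segment("自然语言处理", TOY_VOCABULARY))
    print(simplified_lesk("bank", "she sat on the bank of the river", TOY_SENSES))
```

Greedy matching is fast but can commit to a wrong long match, and plain word overlap ignores stop words and inflection, which is one reason both tasks are usually handled with statistical or neural models in practice.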

Outline of natural language processing: Difference between revisions