Revision as of 21:02, 4 September 2006 by Whomp (minor edit: Reverted edits by 217.35.76.135 to version 63933545 by Alaibot using VP)
Revision as of 23:30, 14 March 2007 by Jausel (minor edit, no edit summary)
'''Written text segmentation''' is the process of dividing written text into [[word]]s or other similar meaningful units. The term applies both to the [[human mind|mental]] processes used by humans when reading text and to artificial processes implemented in [[computers]], which are the subject of [[natural language processing]]. The problem is relatively trivial for written languages that have explicit word boundary markers, such as the word spaces of written [[English language|English]] or the distinctive initial, medial and final letter shapes of [[Arabic language|Arabic]]. When such clues are not consistently available, the task often requires non-trivial techniques, such as statistical decision-making and large dictionaries, together with consideration of syntactic and semantic constraints.

==See also==

Text segmentation: Difference between revisions