Text segmentation: Difference between revisions

m moved Word segmentation to Text segmentation: "Word segmentation" suggests the segmentation of words into morphemes, which is not what the article was meant to be about. "Text segmentation" may not be ideal either, but...
Scode (talk | contribs)
Correct a couple of typos
Line 1:
'''Written text segmentation''' is the process of dividing written text into [[word]]s or other similar meaningful units. The term applies both to the [[human mind|mental]] processes used by humans when reading text, and to artificial processes implemented in [[computer]]s, which are the subject of [[natural language processing]].
 
The problem is relatively trivial for written languages that have explicit word boundary markers, such as the word spaces of written [[English language|English]] or the distinctive initial, medial and final letter shapes of [[Arabic language|Arabic]]. When such clues are not consistently available, the task often requires non-trivial techniques, such as statistical decision-making, lookup in large dictionaries, and consideration of syntactic and semantic constraints.
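
As an illustration of the dictionary-based approach mentioned above, the following is a minimal sketch of greedy forward maximum matching, one simple technique among many. The function name <code>max_match</code>, its parameters and the toy lexicon are assumptions made for this example only; practical systems typically combine much larger dictionaries with statistical models.

```python
def max_match(text, lexicon, max_len=None):
    """Greedy forward maximum matching (illustrative sketch only).

    At each position, take the longest lexicon entry that starts there.
    `lexicon` is assumed to be a set of known words.
    """
    if max_len is None:
        # Longest lexicon entry bounds how far ahead we need to look.
        max_len = max((len(word) for word in lexicon), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No lexicon entry starts here: emit one character as unknown.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy lexicon, hypothetical and far smaller than any real dictionary.
toy_lexicon = {"the", "cat", "sat", "on", "mat"}
print(max_match("thecatsatonthemat", toy_lexicon))
```

With the toy lexicon, the unspaced string is split into "the", "cat", "sat", "on", "the", "mat". The greedy strategy can fail on ambiguous input: given a lexicon containing "the", "theta", "table", "bled", "down", "own" and "there", the string "thetabledownthere" is split as "theta bled own there" rather than "the table down there", which is why statistical, syntactic and semantic information is often needed as well.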
 
==See also==