Text segmentation: Difference between revisions

'''Text segmentation''' is the process of dividing written text into [[word]]s or other meaningful units, such as [[sentence]]s or [[topic]]s. The term applies both to the [[human mind|mental]] processes that humans use when reading text and to artificial processes implemented in [[computers]], which are the subject of [[natural language processing]].
 
The problem may appear relatively trivial for written languages that have explicit word boundary markers, such as the word spaces of written [[English language|English]] or the distinctive initial, medial and final letter shapes of [[Arabic language|Arabic]]. When such clues are not consistently available, the task often requires more sophisticated techniques, such as statistical decision-making, large dictionaries, and consideration of syntactic and semantic constraints.
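
As an illustration of how a dictionary and a simple statistical criterion can be combined when no word boundary markers are present, the following Python sketch segments an unspaced string by choosing the split whose words have the highest total log-probability under a unigram model. It is a minimal example under stated assumptions, not a description of any particular system: the function name <code>segment</code> and the word counts in <code>TOY_COUNTS</code> are hypothetical, and a practical segmenter would estimate counts from a large corpus, handle unknown words, and take syntactic and semantic context into account.

```python
import math

# Hypothetical toy word counts, invented for this example only.
TOY_COUNTS = {
    "we": 15, "me": 10, "meet": 4, "he": 12,
    "here": 7, "re": 1, "the": 50, "there": 9,
}


def segment(text, counts):
    """Split `text` into dictionary words, maximising the sum of unigram
    log-probabilities. Returns a list of words, or None if the text cannot
    be covered entirely by dictionary words."""
    total = sum(counts.values())
    logp = {word: math.log(c / total) for word, c in counts.items()}
    max_len = max(len(word) for word in logp)
    n = len(text)

    # best[i] = (best score for text[:i], start index of the last word used)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in logp and best[start][0] > -math.inf:
                score = best[start][0] + logp[word]
                if score > best[end][0]:
                    best[end] = (score, start)

    if best[n][0] == -math.inf:
        return None  # no segmentation using only dictionary words

    # Follow the stored start indices backwards to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("wemeethere", TOY_COUNTS))  # ['we', 'meet', 'here']
```

With these toy counts, the candidate split "we meet he re" is also covered by the dictionary, but it scores lower than "we meet here" because it uses an extra, rare word. This shows how a statistical criterion can decide between segmentations that a dictionary alone cannot distinguish.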