Text segmentation

As with word segmentation, not all written languages have punctuation characters that are useful for approximating sentence boundaries.
 
=== Topic segmentation ===
{{main|Topic analysis|Document classification}}
Topic analysis consists of two main tasks: topic identification and text segmentation. The first is a straightforward [[machine learning|classification]] of a given text. The second assumes that a document may contain multiple topics; the task of computerized text segmentation is then to discover these topics automatically and segment the text accordingly. Topic boundaries may be apparent from section titles and paragraph breaks. In other cases, one needs to use techniques similar to those used in [[document classification]].
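
When no titles or paragraph breaks mark the boundaries, one simple heuristic is to place a boundary wherever adjacent sentences share little vocabulary. The following is a minimal sketch of that idea, loosely in the spirit of lexical-cohesion methods such as TextTiling; it is not a method prescribed by this article. The function names (<code>bag_of_words</code>, <code>cosine</code>, <code>segment_by_topic</code>) and the default threshold of 0.1 are illustrative assumptions, and the input is assumed to be already split into sentences.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Return lowercased word counts for one sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_by_topic(sentences, threshold=0.1):
    """Group sentences into segments (hypothetical helper).

    A new segment starts whenever the similarity between a sentence
    and the one before it drops below `threshold` (an arbitrary,
    illustrative value that would need tuning on real text).
    """
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for previous, current in zip(sentences, sentences[1:]):
        similarity = cosine(bag_of_words(previous), bag_of_words(current))
        if similarity < threshold:
            segments.append([current])   # low overlap: start a new topic
        else:
            segments[-1].append(current)  # enough overlap: same topic
    return segments


example = [
    "The cat chased the mouse.",
    "The mouse hid from the cat.",
    "Stock prices fell sharply today.",
    "Investors sold stock as prices fell.",
]
segments = segment_by_topic(example)
```

On this toy input the only pair of adjacent sentences with no shared words is the second and third, so the sketch yields two segments. Comparing single sentences is fragile on real documents; practical systems typically compare larger windows of text, remove stop words, and smooth the similarity curve before choosing boundaries.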