Revision as of 01:00, 22 July 2015 by BD2412 (→Word segmentation: minor fixes, mostly disambig links using AWB)
Revision as of 07:46, 5 August 2015 by 149.173.1.38 (→Text segmentation)
Line 28:
As with word segmentation, not all written languages contain punctuation characters that are useful for approximating sentence boundaries.

=== ~~Text~~Topic segmentation ===
{{main|Topic analysis|Document classification}}
Topic analysis consists of two main tasks: topic identification and text segmentation. The former is a simple [[machine learning|classification]] of a specific text as a whole. The latter assumes that a document may contain multiple topics, and the task of computerized text segmentation is then to discover these topics automatically and segment the text accordingly. The topic boundaries may be apparent from section titles and paragraphs. In other cases, one needs to use techniques similar to those used in [[document classification]].
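When no section titles mark the topic boundaries, one simple approach is to compare the vocabulary of adjacent paragraphs and place a boundary wherever the overlap drops sharply. The following Python sketch illustrates this idea. It is not taken from the article: the function names (`bag_of_words`, `cosine`, `segment_by_topic`) and the `threshold` value are hypothetical, and a practical system would at least remove stop words and compare larger windows of text.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Return lowercased word counts for one paragraph."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_by_topic(paragraphs, threshold=0.1):
    """Group consecutive paragraphs into segments.

    A new segment starts whenever a paragraph's similarity to the
    previous paragraph falls below `threshold` (an assumed cutoff).
    """
    segments = []
    for i, para in enumerate(paragraphs):
        if i == 0:
            segments.append([para])
            continue
        similarity = cosine(bag_of_words(paragraphs[i - 1]), bag_of_words(para))
        if similarity < threshold:
            segments.append([para])      # vocabulary shift: start a new topic
        else:
            segments[-1].append(para)    # same topic: extend current segment
    return segments
```

Calling `segment_by_topic` on a list of paragraph strings returns a list of segments, each a list of consecutive paragraphs judged to share a topic. Because the comparison relies only on shared words, two paragraphs on the same topic that use different vocabulary would be split; this is one reason techniques from document classification are often used instead.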

Text segmentation: Difference between revisions