Revision as of 08:17, 25 March 2016 by Tony1 (Script-assisted fixes: per MOS:NUM, MOS:CAPS, MOS:LINK)
Revision as of 20:36, 16 October 2016 by Metasyn (m →Topic segmentation: adding the link to topic modeling)
Line 31:
=== Topic segmentation ===
{{main|Topic analysis|Document classification}}
Topic analysis consists of two main tasks: topic identification and text segmentation. The former is a straightforward [[machine learning|classification]] of a given text. The latter assumes that a document may contain multiple topics, so the task of automatic text segmentation is to discover these topics and segment the text accordingly. The topic boundaries may be apparent from section titles and paragraphs. In other cases, one needs to use techniques similar to those used in [[document classification]]. Segmenting the text into [[topic (linguistics)|topic]]s or [[discourse]] turns can be useful in some natural language processing tasks: it can significantly improve information retrieval or speech recognition (by indexing or recognizing documents more precisely, or by returning the specific part of a document that corresponds to the query). It is also needed in [[topic detection]] and tracking systems and in [[text summarization]] problems.

Line 51:
| format = PDF
| accessdate = 2007-11-08
}}</ref> e.g. [[Hidden Markov model|HMM]], [[lexical chains]], passage similarity using word [[co-occurrence]], [[cluster analysis|clustering]], [[topic modeling]], etc. The task is quite ambiguous: people evaluating text segmentation systems often disagree on where the topic boundaries lie. Hence, evaluating text segmentation is also a challenging problem.
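
As a rough illustration of the passage-similarity approach listed above, the following Python sketch places a topic boundary wherever the word overlap between the sentences before and after a gap is low. It is a minimal sketch under stated assumptions, not the method of any system cited in the article: the function names (<code>tokenize</code>, <code>cosine</code>, <code>segment</code>) and the <code>window</code> and <code>threshold</code> parameters are hypothetical, and a practical system would also remove stop words, smooth the similarity scores, and tune the threshold.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Group sentences into topic segments.

    A boundary is placed at each gap where the similarity between the
    `window` sentences before the gap and the `window` sentences after it
    falls below `threshold`. Both values are illustrative assumptions.
    """
    if not sentences:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    segments = []
    current = [sentences[0]]
    for gap in range(1, len(sentences)):
        # Merge the word counts of the blocks on each side of the gap.
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        if cosine(left, right) < threshold:
            segments.append(current)
            current = []
        current.append(sentences[gap])
    segments.append(current)
    return segments


if __name__ == "__main__":
    text = [
        "The cat chased the mouse.",
        "The mouse hid from the cat.",
        "Stock prices fell sharply today.",
        "Investors sold stock as prices dropped.",
    ]
    for block in segment(text):
        print(block)
```

Because the boundary decision depends entirely on the chosen window and threshold, two reasonable configurations can segment the same text differently, which mirrors the disagreement among human evaluators noted above.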

Text segmentation: Difference between revisions