{{Refimprove|date=October 2011}}
'''Text segmentation''' is the process of dividing written text into meaningful units, such as words, [[
Compare [[speech segmentation]], the process of dividing speech into linguistically meaningful portions.
Some scholars have suggested that modern Chinese should be written with word segmentation, that is, with spaces between words as in written English.
=== Intent segmentation ===
Segmenting the text into [[topic (linguistics)|topic]]s or [[discourse]] turns can be useful in some natural language processing tasks: it can significantly improve [[information retrieval]] or [[speech recognition]] (by indexing or recognizing documents more precisely, or by returning the specific part of a document that corresponds to a query). It is also needed in [[topic detection]] and tracking systems and in [[text summarization]].
Many different approaches have been tried:<ref>{{
 | year = 2000
 | title = Advances in domain independent linear text segmentation
 | book-title = Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (ANLP-NAACL-00)
 | arxiv = cs/0003083
 | access-date = 2025-03-31
 | url = http://www.aclweb.org/anthology/A00-2004
 | last = Reynar
 | year = 1998
 | url = https://repository.upenn.edu/handle/20.500.14332/37673
 | format = PDF
 | publisher = [[University of Pennsylvania]]
 }}</ref> e.g. [[
The task is inherently ambiguous: people evaluating text segmentation systems often disagree on where topic boundaries lie. Hence, evaluating text segmentation is itself a challenging problem.
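<!-- <math>\mathrm{WindowDiff}(\mathit{ref},\mathit{hyp}) = \frac{1}{N-k} \sum_{i=1}^{N-k} \left[\, \left| b(\mathit{ref}_i,\mathit{ref}_{i+k}) - b(\mathit{hyp}_i,\mathit{hyp}_{i+k}) \right| > 0 \,\right]</math> -->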
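One widely used evaluation measure is WindowDiff, proposed by Pevzner and Hearst: a window of ''k'' units slides over the text, and the score is the fraction of windows in which the reference and hypothesized segmentations contain different numbers of boundaries (0 means the two agree everywhere). The following is a minimal sketch in Python, assuming each unit (for example, a sentence) carries an integer segment label and that a boundary lies between two adjacent units whenever their labels differ. The function names and this labeling scheme are illustrative choices, not a standard API, and implementations differ in small conventions such as whether N - k or N - k + 1 windows are counted.

```python
from typing import Sequence


def boundaries_between(labels: Sequence[int], i: int, j: int) -> int:
    """Count boundaries between unit i and unit j (i < j).

    Hypothetical labeling scheme for this sketch: a boundary lies between
    units t and t + 1 whenever their segment labels differ.
    """
    return sum(1 for t in range(i, j) if labels[t] != labels[t + 1])


def default_k(ref: Sequence[int]) -> int:
    """Window size: half the mean reference segment length, at least 1.

    Assumption: this follows the common convention for choosing k.
    """
    n_segments = boundaries_between(ref, 0, len(ref) - 1) + 1
    return max(1, round(len(ref) / (2 * n_segments)))


def window_diff(ref: Sequence[int], hyp: Sequence[int], k: int) -> float:
    """Fraction of the N - k windows in which ref and hyp disagree on the
    number of boundaries; 0.0 means the segmentations agree everywhere."""
    n = len(ref)
    if len(hyp) != n:
        raise ValueError("ref and hyp must label the same number of units")
    if not 1 <= k < n:
        raise ValueError("window size k must satisfy 1 <= k < N")
    errors = 0
    for i in range(n - k):
        # Window spans units i .. i + k; count a miss if boundary counts differ.
        if boundaries_between(ref, i, i + k) != boundaries_between(hyp, i, i + k):
            errors += 1
    return errors / (n - k)


# Ten units (e.g. sentences) in three reference segments; the hypothesis
# places the first boundary one unit too early.
ref = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
hyp = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
k = default_k(ref)
print(k, window_diff(ref, hyp, k))  # prints: 2 0.25
```

Here the mean reference segment length is 10/3, so ''k'' = 2; the misplaced boundary makes two of the eight windows disagree, giving a score of 0.25. Unlike plain boundary precision and recall, a near miss is penalized only in the few windows it affects, which is the main motivation for window-based measures.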
* [[Word count]]
* [[Line wrap and word wrap|Line breaking]]
* [[Image segmentation]]
== References ==
{{Reflist}}
{{Natural Language Processing}}
[[Category:Tasks of natural language processing]]