Automatic segmentation is the problem in [[natural language processing]] of implementing a computer process to segment text.
When punctuation and similar clues are not consistently available (and even in languages that mark word and sentence boundaries, other boundaries go unmarked), the segmentation task often requires non-trivial techniques such as statistical decision-making, large dictionaries, and consideration of syntactic and semantic constraints. Effective natural language processing systems and text segmentation tools usually operate on text from specific domains and sources. For example, processing the text of medical records is a very different problem from processing news articles or real estate advertisements.
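To illustrate the dictionary-based end of this range of techniques, the following is a minimal sketch of greedy longest-match ("maximum matching") word segmentation for text written without spaces. The function name `max_match` and the toy dictionary are hypothetical and chosen only for illustration; a practical system would use a much larger, domain-specific dictionary and would typically combine it with statistical scoring.

```python
def max_match(text, dictionary):
    """Greedy longest-match segmentation of unspaced text (illustrative sketch)."""
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Hypothetical toy dictionary, assumed for this example only.
toy_dictionary = {"the", "table", "down", "there", "them", "tab"}
print(max_match("thetabledownthere", toy_dictionary))
# prints ['the', 'table', 'down', 'there']
```

Because the match is greedy, it commits to the longest word at each position and never backtracks, so it can choose a segmentation that a human reader would reject. This is one reason dictionary lookup is usually supplemented by statistical decision-making and syntactic or semantic constraints.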
The process of developing text segmentation tools starts with collecting a large corpus of text in an application domain. There are two general approaches: