Revision as of 07:27, 22 September 2008, Querent (→Automatic segmentation approaches)
Revision as of 07:38, 22 September 2008, Querent (→Automatic segmentation approaches)
Line 44: Automatic segmentation is the problem in [[natural language processing]] of implementing a computer process to segment text. When punctuation and similar clues are not consistently available (and even in languages where word and sentence boundaries are marked, other boundaries are not), the segmentation task often requires non-trivial techniques such as statistical decision-making and large dictionaries, as well as consideration of syntactic and semantic constraints. Effective natural language processing systems and text segmentation tools usually operate on text from specific domains and sources. For example, processing text from medical records is a very different problem from processing news articles or real estate advertisements. The process of developing text segmentation tools starts with collecting a large corpus of text in an application domain. There are two general approaches:
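
To make the dictionary-based techniques mentioned above more concrete, the following is a minimal sketch of a greedy longest-match segmenter for text without explicit word boundaries. It is an illustration only, not a method described in this revision: the function name `max_match`, the toy dictionary, and the fallback of emitting single characters for unmatched text are all assumptions made for the example.

```python
def max_match(text, dictionary):
    """Greedy longest-match segmentation of text with no word boundaries.

    `dictionary` is a set of known words (hypothetical toy data here).
    """
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


toy_dictionary = {"the", "table", "down", "there", "them", "tab"}
print(max_match("thetabledownthere", toy_dictionary))
# ['the', 'table', 'down', 'there']
```

A greedy matcher like this commits to the longest word at each position and never backtracks, so it can mis-segment ambiguous input. That limitation is one reason the statistical decision-making and syntactic or semantic constraints mentioned above are typically needed in practice.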

Text segmentation: Difference between revisions