Undid revision 895070424 by 89.211.117.124 (talk)
Line 24:
=== Intent segmentation ===
{{See also|Tri-box method}}{{Confusing section|date=2019-09-06}}
Intent segmentation is the problem of dividing written text into keyphrases (groups of two or more words).
Line 35:
Sentence segmentation is the problem of dividing a string of written language into its component [[sentences]]. In English and some other languages, using punctuation, particularly the [[full stop]]/period character, is a reasonable approximation. However, even in English this problem is not trivial due to the use of the full stop character for abbreviations, which may or may not also terminate a sentence. For example, ''Mr.'' is not its own sentence in ''"Mr. Smith went to the shops in Jones Street."'' When processing plain text, tables of abbreviations that contain periods can help prevent incorrect assignment of sentence boundaries.
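The abbreviation-table approach described above can be illustrated with a minimal sketch. The function name <code>split_sentences</code> and the small <code>ABBREVIATIONS</code> table are hypothetical, chosen only for this example; a practical system would need a far larger table and additional rules.

```python
import re

# Hypothetical, deliberately tiny abbreviation table (an assumption for this
# sketch); real tables are much larger and language-specific.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof.", "e.g.", "i.e.", "etc."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text at '.', '!' or '?' unless the token is a known abbreviation."""
    sentences = []
    start = 0
    # A candidate boundary is a terminator followed by whitespace or end of text.
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        end = match.end()
        chunk = text[start:end]
        # The whitespace-delimited token that ends at the candidate boundary.
        last_token = chunk.split()[-1].lower()
        if last_token in abbreviations:
            # Treat the full stop as part of the abbreviation, not a boundary.
            continue
        sentences.append(chunk.strip())
        start = end
    # Keep any trailing text that has no terminating punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Mr. Smith went to the shops in Jones Street. He bought milk."))
```

Because this sketch never splits after a listed abbreviation, it will miss a boundary when an abbreviation such as ''etc.'' does end a sentence, which is the ambiguity noted above.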
As with word segmentation, not all written languages contain punctuation characters
=== Topic segmentation ===