=== Sentence segmentation ===
{{See also|Sentence boundary disambiguation}}
Sentence segmentation is the problem of dividing a string of written language into its component [[Sentence (linguistics)|sentences]]. In English and some other languages, using punctuation, particularly the [[full stop]]/period character, is a reasonable approximation. However, even in English this problem is not trivial, because the full stop is also used for abbreviations, which may or may not also terminate a sentence. For example, ''Mr.'' is not its own sentence in "''Mr. Smith went to the shops in Jones Street.''" When processing plain text, tables of abbreviations that contain periods can help prevent incorrect assignment of sentence boundaries.
As with word segmentation, not all written languages contain punctuation characters that are useful for approximating sentence boundaries.
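
The abbreviation-table approach described above can be illustrated with a minimal Python sketch. It assumes that a sentence ends at a run of ''.'', ''!'' or ''?'' followed by whitespace or the end of the text, unless the token ending there appears in an abbreviation table. The function name <code>split_sentences</code> and the small <code>ABBREVIATIONS</code> set are hypothetical and chosen only for illustration; practical systems use far larger, language-specific lists or learned models.

```python
import re

# Hypothetical, deliberately tiny abbreviation table (lowercased, with periods).
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "st.", "e.g.", "i.e."}

# A boundary candidate: one or more of . ! ? followed by whitespace or end of text.
BOUNDARY = re.compile(r"[.!?]+(?=\s|$)")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, skipping full stops that end a known abbreviation."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        # The whitespace-delimited token that ends at this candidate boundary.
        token = text[start:end].split()[-1].lower()
        if token in ABBREVIATIONS:
            continue  # e.g. "Mr." is treated as not ending the sentence
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    # Any trailing text without final punctuation becomes the last sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Mr. Smith went to the shops in Jones Street. He bought milk."))
```

Because this rule never lets an abbreviation end a sentence, it undersegments text such as "He lives on Jones St. She does not.", where ''St.'' is both an abbreviation and a sentence terminator; resolving such cases requires additional context, such as the capitalization or likely role of the following word.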