Re-order and clarify extensively
Line 9:
=== Word segmentation ===
Word segmentation is the problem of dividing a string of written language into its component [[word]]s. In English and many other modern languages written in some form of the [[Latin alphabet]], splitting text at the [[Space (punctuation)|space character]] is a good approximation of word segmentation. (The space character alone is not always sufficient; contractions such as ''can't'' for ''cannot'' are one example.) However, not all written scripts have an equivalent of this character, and without it word segmentation is a difficult problem. Languages that lack a trivial word segmentation process include [[Thai language|Thai]].
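
The contrast between the two cases can be illustrated with a minimal sketch. The first function splits on whitespace, which works as an approximation for space-delimited text. The second uses greedy longest-match against a word list, a simple dictionary-based technique chosen here only for illustration; it is not described in the text above, and the function names and toy lexicon are hypothetical. English with the spaces removed stands in for a script that has no word delimiter.

```python
def split_on_spaces(text):
    """Approximate word segmentation by splitting on whitespace."""
    return text.split()


def max_match(text, lexicon):
    """Greedy longest-match segmentation for text without delimiters.

    At each position, take the longest prefix found in `lexicon`;
    if nothing matches, emit a single character and move on.
    """
    longest = max(map(len, lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try candidate end positions from longest to shortest.
        for j in range(min(len(text), i + longest), i, -1):
            if text[i:j] in lexicon:
                break
        else:
            j = i + 1  # unknown character: fall back to a one-character token
        words.append(text[i:j])
        i = j
    return words


# Toy lexicon, assumed purely for illustration.
LEXICON = {"the", "cat", "cats", "sat", "at"}

print(split_on_spaces("I can't go"))     # ['I', "can't", 'go']
print(max_match("thecatsat", LEXICON))   # ['the', 'cats', 'at']
```

The second result shows why the problem is hard without a delimiter: the greedy strategy returns ''the cats at'', although ''the cat sat'' is an equally valid split of the same characters, and choosing between them requires information beyond the word list.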
=== Sentence segmentation ===