Text segmentation

=== Word segmentation ===
 
Word segmentation is the problem of dividing a string of written language into its component [[word]]s. In English and many other modern languages that use some form of the [[Latin alphabet]], dividing text at the [[Space (punctuation)|space character]] is a good approximation to word segmentation. (The space character alone is not always sufficient. For example, the contraction ''can't'' stands for the two words ''can not''.) However, not every writing system has an equivalent of this character, and without it word segmentation is a difficult problem. Languages that do not have a trivial word segmentation process include [[Chinese language|Chinese]], [[Japanese language|Japanese]] and [[Thai language|Thai]].
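The following Python sketch contrasts the two situations described above. Whitespace splitting works reasonably for English, although it leaves ''can't'' as a single token. For unspaced text it returns the whole string, so some other method is needed. As one illustration, the sketch uses greedy forward maximum matching, a simple dictionary-based baseline sometimes applied to Chinese. It is not the only technique and not one the article prescribes. The dictionary <code>TOY_DICTIONARY</code> and the <code>max_len</code> limit are hypothetical choices for this example, and greedy matching can mis-segment ambiguous strings.

```python
from typing import List, Set

# Hypothetical toy dictionary, for illustration only.
TOY_DICTIONARY: Set[str] = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}


def whitespace_tokens(text: str) -> List[str]:
    """Approximate word segmentation by splitting on runs of whitespace."""
    return text.split()


def max_match(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching.

    At each position, take the longest dictionary word (up to max_len
    characters) that starts there; if none matches, emit one character.
    """
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


if __name__ == "__main__":
    print(whitespace_tokens("I can't go"))          # ['I', "can't", 'go']
    print(whitespace_tokens("我喜欢自然语言处理"))     # ['我喜欢自然语言处理']
    print(max_match("我喜欢自然语言处理", TOY_DICTIONARY))
    # ['我', '喜欢', '自然语言', '处理']
```

Because the matcher always prefers the longest entry, it picks ''自然语言'' ("natural language") over the shorter ''自然'' + ''语言''. That choice is correct here, but the same rule can go wrong when a long dictionary word straddles a true word boundary.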
 
=== Sentence segmentation ===