Word segmentation is the problem of dividing a string of written language into its component words.
In English and many other languages using some form of the [[Latin alphabet]], the [[Space (punctuation)|space]] is a good approximation of a [[word divider]] (word [[delimiter]]). The approximation has limits, however, because languages vary in how they [[emic and etic|emically]] regard [[collocation]]s and [[compound (linguistics)|compounds]]. Many [[English compound#Compound nouns|English compound nouns]] are written in more than one way (for example, ''[[icebox|ice box = ice-box = icebox]]''; ''[[sty|pig sty = pig-sty = pigsty]]''), and speakers correspondingly vary in whether they think of them as [[noun phrase]]s or single nouns. Norms do follow trends: open compounds often tend to solidify eventually by widespread convention. The variation nonetheless remains systemic. In contrast, [[German nouns#Compounds|German compound nouns]] show less orthographic variation, with solidification being a stronger norm.
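As a rough illustration of both points, the Python sketch below segments text by splitting on whitespace and shows that the three spellings of ''icebox'' produce different token sequences. The <code>normalize_compound</code> helper is hypothetical and is not a standard method. It simply deletes internal spaces and hyphens, so it would also wrongly merge genuine two-word phrases.

```python
import re


def whitespace_segment(text: str) -> list[str]:
    """Approximate word segmentation by splitting on runs of whitespace."""
    return text.split()


# Hypothetical normalizer (illustration only): treat the open, hyphenated and
# closed spellings of a compound as one word by deleting spaces and hyphens.
def normalize_compound(variant: str) -> str:
    return re.sub(r"[\s\-]+", "", variant.lower())


for variant in ["ice box", "ice-box", "icebox"]:
    # Whitespace splitting disagrees across spellings; the normalizer agrees.
    print(repr(variant), whitespace_segment(variant), normalize_compound(variant))
```

The whitespace splitter yields two tokens for ''ice box'' but one for ''ice-box'' and ''icebox'', which is exactly the kind of inconsistency that a space-based approximation cannot resolve on its own.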
However, the equivalent of the word space character is not found in all written scripts, and without it word segmentation is a difficult problem. Languages that lack a trivial word segmentation process include Chinese and Japanese, where [[sentences]] but not words are delimited; [[Thai language|Thai]] and [[Lao language|Lao]], where phrases and sentences but not words are delimited; and [[Vietnamese language|Vietnamese]], where syllables but not words are delimited.
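One common baseline for text without word spaces is greedy ''maximum matching'': scan left to right and, at each position, take the longest dictionary entry that matches. The Python sketch below is an illustration under stated assumptions, not the method of any particular segmenter. The toy lexicon and the unspaced English input are hypothetical stand-ins for a real dictionary and a real script. Practical segmenters for these languages generally add statistical or machine-learned models on top of a dictionary.

```python
from typing import Optional


def max_match(text: str, lexicon: set[str], max_len: Optional[int] = None) -> list[str]:
    """Greedy left-to-right longest-match word segmentation.

    At each position, take the longest lexicon entry that matches. If nothing
    matches, emit a single character so that the scan always makes progress.
    """
    if max_len is None:
        max_len = max((len(w) for w in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to shortest.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon or length == 1:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy lexicon; a real one would hold many thousands of entries.
lexicon = {"the", "tool", "box", "toolbox", "is", "open"}
print(max_match("thetoolboxisopen", lexicon))
```

Greedy matching works here, but it commits too early on other inputs. With the lexicon <code>{"the", "theta", "table"}</code>, the string ''thetable'' is split as ''theta'' followed by the stray characters ''b'', ''l'', ''e'' rather than ''the table'', which illustrates why segmentation without delimiters is hard.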
In some writing systems, however, such as the [[Ge'ez script]] used for [[Amharic]] and [[Tigrinya language|Tigrinya]] among other languages, words are explicitly delimited (at least historically) with a non-whitespace character.
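When such an explicit delimiter is present, segmentation reduces to splitting on it. The Python sketch below assumes that the delimiter is encoded as the Unicode character U+1361 ETHIOPIC WORDSPACE (፡). Because modern text often uses ordinary spaces instead, it treats whitespace as a boundary as well. The handling of U+1362 ETHIOPIC FULL STOP (።) is an illustrative choice, not a rule from any standard.

```python
import re

# Assumption: words are separated by U+1361 ETHIOPIC WORDSPACE and/or ordinary
# whitespace; U+1362 ETHIOPIC FULL STOP marks the end of a sentence.
ETHIOPIC_WORDSPACE = "\u1361"
ETHIOPIC_FULL_STOP = "\u1362"


def segment_ethiopic(text: str) -> list[str]:
    """Split Ge'ez-script text on the Ethiopic wordspace and on whitespace."""
    # Treat the full stop as a boundary too, so it is not glued to the last word.
    text = text.replace(ETHIOPIC_FULL_STOP, " ")
    # Split on any run of whitespace or wordspace characters; drop empty pieces.
    return [w for w in re.split(r"[\s\u1361]+", text) if w]


sample = "ሰላም" + ETHIOPIC_WORDSPACE + "ዓለም" + ETHIOPIC_FULL_STOP
print(segment_ethiopic(sample))
```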