Text normalization: Difference between revisions

 
While this may be done manually, and usually is in the case of ad hoc and personal documents, many [[programming language]]s support mechanisms which enable text normalization.
 
Text normalization is useful, for example, for comparing two sequences of characters that mean the same thing but are represented differently. Examples of this kind of normalization include, but are not limited to, "don't" vs. "do not", "I'm" vs. "I am", and "can't" vs. "cannot".
 
Further, "1" and "one" are the same, "1st" is the same as "first", and so on. Instead of treating these strings as different, one can use text processing to treat them as the same.
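
The comparisons described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the lookup tables <code>CONTRACTIONS</code> and <code>NUMBERS</code> and the functions <code>normalize</code> and <code>same_after_normalization</code> are hypothetical names introduced here, the tables cover only the examples mentioned in the text, and the sketch lowercases its input, discards punctuation other than the ASCII apostrophe, and does not handle typographic apostrophes.

```python
import re

# Hypothetical, deliberately tiny lookup tables (assumption: a practical
# system would need far larger tables or a rule-based expander).
CONTRACTIONS = {
    "don't": "do not",
    "i'm": "i am",
    "can't": "cannot",
}
NUMBERS = {
    "1": "one",
    "1st": "first",
}


def normalize(text: str) -> str:
    """Lowercase the text, then expand known contractions and numerals."""
    # Split into tokens made of ASCII letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    expanded = []
    for token in tokens:
        token = CONTRACTIONS.get(token, token)
        token = NUMBERS.get(token, token)
        expanded.append(token)
    return " ".join(expanded)


def same_after_normalization(a: str, b: str) -> bool:
    """Return True if both strings have the same normalized form."""
    return normalize(a) == normalize(b)
```

With these tables, <code>same_after_normalization("I'm", "I am")</code> and <code>same_after_normalization("1st", "first")</code> both evaluate to <code>True</code>, whereas comparing the raw strings directly would report them as different.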
 
[[Category:Unicode]]