Unicode equivalence: Difference between revisions

Typographic conventions: Mention error conversion as it is a form of normalization too
Tree4rest (talk | contribs)
Broke off last sentence of intro as it summarizes both preceding paragraphs, and directed readers to the 2.1 Normal forms section for clarification of how this results in four total forms.
Line 6:
Sequences that are defined as '''compatible''' are assumed to have possibly distinct appearances, but the same meaning in some contexts. Thus, for example, the code point U+FB00 (the [[typographic ligature]] "ff") is defined to be compatible—but not canonically equivalent—to the sequence U+0066 U+0066 (two Latin "f" letters). Compatible sequences may be treated the same way in some applications (such as [[sorting]] and [[index (database)|index]]ing), but not in others; and may be substituted for each other in some situations, but not in others. Sequences that are canonically equivalent are also compatible, but the opposite is not necessarily true.
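The difference between the two notions can be checked with a short sketch. It assumes Python's standard <code>unicodedata</code> module; the variable names are illustrative only. Canonical decomposition (NFD) leaves the ligature U+FB00 untouched, while compatibility decomposition (NFKD) maps it to two "f" letters.

```python
import unicodedata

ligature = "\uFB00"      # LATIN SMALL LIGATURE FF
two_f = "\u0066\u0066"   # two Latin small letters "f"

# Canonical equivalence: compare the canonical decompositions (NFD).
canonically_equivalent = (
    unicodedata.normalize("NFD", ligature) == unicodedata.normalize("NFD", two_f)
)

# Compatibility: compare the compatibility decompositions (NFKD).
compatible = (
    unicodedata.normalize("NFKD", ligature) == unicodedata.normalize("NFKD", two_f)
)

print("canonically equivalent:", canonically_equivalent)
print("compatible:", compatible)
```

Comparing decomposed forms is one common way to test equivalence; comparing the composed forms (NFC, NFKC) would give the same verdicts here.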
 
The standard also defines a [[text normalization]] procedure, called '''Unicode normalization''', that replaces equivalent sequences of characters so that any two texts that are equivalent will be reduced to the same sequence of code points, called the '''normalization form''' or '''normal form''' of the original text. For each of the two equivalence notions, Unicode defines two normal forms, one '''fully composed''' (where multiple code points are replaced by single points whenever possible), and one '''fully decomposed''' (where single points are split into multiple ones).

These traits can be combined into the four normal forms, as explained below, any of which can be used in text processing.
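As a rough illustration of how the two choices (canonical or compatibility equivalence, composed or decomposed output) yield four forms, the following sketch uses Python's standard <code>unicodedata</code> module, which names them NFC, NFD, NFKC and NFKD. The sample strings and the <code>code_points</code> helper are assumptions made for this example. Two canonically equivalent spellings of the same text are reduced to one code point sequence under each form.

```python
import unicodedata

# Two canonically equivalent spellings of "é" followed by the "ff" ligature.
composed = "\u00E9\uFB00"      # U+00E9, U+FB00
decomposed = "e\u0301\uFB00"   # U+0065, U+0301 (combining acute), U+FB00

def code_points(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

forms = {}
for name in ("NFC", "NFD", "NFKC", "NFKD"):
    a = unicodedata.normalize(name, composed)
    b = unicodedata.normalize(name, decomposed)
    # Equivalent inputs must normalize to the same sequence.
    assert a == b
    forms[name] = code_points(a)

for name, points in forms.items():
    print(f"{name:5} {points}")
```

The canonical forms (NFC, NFD) keep the ligature as a single code point, while the compatibility forms (NFKC, NFKD) also replace it with two "f" letters.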
 
==Sources of equivalence==