Unicode equivalence: Difference between revisions

No edit summary
Undid revision 890185620 by 31.40.110.111 (talk)
Line 6:
Sequences that are defined as '''compatible''' are assumed to have possibly distinct appearances, but the same meaning in some contexts. Thus, for example, the code point U+FB00 (the [[typographic ligature]] "ﬀ") is defined to be compatible with, but not canonically equivalent to, the sequence U+0066 U+0066 (two Latin "f" letters). Compatible sequences may be treated the same way in some applications (such as [[sorting]] and [[index (database)|index]]ing), but not in others; and may be substituted for each other in some situations, but not in others. Sequences that are canonically equivalent are also compatible, but the converse is not necessarily true.
 
The standard also defines a [[text normalization]] procedure, called '''Unicode normalization''', that replaces equivalent sequences of characters so that any two equivalent texts are reduced to the same sequence of code points, called the '''normalization form''' or '''normal form''' of the original text. For each of the two equivalence notions, Unicode defines two normal forms, one '''fully composed''' (where multiple code points are replaced by single points whenever possible), and one '''fully decomposed''' (where single points are split into multiple ones). Each of these four normal forms can be used in text processing.
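
As an illustrative sketch (not part of the standard's text), the four normal forms can be compared using Python's standard <code>unicodedata</code> module, which refers to them as NFC and NFD (canonical equivalence, composed and decomposed) and NFKC and NFKD (compatibility, composed and decomposed). The sample string and the helper name <code>code_points</code> are assumptions chosen for this example only.

```python
import unicodedata

# Sample text: U+FB00 (the "ff" ligature) followed by U+00E9 (precomposed "e" with acute).
text = "\uFB00\u00E9"

# Apply each of the four normal forms to the same input.
forms = {
    name: unicodedata.normalize(name, text)
    for name in ("NFC", "NFD", "NFKC", "NFKD")
}

def code_points(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

for name, value in forms.items():
    print(f"{name:5} {code_points(value)}")

# Prints:
# NFC   U+FB00 U+00E9
# NFD   U+FB00 U+0065 U+0301
# NFKC  U+0066 U+0066 U+00E9
# NFKD  U+0066 U+0066 U+0065 U+0301
```

The canonical forms (NFC, NFD) leave the ligature U+FB00 untouched, since it is only compatible with U+0066 U+0066, whereas the compatibility forms (NFKC, NFKD) replace it with two "f" letters.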
 
==Sources of equivalence==