Revision as of 19:42, 23 July 2019, Spitzak: →Typographic conventions: Mention error conversion as it is a form of normalization too
Revision as of 07:35, 3 September 2019, Tree4rest: Broke off last sentence of intro as it summarizes both proceeding paragraphs, and directed readers to the 2.1 Normal forms section for clarification of how this results in for total forms.
Sequences that are defined as '''compatible''' are assumed to have possibly distinct appearances, but the same meaning in some contexts. Thus, for example, the code point U+FB00 (the [[typographic ligature]] "ﬀ") is defined to be compatible with, but not canonically equivalent to, the sequence U+0066 U+0066 (two Latin "f" letters). Compatible sequences may be treated the same way in some applications (such as [[sorting]] and [[index (database)|index]]ing), but not in others; and may be substituted for each other in some situations, but not in others. Sequences that are canonically equivalent are also compatible, but the converse is not necessarily true.

The standard also defines a [[text normalization]] procedure, called '''Unicode normalization''', that replaces equivalent sequences of characters so that any two texts that are equivalent will be reduced to the same sequence of code points, called the '''normalization form''' or '''normal form''' of the original text. For each of the two equivalence notions, Unicode defines two normal forms, one '''fully composed''' (where multiple code points are replaced by single points whenever possible), and one '''fully decomposed''' (where single points are split into multiple ones).

These traits can be combined into the four normal forms, as explained below, any of which can be used in text processing.

==Sources of equivalence==
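As a quick illustration of the equivalence notions and normal forms introduced in the lead, the following minimal sketch uses Python's standard <code>unicodedata</code> module. The form names NFC, NFD, NFKC and NFKD are the standard identifiers for the four normal forms. The helper <code>all_forms</code> and the "é" example are illustrative additions, not part of the article text above; only the "ﬀ" ligature example comes from it.

```python
import unicodedata

# The four normal forms: canonical (NFC, NFD) and compatibility (NFKC, NFKD),
# each in a composed (C) and a decomposed (D) variant.
FORM_NAMES = ("NFC", "NFD", "NFKC", "NFKD")


def all_forms(text):
    """Return a dict mapping each normal-form name to the normalized text.

    Hypothetical helper written for this illustration only.
    """
    return {name: unicodedata.normalize(name, text) for name in FORM_NAMES}


# Compatibility example from the text: U+FB00 versus U+0066 U+0066.
ligature = "\ufb00"  # the "ff" ligature as a single code point
two_fs = "ff"        # two Latin "f" letters

ligature_forms = all_forms(ligature)

# Canonical forms leave the ligature alone, because it is only
# compatible with (not canonically equivalent to) the two-letter sequence.
canonical_keeps_ligature = (
    ligature_forms["NFC"] == ligature and ligature_forms["NFD"] == ligature
)

# Compatibility forms replace the ligature with the two letters.
compat_replaces_ligature = (
    ligature_forms["NFKC"] == two_fs and ligature_forms["NFKD"] == two_fs
)

# Canonical-equivalence example (an assumption added for illustration):
# precomposed U+00E9 versus "e" followed by U+0301 COMBINING ACUTE ACCENT.
precomposed = "\u00e9"
combining = "e\u0301"

# Both spellings reduce to the same code points in every normal form.
canonical_pair_matches = all(
    unicodedata.normalize(name, precomposed)
    == unicodedata.normalize(name, combining)
    for name in FORM_NAMES
)

if __name__ == "__main__":
    print("canonical forms keep the ligature:", canonical_keeps_ligature)
    print("compatibility forms replace it:", compat_replaces_ligature)
    print("canonical pair matches in all forms:", canonical_pair_matches)
```

Comparing two strings after applying the same normal form to both is the usual way to test them for equivalence; which form to choose depends on whether compatibility differences such as the ligature should be preserved.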

Unicode equivalence: Difference between revisions