Unicode assigns each character a '''combining class''', which is identified by a numerical value. Non-combining characters have class number 0, while combining characters have a positive combining class value. To obtain the canonical ordering, every substring of characters with a non-zero combining class value must be sorted by combining class value using a [[Sorting algorithm#Stability|stable sorting]] algorithm. Stable sorting is required because combining characters with the same class value are assumed to interact typographically, so the two possible orders are ''not'' considered equivalent.
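The reordering step described above can be sketched in Python. This is a minimal illustration rather than a full normalizer: it assumes the input is already fully decomposed, the function name <code>canonical_order</code> is hypothetical, and it relies on the standard library's <code>unicodedata.combining</code> for class values and on the documented stability of <code>sorted</code>.

```python
import unicodedata


def canonical_order(text: str) -> str:
    """Stable-sort each run of combining characters by combining class.

    Hypothetical helper for illustration; assumes `text` is already decomposed.
    """
    chars = list(text)
    i = 0
    while i < len(chars):
        # Characters with class 0 act as boundaries and are never moved.
        if unicodedata.combining(chars[i]) == 0:
            i += 1
            continue
        # Find the end of the current run of non-zero combining classes.
        j = i
        while j < len(chars) and unicodedata.combining(chars[j]) != 0:
            j += 1
        # sorted() is stable, so marks with equal class keep their order.
        chars[i:j] = sorted(chars[i:j], key=unicodedata.combining)
        i = j
    return "".join(chars)


# Acute (class 230) followed by dot below (class 220) gets reordered.
reordered = canonical_order("a\u0301\u0323")
# Acute and circumflex share class 230, so their order is preserved.
unchanged = canonical_order("e\u0301\u0302")
```

Because the sort is stable, two marks of the same class are never swapped, which is what keeps typographically distinct sequences distinct.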
For example, the character U+1EBF (ế), used in [[Vietnamese alphabet|Vietnamese]], has both an acute and a circumflex accent. Its canonical decomposition is the three-character sequence U+0065 (e) U+0302 (circumflex accent) U+0301 (acute accent). The combining classes for the two accents are both 230, so U+1EBF is not equivalent to U+0065 U+0301 U+0302.
Since not all combining sequences have a precomposed equivalent (for instance, U+0065 U+0301 U+0302 can only be reduced to U+00E9 U+0302), even the normal form NFC is affected by combining characters' behavior.
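The behavior of the Vietnamese example can be checked with Python's standard <code>unicodedata.normalize</code>. The variable names below are illustrative only, and the results depend on the Unicode data shipped with the interpreter.

```python
import unicodedata

precomposed = "\u1ebf"                 # ế as a single code point
circumflex_then_acute = "e\u0302\u0301"  # canonical decomposition of U+1EBF
acute_then_circumflex = "e\u0301\u0302"  # same marks, opposite order

# NFD expands the precomposed character into e + circumflex + acute.
nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)

# NFC recomposes the canonical order back into the single code point.
nfc_canonical = unicodedata.normalize("NFC", circumflex_then_acute)

# The opposite order only partially composes: é (U+00E9) plus a circumflex.
nfc_swapped = unicodedata.normalize("NFC", acute_then_circumflex)

# Both marks have combining class 230, so NFD leaves the swapped order alone.
nfd_swapped = unicodedata.normalize("NFD", acute_then_circumflex)
```

The two orderings therefore stay distinct under both NFD and NFC, which reflects the fact that they are not canonically equivalent.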