Unicode equivalence: Difference between revisions

Character duplication: Duplicate characters in Unicode
m Normal forms: expand contraction
Line 82:
The normal forms are not [[closure (mathematics)|closed]] under string [[concatenation]].<ref>Per [http://www.unicode.org/faq/normalization.html#5 What should be done about concatenation]</ref> For defective Unicode strings starting with a Hangul vowel or trailing [[Hangul Jamo (Unicode block)|conjoining jamo]], concatenation can break composition.
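
The lack of closure under concatenation can be illustrated with a minimal sketch, assuming Python's standard <code>unicodedata</code> module; the helper name <code>is_nfc</code> is hypothetical and is used only for illustration. Two strings that are each in NFC are joined into a string that is not in NFC:

```python
import unicodedata

def is_nfc(s: str) -> bool:
    """Hypothetical helper: True if s is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", s) == s

left = "\u1100"   # HANGUL CHOSEONG KIYEOK, a leading conjoining jamo
right = "\u1161"  # HANGUL JUNGSEONG A, a vowel conjoining jamo

# Each piece is in NFC on its own, but the concatenation is not:
# NFC composes the two jamo into one precomposed syllable (U+AC00).
joined = left + right
composed = unicodedata.normalize("NFC", joined)

print(is_nfc(left), is_nfc(right), is_nfc(joined))
print([f"U+{ord(c):04X}" for c in composed])
```

A program that needs a normalized result therefore has to normalize again after concatenating, at least around the boundary between the two strings.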
 
However, they are not [[injective function|injective]] (they map different original glyphs and sequences to the same normalized sequence) and thus also not [[bijection|bijective]] (the original cannot be restored). For example, the distinct Unicode characters U+212B (the angstrom sign "Å") and U+00C5 (the Swedish letter "Å") are both expanded by NFD (or NFKD) into the sequence U+0041 U+030A (Latin letter "A" and combining [[ring above]] "°"), which is then reduced by NFC (or NFKC) to U+00C5 (the Swedish letter "Å").
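
The angstrom example can be reproduced with a short sketch, assuming Python's standard <code>unicodedata</code> module; the variable names are chosen here for illustration only:

```python
import unicodedata

angstrom_sign = "\u212b"  # ANGSTROM SIGN
a_with_ring = "\u00c5"    # LATIN CAPITAL LETTER A WITH RING ABOVE

# NFD expands both characters to the same two-code-point sequence.
nfd_sign = unicodedata.normalize("NFD", angstrom_sign)
nfd_letter = unicodedata.normalize("NFD", a_with_ring)

# NFC maps both to U+00C5, so the angstrom sign cannot be recovered.
nfc_sign = unicodedata.normalize("NFC", angstrom_sign)
nfc_letter = unicodedata.normalize("NFC", a_with_ring)

print(nfd_sign == nfd_letter == "A\u030a")
print(nfc_sign == nfc_letter == "\u00c5")
```

Because two distinct inputs produce the same output, no inverse function can tell which of them was the original.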
 
A single character (other than a Hangul syllable block) that will get replaced by another under normalization can be identified in the Unicode tables by its having a non-empty compatibility field but lacking a compatibility tag.
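
This table lookup can be sketched as follows, assuming that the field in question is the decomposition mapping that Python's <code>unicodedata.decomposition</code> returns (an empty string when there is no mapping, and a leading tag such as <code>&lt;compat&gt;</code> for compatibility mappings). The function name <code>has_untagged_mapping</code> is hypothetical:

```python
import unicodedata

def has_untagged_mapping(ch: str) -> bool:
    """Hypothetical helper: True if ch has a non-empty decomposition
    mapping that does not begin with a tag such as <compat>."""
    field = unicodedata.decomposition(ch)
    return bool(field) and not field.startswith("<")

for ch in ("\u212b", "\u00c5", "\ufb01", "A"):
    # Show the code point, the raw field, and the result of the check.
    print(f"U+{ord(ch):04X}", repr(unicodedata.decomposition(ch)),
          has_untagged_mapping(ch))
```

In this sketch U+212B and U+00C5 pass the check, whereas U+FB01 (whose mapping carries a <code>&lt;compat&gt;</code> tag) and plain "A" (which has no mapping) do not. Whether a character that passes is actually replaced still depends on the normal form applied: U+00C5 changes under NFD but is left as it is by NFC.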