Unicode compatibility characters: Difference between revisions

mNo edit summary
KolbertBot (talk | contribs)
m Bot: HTTP→HTTPS (v485)
Line 4:
}}
 
In [[Unicode]] and the [[Universal Character Set|UCS]], a '''compatibility character''' is a character that is encoded solely to maintain [[Round-trip format conversion|round-trip convertibility]] with other, often older, standards.<ref>{{cite web|title=Chapter 2.3: Compatibility characters|url=https://www.unicode.org/versions/Unicode6.0.0/ch02.pdf#G11062|work=The Unicode Standard 6.0.0}}</ref> As the Unicode Glossary says:
 
<blockquote>
A character that would not have been encoded except for compatibility and round-trip convertibility with other standards<ref>[https://www.unicode.org/glossary/#compatibility_character Unicode Consortium, Unicode Glossary]</ref>
</blockquote>
 
Line 25:
;[[typographic ligature|Ligatures]]: Ligatures such as ‘ffi’ in the Latin script were often encoded as a separate character in legacy character sets. Unicode’s approach is to treat ligatures as rich text and, where they are enabled, to handle them through glyph substitution.
;Precomposed Roman numerals: For example, Roman numeral twelve (‘Ⅻ’: U+216B) can be decomposed into a Roman numeral ten (‘Ⅹ’: U+2169) and two Roman numeral ones (‘Ⅰ’: U+2160).
;Precomposed [[vulgar fraction|fractions]]: These decompositions have the keyword &lt;fraction&gt;. A fully conforming text handler should<ref>{{cite web|author=The Unicode Consortium|authorlink=Unicode Consortium|year=2010|title=The Unicode Standard, Version 6.0.0|publisher=Addison-Wesley Professional|isbn=978-0321480910|pages=212|url=https://www.unicode.org/versions/Unicode6.0.0/ch06.pdf#G12861}}</ref> display the vulgar fraction ¼ (U+00BC) identically to the composed fraction 1⁄4 (numeral 1 with fraction slash U+2044 and numeral 4).
;Contextual glyphs or forms: These arise primarily in the Arabic script. Using fonts with glyph substitution capabilities such as [[OpenType]] and [[Apple Advanced Typography|TrueTypeGX]], Unicode-conforming software can substitute the proper glyph for the same character depending on whether that character appears at the beginning, middle, or end of a word, or in isolation. Such glyph substitution is also necessary for vertical (top to bottom) text layout for some East Asian languages. In this case glyphs must be substituted or synthesized for wide, narrow, small and square glyph forms. Non-conforming software, or software using other character sets, instead uses multiple separate characters for the same letter depending on its position, which further complicates text processing.
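
The decomposition keywords described above can be inspected directly. The following is a minimal sketch using Python's standard <code>unicodedata</code> module; the helper name <code>compat_info</code> is hypothetical and introduced here only for illustration. It reports the decomposition keyword (if any) and the fully decomposed (NFKD) form of a character. Note that the Unicode Character Database bundled with Python maps the Roman numeral twelve to the plain Latin letters X, I and I rather than to other Roman numeral characters.

```python
import unicodedata


def compat_info(ch):
    """Return (keyword, NFKD expansion) for a single character.

    The keyword is the bracketed tag of a compatibility decomposition,
    such as "<compat>" or "<fraction>". It is an empty string when the
    character has a canonical decomposition or none at all.
    """
    raw = unicodedata.decomposition(ch)
    keyword = raw.split()[0] if raw.startswith("<") else ""
    return keyword, unicodedata.normalize("NFKD", ch)


if __name__ == "__main__":
    # Ligature ffi, Roman numeral twelve, vulgar fraction one quarter.
    for ch in ("\ufb03", "\u216b", "\u00bc"):
        keyword, expanded = compat_info(ch)
        points = " ".join(f"U+{ord(c):04X}" for c in expanded)
        print(f"U+{ord(ch):04X} {keyword or '(no keyword)'} -> {points}")
```

Compatibility normalization discards the distinction that the compatibility character was encoded to preserve, so it is usually applied for searching and comparison rather than for storing text that must round-trip to a legacy standard.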
 
Line 118:
# “Ornate Left Parenthesis” (U+FD3E): ﴾. A glyph variant for U+0029 ‘)’
# “Ornate Right Parenthesis” (U+FD3F): ﴿. A glyph variant for U+0028 ‘(’
# “Ligature Bismillah Ar-Rahman Ar-Raheem” (U+FDFD): ﷽. [[Bismillah ar-Rahman, ar-Raheem|Bismillah Ar-Rahman Ar-Raheem]] is a ligature for Beh (U+0628), Seen (U+0633), Meem (U+0645), Space (U+0020), Alef (U+0627), Lam (U+0644), Lam (U+0644), Heh (U+0647), Space (U+0020), Alef (U+0627), Lam (U+0644), Reh (U+0631), Hah (U+062D), Meem (U+0645), Alef (U+0627), Noon (U+0646), Space (U+0020), Alef (U+0627), Lam (U+0644), Reh (U+0631), Hah (U+062D), Yeh (U+064A), Meem (U+0645) i.e. {{rtl-lang|ar|بسم الله الرحمان الرحيم}}<ref>[https://www.unicode.org/charts/PDF/UFB50.pdf Unicode chart FB50-FDFF (PDF)].</ref><!-- Note: In Unicode, characters are written in "logical" sequence, i.e. from right to left in RTL languages such as Arabic. --> (Similarly, U+FDFA and U+FDFB code for two other Arabic ligatures, of 21 and 9 characters respectively.)
# “Arabic Tail Fragment” (U+FE73): ﹳ for supporting text systems without contextual glyph handling