m Replaced the 3 characters “ffi” with “ffi” (the ligature itself) →Glyph substitution and composition
m Dating maintenance tags: {{Citation needed}}
(2 intermediate revisions by 2 users not shown)
Line 76:
In addition, several scripts use glyph position, such as superscript or subscript placement, to distinguish meaning. In these cases the subscripts and superscripts are not merely rich-text formatting but constitute distinct characters in the writing system (130 total).
* 112 characters representing abstract phonemes from phonetic alphabets such as the
* 14 characters from the [[Kanbun]] block (U+3192 – U+319F)
* 1 character from the [[Tifinagh]] script: Tifinagh Modifier Letter Labialization Mark (ⵯ U+2D6F)
* 1 character from the [[Georgian script]]: Modifier Letter Georgian Nar (ჼ U+10FC)
* masculine ([[º|U+00BA]]) and feminine ([[ª|U+00AA]]) ordinal indicators included in the [[Latin-1
Finally, Unicode designates Roman numerals as compatibility equivalents of the Latin letters that share the same glyphs.{{Citation needed|date=November 2015}}
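The compatibility mappings described above can be inspected with Python's standard <code>unicodedata</code> module. The following is a minimal sketch, not a definitive tool: the sample list and the helper name <code>compat_info</code> are illustrative assumptions, and the output depends on the Unicode data bundled with the Python build in use.

```python
import unicodedata

# (label, character) pairs taken from the list above, plus one Roman numeral.
samples = [
    ("masculine ordinal indicator", "\u00BA"),
    ("feminine ordinal indicator", "\u00AA"),
    ("modifier letter Georgian nar", "\u10FC"),
    ("Tifinagh labialization mark", "\u2D6F"),
    ("Roman numeral eight", "\u2167"),
]

def compat_info(ch):
    """Return (raw decomposition field, NFKC form) for one character.

    compat_info is a hypothetical helper name used only for this sketch.
    """
    return unicodedata.decomposition(ch), unicodedata.normalize("NFKC", ch)

for label, ch in samples:
    mapping, folded = compat_info(ch)
    print(f"U+{ord(ch):04X} {label}: {mapping!r} -> {folded!r}")
```

A decomposition field that starts with a tag such as <code>&lt;super&gt;</code> or <code>&lt;compat&gt;</code> marks a compatibility mapping rather than a canonical one, which is why only the NFKC and NFKD forms fold these characters.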
Line 112:
These thirteen characters are not compatibility characters, and their use is not discouraged in any way. However, U+27EAF 𧺯, which is the same character as U+FA23 﨣, was mistakenly encoded in CJK Unified Ideographs Extension B.<ref>[http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg26/IRGN1218_Response_to_WG2.pdf#page=4 IRGN 1218]</ref> In any event, a normalized text should never contain both U+27EAF 𧺯 and U+FA23 﨣; these code points represent the same character, encoded twice.
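Because neither code point has a decomposition mapping, the standard normalization forms leave both unchanged, so a text that mixes them has to be detected separately. The following is a minimal sketch of such a check; the names <code>survives_normalization</code> and <code>contains_both</code> are hypothetical and are not part of any Unicode library.

```python
import unicodedata

COMPAT_BLOCK_CHAR = "\uFA23"      # in the CJK Compatibility Ideographs block
EXT_B_CHAR = "\U00027EAF"         # in CJK Unified Ideographs Extension B

def survives_normalization(ch):
    """True if all four normalization forms leave the character unchanged."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, ch) == ch for form in forms)

def contains_both(text):
    """True if the text mixes the two encodings of the same character."""
    return COMPAT_BLOCK_CHAR in text and EXT_B_CHAR in text

print(survives_normalization(COMPAT_BLOCK_CHAR))
print(survives_normalization(EXT_B_CHAR))
print(contains_both(COMPAT_BLOCK_CHAR + EXT_B_CHAR))
```

Which of the two code points a cleanup step should keep is a policy decision that the sketch deliberately leaves open.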
Several other characters in these blocks have no compatibility mapping but are clearly intended for legacy support:{{citation needed|date=July 2025}}
Alphabetic Presentation Forms (1)
Line 118:
Arabic Presentation Forms (4)
# "Ornate Left Parenthesis" (U+FD3E): ﴾. A glyph variant for U+
# "Ornate Right Parenthesis" (U+FD3F): ﴿. A glyph variant for U+
# "Ligature Bismillah Ar-Rahman Ar-Raheem" (U+FDFD): ﷽. [[Bismillah ar-Rahman, ar-Raheem|Bismillah Ar-Rahman Ar-Raheem]] is a ligature for Beh (U+0628), Seen (U+0633), Meem (U+0645), Space (U+0020), Alef (U+0627), Lam (U+0644), Lam (U+0644), Heh (U+0647), Space (U+0020), Alef (U+0627), Lam (U+0644), Reh (U+0631), Hah (U+062D), Meem (U+0645), Alef (U+0627), Noon (U+0646), Space (U+0020), Alef (U+0627), Lam (U+0644), Reh (U+0631), Hah (U+062D), Yeh (U+064A), Meem (U+0645), i.e. {{lang|ar|بسم الله الرحمان الرحيم}}.<ref>[https://www.unicode.org/charts/PDF/UFB50.pdf Unicode chart FB50-FDFF (PDF)].</ref><!-- Note: In Unicode, characters are written in "logical" sequence, i.e. from right to left in RTL languages such as Arabic. --> (Similarly, U+FDFA and U+FDFB encode two other Arabic ligatures, of 21 and 9 characters respectively.)
# "Arabic Tail Fragment" (U+FE73): ﹳ. Intended to support text systems without contextual glyph handling.
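The absence of a compatibility mapping for the four Arabic Presentation Forms characters listed above can be checked with Python's <code>unicodedata</code> module. This is a sketch under stated assumptions: the helper <code>has_compat_mapping</code> and the dictionary <code>LEGACY_FORMS</code> are illustrative names, and the code point sequence is copied from the Bismillah entry above.

```python
import unicodedata

# Code points as listed for the Bismillah ligature, in logical order.
BISMILLAH_SEQUENCE = [
    0x0628, 0x0633, 0x0645, 0x0020,
    0x0627, 0x0644, 0x0644, 0x0647, 0x0020,
    0x0627, 0x0644, 0x0631, 0x062D, 0x0645, 0x0627, 0x0646, 0x0020,
    0x0627, 0x0644, 0x0631, 0x062D, 0x064A, 0x0645,
]
spelled_out = "".join(chr(cp) for cp in BISMILLAH_SEQUENCE)

# The four characters from the list above (labels are informal).
LEGACY_FORMS = {
    "ornate left parenthesis": "\uFD3E",
    "ornate right parenthesis": "\uFD3F",
    "ligature Bismillah": "\uFDFD",
    "Arabic tail fragment": "\uFE73",
}

def has_compat_mapping(ch):
    """True if the character database records any decomposition for ch."""
    return unicodedata.decomposition(ch) != ""

for label, ch in LEGACY_FORMS.items():
    print(f"U+{ord(ch):04X} {label}: mapping present = {has_compat_mapping(ch)}")

# Compare the NFKC form of the ligature with the spelled-out sequence.
print(unicodedata.normalize("NFKC", "\uFDFD") == spelled_out)
```

Since U+FDFD has no mapping, normalization does not expand it, and software that wants to match the ligature against the spelled-out phrase would need its own table.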