Unicode compatibility characters: Difference between revisions

Revision as of 20:04, 15 November 2024 by Symbol & Font Hunter (minor edit): Replaced the 3 characters “ffi” with “ﬃ” (the ligature itself) →Glyph substitution and composition. Tag: Visual edit

Latest revision as of 19:13, 28 July 2025 by AnomieBOT (bot, minor edit): Dating maintenance tags: {{Citation needed}}
(2 intermediate revisions by 2 users not shown)
Line 76:

In addition, several scripts use glyph position, such as superscripts and subscripts, to differentiate semantics. In these cases subscripts and superscripts are not merely rich text but constitute distinct characters in the writing system (130 total).
* 112 characters representing abstract phonemes from phonetic alphabets such as the [[International Phonetic Alphabet]] use such positional glyphs to represent semantic differences (U+1D2C – U+1D6A, U+1D78, U+1D9B – U+1DBF, U+02B0 – U+02B8, U+02E0 – U+02E4)
* 14 characters from the [[Kanbun]] block (U+3192 – U+319F)
* 1 character from the [[Tifinagh]] script: Tifinagh Modifier Letter Labialization Mark (ⵯ U+2D6F)
* 1 character from the [[Georgian script]]: Modifier Letter Georgian Nar (ჼ U+10FC)
* masculine ([[º|U+00BA]]) and feminine ([[ª|U+00AA]]) ordinal indicators included in the [[Latin-1 ~~supplement~~Supplement]]{{citation needed|date=January 2012}} block

Finally, Unicode designates Roman numerals as compatibility equivalents of the Latin letters that share the same glyphs.{{Citation needed|date=November 2015}}

Line 112:

These thirteen characters are not compatibility characters, and their use is not discouraged in any way. However, U+27EAF 𧺯, the same as U+FA23 﨣, is mistakenly encoded in CJK Unified Ideographs Extension B.<ref>[http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg26/IRGN1218_Response_to_WG2.pdf#page=4 IRGN 1218]</ref> In any event, a normalized text should never contain both U+27EAF 𧺯 and U+FA23 﨣; these code points represent the same character, encoded twice.

Several other characters in these blocks have no compatibility mapping but are clearly intended for legacy support:{{citation needed|date=July 2025}}

Alphabetic Presentation Forms (1)

Line 118:

Arabic Presentation Forms (4)
# "Ornate Left Parenthesis" (U+FD3E): ﴾. A glyph variant for U+~~0029~~0028 '('
# "Ornate Right Parenthesis" (U+FD3F): ﴿. A glyph variant for U+~~0028~~0029 ')'
# "Ligature Bismillah Ar-Rahman Ar-Raheem" (U+FDFD): ﷽. [[Bismillah ar-Rahman, ar-Raheem|Bismillah Ar-Rahman Ar-Raheem]] is a ligature for Beh (U+0628), Seen (U+0633), Meem (U+0645), Space (U+0020), Alef (U+0627), Lam (U+0644), Lam (U+0644), Heh (U+0647), Space (U+0020), Alef (U+0627), Lam (U+0644), Reh (U+0631), Hah (U+062D), Meem (U+0645), Alef (U+0627), Noon (U+0646), Space (U+0020), Alef (U+0627), Lam (U+0644), Reh (U+0631), Hah (U+062D), Yeh (U+064A), Meem (U+0645), i.e. {{lang|ar|بسم الله الرحمان الرحيم}}<ref>[https://www.unicode.org/charts/PDF/UFB50.pdf Unicode chart FB50-FDFF (PDF)].</ref><!-- Note: In Unicode, characters are written in "logical" sequence, i.e. from right to left in RTL languages such as Arabic. --> (Similarly, U+FDFA and U+FDFB code for two other Arabic ligatures, of 21 and 9 characters respectively.)
# "Arabic Tail Fragment" (U+FE73): ﹳ, for supporting text systems without contextual glyph handling
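
The distinction drawn above, between characters that carry a compatibility mapping and characters that are only intended for legacy support, can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard <code>unicodedata</code> module; the helper names <code>compat_tag</code> and <code>describe</code> and the choice of sample characters are illustrative assumptions, not part of any standard API, and results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def compat_tag(ch: str) -> str:
    """Return the decomposition tag (such as '<super>') for ch.

    Returns '' when the character has no tagged (compatibility)
    decomposition. Hypothetical helper, named for illustration only.
    """
    field = unicodedata.decomposition(ch)
    if field.startswith("<"):
        return field.split(" ", 1)[0]
    return ""


def describe(ch: str) -> str:
    """Summarise code point, name, tag, and NFKC result for one character."""
    name = unicodedata.name(ch, "<unnamed>")
    tag = compat_tag(ch) or "none"
    nfkc = unicodedata.normalize("NFKC", ch)
    nfkc_points = " ".join("U+%04X" % ord(c) for c in nfkc)
    return "U+%04X %s: tag=%s, NFKC=%s" % (ord(ch), name, tag, nfkc_points)


# Sample characters taken from the passages above (an illustrative selection).
SAMPLES = [
    "\u00aa",  # feminine ordinal indicator
    "\u00ba",  # masculine ordinal indicator
    "\u02b0",  # a phonetic modifier letter
    "\u3192",  # first character of the Kanbun range
    "\u2163",  # Roman numeral four
    "\ufa23",  # CJK ideograph in the compatibility block, no mapping
    "\ufd3e",  # ornate left parenthesis, no mapping
    "\ufd3f",  # ornate right parenthesis, no mapping
]

for ch in SAMPLES:
    print(describe(ch))
```

Characters that report <code>tag=none</code> are left unchanged by NFKC normalization, which is why legacy-support characters without a mapping survive normalization while the tagged ones are folded into their ordinary counterparts.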