Unicode compatibility characters: Difference between revisions

BattyBot (talk | contribs)
m Glyph substitution and composition: fixed citation template(s) to remove page from Category:CS1 maint: Extra text & general fixes using AWB (11754)
No edit summary
Tags: Mobile edit Mobile web edit
The presence of these 167 semantically distinct though visually similar characters (plus the 11 borderline Hebrew- and Greek-letter-based symbols and the 6 measurement unit symbols) among the decomposable characters complicates the topic of compatibility characters. The Unicode standard discourages content authors from using compatibility characters. In certain specialized areas, however, these characters are important and closely resemble other characters that have not been included among the compatibility characters. For example, in certain academic circles the use of Roman numerals as distinct from the Latin letters that share their glyphs is no different from the use of Cuneiform numerals or ancient Greek numerals; collapsing the Roman numeral characters to Latin letter characters eliminates a semantic distinction. A similar situation exists for phonetic alphabet characters that use subscript- or superscript-positioned glyphs: in the specialized circles that use phonetic alphabets, authors should be able to do so without resorting to rich text protocols. As another example, the compatibility characters carrying the 'circle' keyword are often used to describe the game [[Go (game)|Go]]. These uses, however, are exceptions in which the author has a special reason to use the otherwise discouraged characters.
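The loss of semantic distinction described above can be observed with a compatibility normalization. The following is a minimal sketch, assuming Python's standard <code>unicodedata</code> module; the helper name <code>collapse</code> and the three sample characters are chosen here for illustration only.

```python
import unicodedata


def collapse(text: str) -> str:
    """Apply compatibility normalization (NFKC) to the text.

    'collapse' is a hypothetical helper name used only in this sketch.
    """
    return unicodedata.normalize("NFKC", text)


# Sample characters picked for illustration (an assumption of this sketch):
roman_four = "\u2163"      # ROMAN NUMERAL FOUR
modifier_h = "\u02B0"      # MODIFIER LETTER SMALL H (superscript-positioned glyph)
circled_one = "\u2460"     # CIRCLED DIGIT ONE

# After NFKC the Roman numeral is two ordinary Latin letters, so the
# numeral/letter distinction is no longer recoverable from the text.
print(repr(collapse(roman_four)))
print(repr(collapse(modifier_h)))
print(repr(collapse(circled_one)))
```

An author who needs the distinction therefore has to avoid compatibility normalization, or record the distinction in markup outside the plain text.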
 
== Compatibility blocks ==
Several blocks of Unicode characters consist entirely or almost entirely of compatibility characters (U+F900–U+FFEF, except for the noncharacters). With one exception, the rial currency symbol (﷼ U+FDFC), the compatibility blocks contain none of the semantically distinct compatibility characters, so the decomposable characters in these blocks fall unambiguously into the set of discouraged characters. Unicode recommends that authors use the plain-text compatibility decomposition equivalents instead and complement those characters with rich text markup. This approach is much more flexible and open-ended than using, to give just one example, the finite set of circled or enclosed alphanumerics.
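As a rough illustration of how a compatibility decomposition yields the plain-text equivalent, the sketch below reads the decomposition mapping recorded for a character and applies NFKD. It assumes Python's standard <code>unicodedata</code> module; the function names <code>decomposition_kind</code> and <code>plain_text_equivalent</code> are hypothetical and introduced only for this example.

```python
import unicodedata


def decomposition_kind(ch: str) -> str:
    """Classify a character's decomposition mapping.

    Returns 'none' if there is no mapping, 'canonical' if the mapping has no
    keyword, or the compatibility keyword itself, such as '<circle>'.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    first = mapping.split()[0]
    return first if first.startswith("<") else "canonical"


def plain_text_equivalent(ch: str) -> str:
    """Return the compatibility decomposition (NFKD) of the character."""
    return unicodedata.normalize("NFKD", ch)


# U+FDFC is the rial sign mentioned above; U+2460 and U+FF21 are extra
# samples chosen for this sketch.
for ch in ("\uFDFC", "\u2460", "\uFF21"):
    print(f"U+{ord(ch):04X}", decomposition_kind(ch), repr(plain_text_equivalent(ch)))
```

Whatever the keyword expressed (circled, wide, and so on) is dropped by the decomposition, which is why the recommendation pairs the plain-text equivalent with rich text markup.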
 
Unfortunately, even the compatibility blocks contain a small number of characters that are not themselves compatibility characters and may therefore confuse authors. The “Enclosed CJK Letters and Months” block contains a single non-compatibility character: the ‘Korean Standard Symbol’ (㉿ U+327F). That symbol and 12 other characters have been included in the blocks for unknown reasons. The “CJK Compatibility Ideographs” block contains these non-compatibility unified Han ideographs:
 
# (U+FA0E): 﨎
Line 108:
# (U+FA29): 﨩
 
These thirteen characters are not compatibility characters, and their use is not discouraged in any way. However, U+27EAF 𧺯, which is the same character as U+FA23 﨣, is mistakenly encoded in CJK Unified Ideographs Extension B.<ref>[http://appsrv.cse.cuhk.edu.hk/~irg/irg/irg26/IRGN1218_Response_to_WG2.pdf#page=4 IRGN 1218]</ref> In any event, a normalized text should never contain both U+27EAF 𧺯 and U+FA23 﨣; these code points represent the same character, encoded twice.
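The status of these characters can be checked against the normalization forms. The sketch below is a minimal illustration, assuming Python's standard <code>unicodedata</code> module; it uses only the code points named above, plus U+F900 as a contrasting sample chosen for this example, and the function names are hypothetical.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def is_unchanged_by_normalization(ch: str) -> bool:
    """True if none of the four normalization forms alters the character."""
    return all(unicodedata.normalize(form, ch) == ch for form in FORMS)


def contains_both_encodings(text: str) -> bool:
    """True if the text holds both U+27EAF and U+FA23.

    Normalization does not merge these two code points, so a check like
    this one (a hypothetical helper) would have to be applied separately.
    """
    return "\U00027EAF" in text and "\uFA23" in text


# Code points named in the text above: none of them has a decomposition.
for ch in ("\uFA0E", "\uFA29", "\uFA23", "\u327F"):
    print(f"U+{ord(ch):04X}", is_unchanged_by_normalization(ch))

# Contrasting sample (an assumption of this sketch): U+F900 lies in the same
# block but has a canonical mapping, so normalization replaces it.
print(f"U+{ord(unicodedata.normalize('NFC', chr(0xF900))):04X}")
```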
 
Several other characters in these blocks have no compatibility mapping but are clearly intended for legacy support: