=== Rich text compatibility characters ===
Many other compatibility characters constitute what Unicode considers rich text and therefore fall outside the goals of Unicode and the UCS. In some sense even the compatibility characters discussed in the previous section (those that aid legacy software in displaying ligatures and vertical text) constitute a form of rich text, since rich text protocols determine whether text is displayed one way or another. However, the choice to display text with or without ligatures, or vertically versus horizontally, is non-semantic rich text: these are simply style differences. This is in contrast to other rich text such as italics, superscripts and subscripts, or list markers, where the styling implies certain semantics.
For comparing, collating, handling and storing plain text, rich text variants are semantically redundant. For example, using a superscript character for the numeral 4 is likely indistinguishable from using the standard character for a numeral 4 and then using rich text protocols to make it superscript. Such alternate rich text characters therefore create ambiguity because they appear visually the same as their plain text counterpart characters with rich text formatting applied. These rich text compatibility characters include:
;[[Enclosed Alphanumerics]] and ideographs (markers): These are characters included primarily for list markers. They do not constitute plain text characters. Moreover, the use of other rich text protocols is more appropriate, since the set of enclosed alphanumerics and ideographs provisioned in the UCS is limited.
* '''[[Mathematical alphanumeric symbols|Mathematical Alphanumeric Symbols]]'''. These symbols are simply clones of the Latin and Greek alphabets and Indic-Arabic decimal digits repeated in 15 various typefaces. They are intended as an arbitrary palette for mathematical notation. However, they tend to undermine the distinction between encoding characters and encoding visual glyphs, as well as Unicode's goal of supporting only plain text characters. Such alternate styling for a mathematical symbol palette could easily be created through rich text protocols instead.
* '''[[space (punctuation)|Spaces and no-break spaces]] of varying widths'''. These characters are simply rich text variants of the core Space (U+0020) and No-break Space (U+00A0). Other rich text protocols, such as tracking, kerning or word-spacing attributes, should be used instead.
* '''Some [[subscript and superscript]] form characters'''. Many of the subscript and superscript characters are actually semantically distinct characters from the [[International Phonetic Alphabet]] and other writing systems and do not really fall in the category of rich text. However, others simply constitute rich text presentation forms of other Greek, Latin and numeral characters. These rich text superscript and subscript characters therefore properly belong to this category of rich text compatibility characters. Most of these are in the "Superscripts and Subscripts" or the "Latin-1 Supplement" blocks.
For all of these rich text compatibility characters, the displayed glyph is typically distinct from that of the character's compatibility decomposition (the related character). Nevertheless, they are considered compatibility characters, and the Unicode Consortium discourages their use, because they are not plain text characters, which is what Unicode seeks to support with its UCS and associated protocols. Rich text should instead be handled through non-Unicode protocols such as HTML, CSS and RTF.
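The relationship between a rich text compatibility character and its plain text counterpart can be inspected programmatically. The following is a minimal sketch using Python's standard <code>unicodedata</code> module; the helper name <code>compat_info</code> and the choice of sample characters are illustrative assumptions, not part of any Unicode specification. It reports the formatting tag of a character's compatibility decomposition and the result of folding the character with NFKC normalization.

```python
import unicodedata


def compat_info(ch):
    """Return (decomposition tag, NFKC form) for a single character.

    The tag is None when the character has no compatibility
    decomposition. `compat_info` is a hypothetical helper name
    chosen for this illustration.
    """
    decomp = unicodedata.decomposition(ch)
    # Compatibility decompositions start with a tag such as <super>;
    # canonical decompositions and undecomposable characters do not.
    tag = decomp.split()[0] if decomp.startswith("<") else None
    return tag, unicodedata.normalize("NFKC", ch)


# One sample from each category listed above (illustrative choices).
samples = {
    "superscript four": "\u2074",
    "circled digit one": "\u2460",
    "mathematical bold capital A": "\U0001D400",
    "en space": "\u2002",
}

for name, ch in samples.items():
    tag, folded = compat_info(ch)
    print(f"{name}: U+{ord(ch):04X} tag={tag} NFKC={folded!r}")
```

In this sketch each sample folds to an ordinary plain text character (a digit, a letter or U+0020), and the tag records the kind of styling that is discarded in the process. This is the sense in which such characters are redundant for comparison and collation.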