Revision as of 17:38, 23 July 2019 by Comp.arch (minor edit). Edit summary: E.g. "Roman numeral", with latter lower case(?) but always former upper.
Revision as of 19:42, 23 July 2019 by Spitzak. Edit summary: →Typographic conventions: Mention error conversion as it is a form of normalization too
Line 44:

===Typographic conventions===
Unicode provides code points for some characters or groups of characters which are modified only for aesthetic reasons (such as [[Typographic ligature|ligatures]], the half-width [[katakana]] characters, or the double-width Latin letters for use in Japanese texts), or to add new semantics without losing the original one (such as digits in [[subscript]] or [[superscript]] positions, or the circled digits, such as "①", inherited from some Japanese fonts). Such a sequence is considered compatible with the sequence of original (individual and unmodified) characters, for the benefit of applications where the appearance and added semantics are not relevant. However, the two sequences are not declared canonically equivalent, since the distinction has some semantic value and affects the rendering of the text.

===Encoding errors===
[[UTF-8]] and [[UTF-16]] (and also some other Unicode encodings) do not allow all possible sequences of [[code unit]]s. Different software will convert invalid sequences into Unicode characters using varying rules, some of which are very lossy (i.e., turning all invalid sequences into the same character). This can be considered a form of normalization and can lead to the same difficulties as other forms of normalization.

==Normalization==
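As an illustration of the two preceding subsections, the following is a minimal Python sketch, not a reference implementation. It assumes the standard-library <code>unicodedata</code> module and Python's built-in codec error handlers; the helper names (<code>fold_compat</code>, <code>fold_canon</code>, <code>decode_lossy</code>, <code>decode_roundtrip</code>) are hypothetical and chosen only for this example. It shows that compatibility characters survive canonical normalization but are folded by compatibility normalization, and that one common way of handling invalid UTF-8 maps distinct invalid inputs to the same text.

```python
import unicodedata


def fold_canon(text: str) -> str:
    """Canonical composition (NFC): keeps compatibility characters as they are."""
    return unicodedata.normalize("NFC", text)


def fold_compat(text: str) -> str:
    """Compatibility composition (NFKC): replaces compatibility characters
    with the plain characters they are declared compatible with."""
    return unicodedata.normalize("NFKC", text)


def decode_lossy(raw: bytes) -> str:
    """Decode UTF-8, replacing every invalid byte with U+FFFD.
    Different invalid inputs can therefore produce identical output."""
    return raw.decode("utf-8", errors="replace")


def decode_roundtrip(raw: bytes) -> str:
    """Decode UTF-8, mapping each invalid byte to a lone surrogate so the
    original bytes can be recovered (a different, non-lossy rule)."""
    return raw.decode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # Ligature, circled digit, superscript digit, half-width katakana, full-width Latin.
    for ch in ["\ufb01", "\u2460", "\u00b2", "\uff76", "\uff21"]:
        print(repr(ch), "NFC:", repr(fold_canon(ch)), "NFKC:", repr(fold_compat(ch)))

    # Two different invalid byte sequences collapse to the same string.
    print(decode_lossy(b"a\xffb") == decode_lossy(b"a\xfeb"))
```

The choice of error handler is the "varying rule" here: <code>errors="replace"</code> discards the distinction between invalid inputs, while <code>errors="surrogateescape"</code> preserves it at the cost of producing strings that contain lone surrogates.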

Unicode equivalence: Difference between revisions