}}
'''UTF-EBCDIC''' is a [[character encoding]] capable of encoding all 1,112,064 valid character [[code point]]s in [[Unicode]] using
To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first, creating what the specification calls an I8 sequence. The main difference between this encoding and UTF-8 is that it allows Unicode code points U+0080 through U+009F (the [[C1 control code]]s) to be represented as a single byte, so that they can later be mapped to the corresponding EBCDIC control codes. To achieve this, UTF-8-Mod uses 101XXXXX instead of 10XXXXXX as the format for trailing bytes in a multi-byte sequence, which leaves the byte values of the form 100XXXXX (0x80 through 0x9F) free to stand for the C1 control codes directly. As each trailing byte then holds only 5 bits rather than 6, the UTF-8-Mod encoding of a code point above U+03FF can be longer than its UTF-8 encoding; for example, U+0400 through U+07FF need three bytes rather than two.
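The first step described above can be illustrated with a minimal Python sketch. The function name <code>utf8_mod_encode</code> is hypothetical, and the lead-byte patterns are an assumption: they are taken to follow UTF-8's (110XXXXX, 1110XXXX, 11110XXX), with a five-byte form (1111100X) for the highest code points. The sketch produces only the I8 sequence; the later byte-for-byte mapping to EBCDIC requires a table from the specification and is not shown.

```python
def utf8_mod_encode(cp: int) -> bytes:
    """Encode one Unicode code point as an I8 (UTF-8-Mod) byte sequence.

    Sketch only: the lead-byte layouts below are assumed to mirror UTF-8's.
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")

    # U+0000..U+009F (ASCII plus the C1 controls) are stored as a single byte.
    if cp <= 0x9F:
        return bytes([cp])

    # (highest code point, lead-byte marker, number of trailing bytes)
    forms = (
        (0x3FF, 0xC0, 1),      # 110XXXXX + 1 trailing byte  (10 bits)
        (0x3FFF, 0xE0, 2),     # 1110XXXX + 2 trailing bytes (14 bits)
        (0x3FFFF, 0xF0, 3),    # 11110XXX + 3 trailing bytes (18 bits)
        (0x10FFFF, 0xF8, 4),   # 1111100X + 4 trailing bytes (21 bits)
    )
    for limit, lead, ntrail in forms:
        if cp <= limit:
            break

    # The lead byte carries the high bits left over after the trailing bytes.
    out = [lead | (cp >> (5 * ntrail))]
    # Each trailing byte is 101XXXXX: marker 0xA0 plus 5 payload bits.
    for i in range(ntrail - 1, -1, -1):
        out.append(0xA0 | ((cp >> (5 * i)) & 0x1F))
    return bytes(out)
```

Under these assumptions, <code>utf8_mod_encode(0x9F)</code> yields the single byte 0x9F, while <code>utf8_mod_encode(0x400)</code> yields three bytes where UTF-8 uses two.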