Revision as of 21:06, 17 March 2024 by SamB (1,349 edits): Add infobox
Revision as of 20:36, 5 May 2024 by Syk0saje (35 edits): Make definition more readable (minor edit)
Line 7:
}}
'''UTF-EBCDIC''' is a [[character encoding]] capable of encoding all 1,112,064 valid character [[code point]]s in [[Unicode]] using 1 to 5 [[byte]]s (in contrast to a maximum of 4 for [[UTF-8]]).<ref>{{Cite web|title=UTR #16: UTF-EBCDIC|url=https://www.unicode.org/reports/tr16/tr16-8.html|quote=You need to search at most five bytes (seven bytes, if the full range of 31 bits of ISO/IEC 10646 is considered) backwards|access-date=2021-02-23|website=www.unicode.org}}</ref> It is meant to be [[EBCDIC]]-friendly, so that legacy EBCDIC applications on [[Mainframe computer|mainframes]] may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to [[UTF-8]]'s advantages for existing [[ASCII]]-based systems. The details of UTF-EBCDIC are defined in Unicode Technical Report #16.

To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first, creating what the specification calls an I8 sequence. The main difference between this encoding and UTF-8 is that it allows Unicode code points U+0080 through U+009F (the [[C1 control code]]s) to be represented as a single byte, so that they can later be mapped to the corresponding EBCDIC control codes. To achieve this, UTF-8-Mod uses 101XXXXX instead of 10XXXXXX as the format for trailing bytes in a multi-byte sequence. As this format holds only 5 bits rather than 6, the UTF-8-Mod encoding of a code point above U+03FF can be longer than its UTF-8 encoding (for example, U+0400 through U+07FF take three bytes instead of two).
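The UTF-8-Mod step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the specification: the function name `i8_encode` is hypothetical, the lead-byte markers are assumed to follow the usual UTF-8 pattern (110, 1110, 11110, 111110), and rejecting surrogates is a choice made here. The sketch covers only the first step. The later mapping of I8 bytes to EBCDIC byte values uses a table from the technical report that is not reproduced here.

```python
def i8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as a UTF-8-Mod (I8) byte sequence.

    Illustrative sketch; the name and error handling are assumptions.
    """
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")

    # U+0000..U+009F (ASCII plus the C1 controls) are a single byte.
    if cp < 0xA0:
        return bytes([cp])

    # (number of trailing bytes, lead-byte marker, largest code point).
    # Each trailing byte carries 5 bits, so the ranges are narrower
    # than the corresponding UTF-8 ranges.
    forms = (
        (1, 0xC0, 0x3FF),     # 110yyyyy 101xxxxx
        (2, 0xE0, 0x3FFF),    # 1110zzzz + 2 trailing bytes
        (3, 0xF0, 0x3FFFF),   # 11110www + 3 trailing bytes
        (4, 0xF8, 0x3FFFFF),  # 111110rr + 4 trailing bytes
    )
    for n_trail, lead, limit in forms:
        if cp <= limit:
            break

    out = []
    for _ in range(n_trail):
        out.append(0xA0 | (cp & 0x1F))  # trailing byte: 101xxxxx
        cp >>= 5
    out.append(lead | cp)               # remaining high bits go in the lead byte
    return bytes(reversed(out))


if __name__ == "__main__":
    for cp in (0x41, 0x85, 0xA0, 0x400, 0x10FFFF):
        print(f"U+{cp:04X}", i8_encode(cp).hex(" "), chr(cp).encode("utf-8").hex(" "))
```

With this sketch, U+0085 (a C1 control) stays a single byte, whereas UTF-8 needs two. U+0400 becomes the three bytes E1 A0 A0, whereas UTF-8 needs only two (D0 80). This matches the length difference noted above.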

UTF-EBCDIC: Difference between revisions