{{Short description|IBM/AIX character encoding for Korean}}
{{infobox character encoding
|name=IBM code page 949
In [[hexadecimal]] notation, bytes 0x00 through 0x7F are used for single-byte KS X 1003 ([[ISO 646]]:KR) characters, a set similar to ASCII but with a [[won sign]] in place of the backslash. Bytes 0x80 through 0x84 are used for IBM single-byte extension characters. Lead bytes 0x8F through 0xA0 are used for IBM double-byte extension characters. Lead bytes 0xA1 through 0xFE are used for Wansung code ([[KS X 1001]] characters in EUC-KR form, double-byte), but with some unused space opened up for user-defined characters.
Although both are sometimes named "cp949", IBM-949 is different from [[Unified Hangul Code|Windows code page 949]] (IBM-1363), which is Microsoft's Unified Hangul Code, a different extension of EUC-KR. It should also not be confused with IBM's implementation of plain EUC-KR ([[Code page 970|IBM-970]]). Code page 949 in [[OS/2]] is the IBM code page; however, a third-party patch exists to change this.<ref name="borgendale949">{{cite web |url=http://www.borgendale.com/tools/tools.htm |title=OS/2 Codepage and Keyboard Display Tools |first=Ken |last=Borgendale}}</ref>
== Terminology and encoding labelling ==
Both IBM-949 and [[Unified Hangul Code]] (Windows-949) are known as "code page 949" (or "cp949"), although they have only the EUC-KR subset in common. Neither has a standardised [[IANA]]-registered label to identify it. Although UHC is included in the [[WHATWG]] Encoding Standard,<ref>{{citation|url=https://encoding.spec.whatwg.org/#index-euc-kr|title=5. Indexes (§ index EUC-KR)|work=Encoding Standard|publisher=WHATWG |last=van Kesteren |first=Anne |author-link=Anne van Kesteren |quotation=This matches the KS X 1001 standard and the Unified Hangul Code, more commonly known together as Windows Codepage 949.}}</ref> with labels including "windows-949",<ref>{{cite web | url=https://encoding.spec.whatwg.org/#names-and-labels | title=4.2. Names and labels | publisher=WHATWG | work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> IBM-949 is not, and it is therefore not permitted in [[HTML5]].
Although the meaning of the label "ibm-949" (and conversely "windows-949" and "ms949") is unambiguous where these labels are supported, the interpretation of the encoding labels "949" and "cp949" consequently varies between implementations. For example, [[International Components for Unicode]] uses "cp949", "949", "ibm-949" and "x-IBM949" to refer to IBM-949,<ref name="icu"/> and additionally the labels "cp949c", "ibm-949c" and "x-IBM949C" to refer to a variant that uses unmodified [[ASCII]] mappings for 0x20–7E<!-- Not 0x00–7F, since it still has a FS-SUB-DEL pivot. --> (resulting in duplicate mappings for the backslash),<ref name="icuc"/> while (of the labels incorporating the code page number 949) only "ms949" and "windows-949" are assigned to UHC.<ref name="icums949">{{citation|url=http://demo.icu-project.org/icu-bin/convexp?conv=windows-949-2000|publisher=International Components for Unicode|title=windows-949-2000|work=Converter Explorer}}</ref> This is in contrast to [[Python (programming language)|Python]], which recognises both "cp949" and "949" (in addition to the more explicit "ms949" and "uhc", but not "windows-949") as labels for UHC, and does not include an IBM-949 codec.<ref>{{cite web |url=https://docs.python.org/3.7/library/codecs.html#standard-encodings |title=codecs — Codec registry and base classes § Standard Encodings |work=Python 3.7.2 documentation |publisher=Python Software Foundation}}</ref> The code page 949 used by Korean-language versions of [[OS/2]] is the IBM code page; to add support for the entire Unicode set of Korean syllables, a third-party patch exists to replace it with the Microsoft code page.<ref name="borgendale949"/>
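The label ambiguity can be observed directly in Python. The sketch below uses only the standard codecs module to report which codec Python would use for several labels; the helper name python_codec_name is hypothetical and chosen for illustration. It also decodes the byte pair 0x81 0x41, which in UHC is the extension syllable 갂 (U+AC02). In IBM-949 the byte 0x81 instead falls in the single-byte IBM extension range, so the same bytes would be read differently.

```python
import codecs
from typing import Optional


def python_codec_name(label: str) -> Optional[str]:
    """Return the canonical name of the codec Python uses for label, or None."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        # No codec is registered under this label.
        return None


if __name__ == "__main__":
    for label in ("cp949", "949", "ms949", "uhc", "ibm-949"):
        print(f"{label!r} -> {python_codec_name(label)}")

    # 0x81 0x41 is a UHC extension code. Python's "cp949" codec implements UHC,
    # so this prints the code point of U+AC02 rather than anything from IBM-949.
    print(hex(ord(b"\x81\x41".decode("cp949"))))
```

On CPython the first four labels should all resolve to the UHC codec named cp949, while "ibm-949" resolves to nothing, which is consistent with Python shipping no IBM-949 codec.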
IBM-949 is a [[variable width encoding]] defined as the combination of two fixed-width [[code page]]s, the single-byte '''Code page 1088''' and the double-byte [[Code page 951]].<ref name="ccsid949">{{cite web|title=Coded character set identifiers: CCSID 949 |archive-url=https://web.archive.org/web/20141129233846/http://www-01.ibm.com/software/globalization/ccsid/ccsid949.html|archive-date=2014-11-29|url=http://www-01.ibm.com/software/globalization/ccsid/ccsid949.html |work=IBM Globalization |publisher=[[IBM]] |url-status=dead}}</ref><ref>{{cite web|title=CCSID 1088 information document|archive-url=https://web.archive.org/web/20160326215133/http://www-01.ibm.com/software/globalization/ccsid/ccsid1088.html|archive-date=2016-03-26|url=http://www-01.ibm.com/software/globalization/ccsid/ccsid1088.html}}</ref><ref>{{cite web|title=Code page 951 information document|archive-url=https://web.archive.org/web/20170116144609/https://www-01.ibm.com/software/globalization/cp/cp00951.html|archive-date=2017-01-16|url=https://www-01.ibm.com/software/globalization/cp/cp00951.html}}</ref>
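Because the lead byte alone determines whether a code is a single byte (from Code page 1088) or a byte pair (from Code page 951), an IBM-949 byte stream can be split into code units without any mapping table. The following is a minimal sketch using the byte ranges given at the start of this article. The function name split_ibm949 is hypothetical; bytes 0x85–0x8E and 0xFF are not described in the text and are simply rejected here (an assumption); trail bytes are not validated; and nothing is converted to Unicode.

```python
from typing import List, Tuple


def split_ibm949(data: bytes) -> List[Tuple[str, bytes]]:
    """Split IBM-949 bytes into (component code page, code unit) pairs.

    Hypothetical helper: ranges follow the article text. Bytes 0x85-0x8E
    and 0xFF are not described there, so they are treated as errors.
    """
    units: List[Tuple[str, bytes]] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x84:
            # 0x00-0x7F: KS X 1003; 0x80-0x84: IBM single-byte extensions.
            units.append(("1088", data[i:i + 1]))
            i += 1
        elif 0x8F <= b <= 0xFE:
            # 0x8F-0xA0: IBM double-byte extensions; 0xA1-0xFE: Wansung code.
            if i + 1 >= len(data):
                raise ValueError(f"truncated double-byte code at offset {i}")
            units.append(("951", data[i:i + 2]))
            i += 2
        else:
            raise ValueError(f"byte 0x{b:02X} at offset {i} is not a described lead byte")
    return units


# "A", then the Wansung code 0xB0A1 (가), then 0x5C (the won sign position).
print(split_ibm949(b"A\xb0\xa1\x5c"))
```

A real decoder would additionally look each unit up in the component code page's table; the point here is only that the variable width is resolved from the lead byte.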
== History ==
A version of Code page 951 (a DBCS-PC, i.e. double-byte non-[[Extended Unix Code|EUC]] non-[[EBCDIC]], code), the double-byte component for IBM-949, is defined in the September 1992 revision of IBM Corporate Specification C-H 3-3220-125, along with Code page 834 (a DBCS-Host, i.e. double-byte EBCDIC, code), which is the double byte component of [[Code page 933]].<ref name="ch3320125-1992">{{cite web |url=
{{infobox character encoding
== Single byte codes ==
{|{{chset-table-header1|IBM code page 949 (single byte component: 1088)<ref>{{Citation|title=Code Page CPGID 01088 (pdf)|url=
|-
|{{chset-left1|0x}}
|{{chset-cell1|Ext Alnum|[[KS X 1001#row 8|8-_]]|style=font-size:small;background:#DFD}}
|{{chset-cell1|Ext Alnum|[[KS X 1001#row 9|9-_]]|style=font-size:small;background:#DFD}}
|{{chset-cell1|Hiragana|[[KS X 1001#row
|{{chset-cell1|Katakana|[[KS X 1001#row
|{{chset-cell1|Cyrillic|[[KS X 1001#row
|{{chset-cell1||13-_|style=font-size:small;background:#DDD}}
|{{chset-cell1||14-_|style=font-size:small;background:#DDD}}
|{{chset-cell1|Syllable|[[KS X 1001#composed|39-_]]|style=font-size:small;background:#DFD}}
|{{chset-cell1|Syllable|[[KS X 1001#composed|40-_]]|style=font-size:small;background:#DFD}}
|{{chset-cell1|UDC|[[#UDC|41-_]]|style=font-size:small;background:#
|{{chset-cell1|Hanja|[[wikt:Appendix:Korean Hanja by KS X 1001 hangyol code#Row 42|42-_]]|style=font-size:small;background:#DFD}}
|{{chset-cell1|Hanja|[[wikt:Appendix:Korean Hanja by KS X 1001 hangyol code#Row 43|43-_]]|style=font-size:small;background:#DFD}}
Line 331 ⟶ 332:
|{{chset-cell1|Hanja|[[wikt:Appendix:Korean Hanja by KS X 1001 hangyol code#Row 92|92-_]]|style=font-size:small;background:#DFD}}
|{{chset-cell1|Hanja|[[wikt:Appendix:Korean Hanja by KS X 1001 hangyol code#Row 93|93-_]]|style=font-size:small;background:#DFD}}
|{{chset-cell1|UDC|[[#UDC|94-_]]|style=font-size:small;background:#
|{{chset-cell1|||style=background:#DDD}}
|}
== Double byte codes ==
=== {{anchor|UDC}}Lead bytes 0x8F–99, 0xC9, 0xFE (user
IBM-949 is designed to support a maximum of 1880 UDC (user-defined characters),<ref name="ccsid949"/> including
When converted to Unicode, 0xC9A1–C9FE (between the syllable and hanja ranges) map to the Unicode [[Private Use Areas|Private Use Area]] code points U+E000–E05D, while 0xFEA1–FEFE (between the end of the hanja range and the end of the plane) map to U+E05E–E0BB. Outside the Wansung plane, 0x8FA1–9AA5 (where the second byte is in the range 0xA1–FE) map to the Private Use Area code points U+E0BC–E4CA.<ref name="icu"/> The last of these ranges extends into the start of the [[#0x9A|0x9A row]] (shown below).
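The three ranges fit together arithmetically: each row has 94 trail bytes (0xA1–0xFE), and 11 full rows (0x8F–0x99) plus five codes in row 0x9A give exactly the 1039 code points from U+E0BC to U+E4CA. The sketch below assumes that code points are assigned in ascending byte order within each range. The range endpoints are consistent with that, but it should be checked against the ICU mapping table before being relied on. The name udc_to_pua is hypothetical.

```python
from typing import Optional

ROW_SIZE = 94  # trail bytes 0xA1-0xFE


def udc_to_pua(lead: int, trail: int) -> Optional[int]:
    """Map an IBM-949 user-defined code to a Private Use Area code point.

    Hypothetical helper assuming sequential assignment within each range;
    returns None for codes outside the three user-defined ranges.
    """
    if not 0xA1 <= trail <= 0xFE:
        return None
    offset = trail - 0xA1  # position within the 94-cell row
    if lead == 0xC9:
        return 0xE000 + offset  # 0xC9A1-C9FE -> U+E000-E05D
    if lead == 0xFE:
        return 0xE05E + offset  # 0xFEA1-FEFE -> U+E05E-E0BB
    if 0x8F <= lead <= 0x99 or (lead == 0x9A and trail <= 0xA5):
        # 0x8FA1-9AA5 -> U+E0BC-E4CA, row by row
        return 0xE0BC + (lead - 0x8F) * ROW_SIZE + offset
    return None


print(hex(udc_to_pua(0x9A, 0xA5)))  # last code in the range
```

Under this assumption the last user-defined code, 0x9AA5, lands on U+E4CA, the end of the range given in the text.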
Collectively these private use ranges cover the code points U+
=== {{anchor|0x9A}}Lead bytes 0x9A–9D (extended symbols and hanja) ===
According to the 1992 specification, this entire range is user-defined.<ref name="ch3320125-1992"/> In the codec contributed to ICU by IBM, however, only 0x9AA1 through 0x9AA5 are user-defined, forming the end of the user-defined range. The rest of this range contains non-Hangul characters that are present in [[Code page 933]] but absent from Wansung code: 0x9AA6 through 0x9AAB hold miscellaneous technical or mathematical symbols, and the remainder holds [[hanja]] beyond those in [[KS X 1001]], although IBM maps some of these to the Private Use Area.<ref name="ucm"/>
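The layout of row 0x9A in the ICU codec, as described in this paragraph, reduces to a lookup on the trail byte. This is only a coarse sketch. The name classify_9a_trail is hypothetical, the trail-byte range 0xA1–0xFE is assumed by analogy with the user-defined rows, and the function does not say which hanja IBM maps to the Private Use Area or whether any individual cell is unassigned.

```python
from typing import Optional


def classify_9a_trail(trail: int) -> Optional[str]:
    """Coarse category of the IBM-949 code 0x9A<trail> in the ICU layout.

    Hypothetical helper; trail bytes outside 0xA1-0xFE are assumed unused.
    """
    if not 0xA1 <= trail <= 0xFE:
        return None
    if trail <= 0xA5:
        return "user-defined"      # 0x9AA1-0x9AA5: end of the user-defined range
    if trail <= 0xAB:
        return "technical symbol"  # 0x9AA6-0x9AAB: technical or mathematical symbols
    return "extended hanja"        # 0x9AAC onward: hanja beyond KS X 1001


for t in (0xA1, 0xA6, 0xAC):
    print(f"0x9A{t:02X}: {classify_9a_trail(t)}")
```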
{|{{chset-table-
▲{{chset-table-header|IBM code page 949 (prefixed with 0x9A)<ref name="ucm"/>{{refn|Private Use Area mapped hanja are identified from code charts. The IBM document C-H 3-3220-125 1992-09 gives code charts for the code pages used as the double-byte components for [[Code page 933]] and an older version of Code page 949 without these extensions; however, the hanja in this section correspond to (and are in the same order as) the subset of table 7 for which a "PC Code" is not listed.{{refn|name=ch3320125-1992}} The Corporate Private Use Area mappings are also co-ordinated with other code pages,{{refn|name=ibmpua}} including Code page 933,{{refn|{{cite web |url=https://raw.githubusercontent.com/unicode-org/icu/main/icu4c/source/data/mappings/ibm-933_P110-1995.ucm |title=ibm-933_P110-1995.ucm |work=[[International Components for Unicode]]}}}} which can be used to obtain the "Host Code" for a given Corporate Private Use Area mapping.|name=puaid}}}}
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-cell1|U+5231 CJK Ideograph|[[wikt:刱|刱]]|fn={{efn|The mapping from IBM is U+5231 刱, but the glyph in the IBM document C-H 3-3220-125 1992-09 is closer to U+5259 [[wikt:剙|剙]] (host code 62D5).{{refn|name=ch3320125-1992}}}}}}
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|}
=== {{anchor|0x9E}}Lead bytes 0x9E–A0 (extended hanja and hangul syllables) ===
According to the 1992 specification, this entire range is user-defined.<ref name="ch3320125-1992"/> As implemented in the codec contributed to ICU by IBM, 0x9EA1 through 0x9EAC contain the remainder of the extended hanja. The rest of the range contains a few additional [[Hangul]] syllables that are not available in precomposed form in pure [[EUC-KR]]. Unlike the extension in Unified Hangul Code, these are not enough to cover all of the complete (non-partial) [[Johab]] syllables that Wansung code lacks.<ref name="ucm"/>
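Whether a given modern syllable is among the 2,350 that Wansung code encodes can be checked with Python's cp949 codec, which implements Unified Hangul Code rather than IBM-949. UHC keeps the EUC-KR codes (both bytes 0xA1 or above) for Wansung syllables and places every other syllable at a byte pair where at least one byte is below 0xA1. The sketch relies on that property. The name in_wansung is hypothetical, and the check says nothing about which of the missing syllables IBM-949 adds in this range.

```python
def in_wansung(syllable: str) -> bool:
    """True if a precomposed modern Hangul syllable has a Wansung (KS X 1001) code."""
    if len(syllable) != 1 or not 0xAC00 <= ord(syllable) <= 0xD7A3:
        raise ValueError("expected one precomposed Hangul syllable (U+AC00-U+D7A3)")
    # Python's "cp949" codec is Microsoft's Unified Hangul Code, not IBM-949.
    lead, trail = syllable.encode("cp949")
    # Wansung codes use 0xA1-0xFE for both bytes; UHC's added syllables do not.
    return lead >= 0xA1 and trail >= 0xA1


missing = [chr(cp) for cp in range(0xAC00, 0xD7A4) if not in_wansung(chr(cp))]
print(len(missing))  # 11172 - 2350 = 8822 syllables absent from Wansung code
```

For example, in_wansung("가") is true, while in_wansung("똠") is false, since 똠 is one of the syllables that KS X 1001 does not include.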
Significant amongst these are 뢔 (
{|{{chset-table-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|{{chset-
|}