Code page 949 (IBM)

Giving values in [[hexadecimal]], bytes 0x00 through 0x7F are used for single-byte KS X 1003 ([[ISO 646]]:KR) characters, a set similar to ASCII but with a [[won sign]] in place of the backslash. Bytes 0x80 through 0x84 are used for IBM single-byte extension characters. Lead bytes 0x8F through 0xA0 are used for IBM double-byte extension characters. Lead bytes 0xA1 through 0xFE are used for Wansung code (double-byte [[KS X 1001]] characters in EUC-KR form), but with some unused space opened up for user-defined use.
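
The byte ranges above can be illustrated with a minimal Python sketch. It is not a converter: it only classifies a byte by the ranges given in this article and splits a byte string into one-byte and two-byte units. The names <code>classify_byte</code> and <code>split_units</code> and the category strings are hypothetical, and the treatment of bytes outside the listed ranges as "unassigned" is an assumption; trail bytes are not validated, since their ranges are not given here.

```python
def classify_byte(b):
    """Classify one byte value using the ranges stated in the article."""
    if not 0x00 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        return "single"         # KS X 1003 (ISO 646:KR) single-byte character
    if b <= 0x84:
        return "single-ext"     # IBM single-byte extension
    if 0x8F <= b <= 0xA0:
        return "lead-ibm"       # lead byte, IBM double-byte extension
    if 0xA1 <= b <= 0xFE:
        return "lead-wansung"   # lead byte, KS X 1001 in EUC-KR form
    return "unassigned"         # assumption: anything not listed above


def split_units(data):
    """Split bytes into 1- or 2-byte units based on the first byte only.

    Trail bytes are not checked, and a lead byte at the very end of the
    input yields a truncated one-byte unit.
    """
    units = []
    i = 0
    while i < len(data):
        width = 2 if classify_byte(data[i]).startswith("lead") else 1
        units.append(data[i:i + width])
        i += width
    return units
```

For example, <code>split_units(b"A\xb0\xa1B")</code> separates the input into the single byte <code>A</code>, the two-byte unit <code>B0 A1</code>, and the single byte <code>B</code>.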
 
Although both are sometimes named "cp949", IBM-949 is different from [[Unified Hangul Code|Windows code page 949]] (IBM-1363), which is Microsoft's Unified Hangul Code, a different extension of EUC-KR. It should also not be confused with IBM's implementation of plain EUC-KR ([[Code page 970|IBM-970]]). Code page 949 in [[OS/2]] is the IBM code page; however, a third-party patch exists to replace it with the Microsoft code page.<ref name="borgendale949">{{cite web |url=http://www.borgendale.com/tools/tools.htm |title=OS/2 Codepage and Keyboard Display Tools |first=Ken |last=Borgendale}}</ref>
 
== Terminology and encoding labelling ==
Both IBM-949 and [[Unified Hangul Code]] (Windows-949) are known as "code page 949" (or "cp949"), although they have only the EUC-KR subset in common. Neither has a standardised [[IANA]]-registered label to identify it. Although UHC is included in the [[WHATWG]] Encoding Standard,<ref>{{citation|url=https://encoding.spec.whatwg.org/#index-euc-kr|title=5. Indexes (§ index EUC-KR)|work=Encoding Standard|publisher=WHATWG |last=van Kesteren |first=Anne |author-link=Anne van Kesteren |quotation=This matches the KS X 1001 standard and the Unified Hangul Code, more commonly known together as Windows Codepage 949.}}</ref> with labels including "windows-949",<ref>{{cite web | url=https://encoding.spec.whatwg.org/#names-and-labels | title=4.2. Names and labels | publisher=WHATWG | work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> IBM-949 is not. IBM-949 is therefore not permitted in [[HTML5]].
 
Although the meaning of the label "ibm-949" (and conversely "windows-949" and "ms949") is unambiguous where these labels are supported, the interpretation of the encoding labels "949" and "cp949" consequently varies between implementations. For example, [[International Components for Unicode]] uses "cp949", "949", "ibm-949" and "x-IBM949" to refer to IBM-949,<ref name="icu"/> and additionally the labels "cp949c", "ibm-949c" and "x-IBM949C" to refer to a variant which uses unmodified [[ASCII]] mappings for 0x20–7E<!-- Not 0x00–7F, since it still has a FS-SUB-DEL pivot. --> (resulting in duplicate mappings for the backslash),<ref name="icuc"/> while (of the labels incorporating the code page number 949) only "ms949" and "windows-949" are assigned to UHC.<ref name="icums949">{{citation|url=http://demo.icu-project.org/icu-bin/convexp?conv=windows-949-2000|publisher=International Components for Unicode|title=windows-949-2000|work=Converter Explorer}}</ref> This is in contrast to [[Python (programming language)|Python]], which recognises both "cp949" and "949" (in addition to the more explicit "ms949" and "uhc", but not "windows-949") as labels for UHC, and does not include an IBM-949 codec.<ref>{{cite web |url=https://docs.python.org/3.7/library/codecs.html#standard-encodings |title=codecs — Codec registry and base classes § Standard Encodings |work=Python 3.7.2 documentation |publisher=Python Software Foundation}}</ref> The code page 949 used by Korean language versions of [[OS/2]] is the IBM code page; to add support for the entire Unicode set of Korean syllables, a third-party patch exists to replace it with the Microsoft code page.<ref name="borgendale949"/>
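
The Python behaviour described above can be checked with a short sketch using the standard library's <code>codecs.lookup</code>. The helper name <code>resolve_label</code> is hypothetical, and the results depend on the Python version and build in use, so the labels that the text describes as unsupported are reported rather than assumed.

```python
import codecs


def resolve_label(label):
    """Return the canonical codec name for a label, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


if __name__ == "__main__":
    # Labels discussed in the text; the last two are expected to be
    # unrecognised by Python, but this is reported, not assumed.
    for label in ("cp949", "949", "ms949", "uhc", "windows-949", "ibm-949"):
        print(label, "->", resolve_label(label))
```

Labels that resolve all name the same UHC codec, which illustrates why a bare "949" or "cp949" cannot be relied on to mean IBM-949 when data moves between ICU-based and Python-based tools.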
 
IBM-949 is a [[variable width encoding]] defined as the combination of two fixed-width [[code page]]s, the single-byte '''Code page 1088''' and the double-byte [[Code page 951]].<ref name="ccsid949">{{cite web|title=Coded character set identifiers: CCSID 949 |archive-url=https://web.archive.org/web/20141129233846/http://www-01.ibm.com/software/globalization/ccsid/ccsid949.html|archive-date=2014-11-29|url=http://www-01.ibm.com/software/globalization/ccsid/ccsid949.html |work=IBM Globalization |publisher=[[IBM]] |url-status=dead}}</ref><ref>{{cite web|title=CCSID 1088 information document|archive-url=https://web.archive.org/web/20160326215133/http://www-01.ibm.com/software/globalization/ccsid/ccsid1088.html|archive-date=2016-03-26|url=http://www-01.ibm.com/software/globalization/ccsid/ccsid1088.html}}</ref><ref>{{cite web|title=Code page 951 information document|archive-url=https://web.archive.org/web/20170116144609/https://www-01.ibm.com/software/globalization/cp/cp00951.html|archive-date=2017-01-16|url=https://www-01.ibm.com/software/globalization/cp/cp00951.html}}</ref>