Revision as of 20:00, 27 February 2024 by 153.229.122.121. Edit summary: In KS X 1001 (KS C 5601), rows 41 and 94 are user-defined ranges. ISO-IR-149 says this; see "1.14 User-definable Positions" on page 3. Also see what I added to the KS X 1001 article.
Revision as of 20:23, 27 February 2024 by 153.229.122.121 (no edit summary).
=== {{anchor|UDC}}Lead bytes 0x8F–99, 0xC9, 0xFE (user-defined ranges) ===
IBM-949 is designed to support a maximum of 1880 UDC (user-defined characters),<ref name="ccsid949"/> including the user-defined rows (lead bytes 0xC9 and 0xFE) of the Wansung plane as well as ranges outside the Wansung plane. In this allocation, lead bytes 0x8F–A0 hold a maximum of 1692 UDC (18 lead bytes × 94 trail bytes), and lead bytes 0xC9 and 0xFE hold a maximum of 94 each (i.e. with trail bytes 0xA1–FE).<ref name="ch3320125-1992"/> However, when the extensions supporting the entire double-byte repertoire of [[Code page 933|IBM-933]] are implemented, they occupy lead bytes 0x9A–A0, leaving fewer positions available for user definition.<ref name="icu"/><ref name="ucm"/> In the mapping to Unicode, 0xC9A1–C9FE (between the syllable and hanja ranges) map to the Unicode [[Private Use Areas|Private Use Area]] code points U+E000–E05D, while 0xFEA1–FEFE (between the end of the hanja range and the end of the plane) map to U+E05E–E0BB. Outside the Wansung plane, 0x8FA1–9AA5 (where the second byte is in the range 0xA1–FE) map to the Private Use Area code points U+E0BC–E4CA.<ref name="icu"/> The last of these ranges cuts into the start of the [[#0x9A|0x9A row]] (shown below).
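
The byte-to-code-point arithmetic behind these ranges can be illustrated with a short Python sketch. It assumes that each range is assigned sequentially in byte order (lead byte first, then trail byte), which the range endpoints above are consistent with but which is not stated explicitly. The names <code>udc_to_pua</code> and <code>all_udc</code> are hypothetical and are not part of ICU or any IBM API.

```python
"""Sketch: map IBM-949 user-defined (UDC) byte pairs to Unicode PUA code points.

Assumption (not from the sources): each range is assigned sequentially in
byte order, lead byte first, then trail byte.
"""
from typing import Iterator, Optional, Tuple

TRAIL_FIRST, TRAIL_LAST = 0xA1, 0xFE
ROW_SIZE = TRAIL_LAST - TRAIL_FIRST + 1  # 94 trail bytes per lead byte


def udc_to_pua(lead: int, trail: int) -> Optional[int]:
    """Return the PUA code point for an IBM-949 UDC pair, or None if unmapped."""
    if not TRAIL_FIRST <= trail <= TRAIL_LAST:
        return None
    offset = trail - TRAIL_FIRST
    if lead == 0xC9:                      # user-defined row inside the Wansung plane
        return 0xE000 + offset            # U+E000..U+E05D
    if lead == 0xFE:                      # user-defined row at the end of the plane
        return 0xE05E + offset            # U+E05E..U+E0BB
    if 0x8F <= lead <= 0x9A and (lead, trail) <= (0x9A, 0xA5):
        # Outside the Wansung plane: 0x8FA1 up to and including 0x9AA5
        return 0xE0BC + (lead - 0x8F) * ROW_SIZE + offset  # U+E0BC..U+E4CA
    return None


def all_udc() -> Iterator[Tuple[bytes, int]]:
    """Yield every (two-byte code, PUA code point) pair covered by this sketch."""
    for lead in range(0x80, 0x100):
        for trail in range(TRAIL_FIRST, TRAIL_LAST + 1):
            cp = udc_to_pua(lead, trail)
            if cp is not None:
                yield bytes([lead, trail]), cp


if __name__ == "__main__":
    print(len(list(all_udc())))                             # 1227
    print(hex(udc_to_pua(0x9A, 0xA5)))                      # 0xe4ca
    print(hex(udc_to_pua(0xFE, 0xFE)))                      # 0xe0bb
    # Design capacity: lead bytes 0x8F..0xA0 plus the two Wansung UDC rows
    print((0xA0 - 0x8F + 1) * ROW_SIZE + 2 * ROW_SIZE)      # 1880
```

Enumerating every mapped pair yields 1227 code points forming the contiguous block U+E000–E4CA, and the design capacity of lead bytes 0x8F–A0 plus the two Wansung user-defined rows works out to 1880. A real converter would more likely take these values from a mapping table, such as an ICU <code>.ucm</code> file, than compute them.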
Collectively these private use ranges cover the code points U+E000–E4CA, allowing 1227 UDC to be mapped from IBM-949 to Unicode.<ref name="ucm"/> The separate private use range U+F843–F86E is used by IBM to map some characters within the extended hanja range.<ref name="ucm"/> This follows early recommendations from the Unicode Consortium that corporate characters be allocated from U+F8FF downward and user-defined characters from U+E000 upward,<ref>{{cite book |section-url=https://www.unicode.org/versions/Unicode1.1.0/ch02.pdf |section=2.0: Changes in Unicode 1.0 |title=The Unicode Standard, Version 1.1 |id=UTR #4 |publisher=[[Unicode Consortium]] |pages=3–4}}</ref> and is part of a larger corporate private use scheme, defined internally by IBM, that uses the range U+F83D–F8FF.<ref name="ibmpua">{{cite web |archive-url=https://web.archive.org/web/20150916190822/http://www-01.ibm.com/software/globalization/cp/cp01449.html |archive-date=2015-09-16 |url=http://www-01.ibm.com/software/globalization/cp/cp01449.html |url-status=dead |title=CPGID 01449: IBM default PUA |work=IBM Globalization: Code page identifiers |publisher=[[IBM]] |quotation=IBM has designated 195 positions from U+F83D to U+F8FF for use as IBM Corporate-zone and intends to use them consistently within IBM whenever there is a need to maintain the round-trip integrity of IBM characters.}}</ref><ref>{{citation|mode=cs1 |title=unicode.nam: Allow the Unicode characters to be specified using either the IBM or PostScript like names. |author=IBM |author-link=IBM |date=1997}} (Included with {{citation|mode=cs2 |title=OS/2 Codepage and Keyboard Display Tools |last=Borgendale |first=Ken |url=http://www.borgendale.com/tools/tools.htm}})</ref><!-- Note: although the documentation mentions 192 characters and unicode.nam lists 192 characters, U+F83D and U+F83E are used in e.g. ibm-1388_P103-2001.ucm (but absent from unicode.nam) so the up-to-date count seems to be 194 characters and possibly one vacancy. Least OR-ey approach is probably to avoid mentioning how many of the 195 positions are allocated. -->

=== {{anchor|0x9A}}Lead bytes 0x9A–9D (extended symbols and hanja) ===

Code page 949 (IBM): Difference between revisions