Microsoft Windows code page 932 (Windows-932 or ambiguously CP932), known by IBM as code page 943 (CP943)[1] and known by the IANA as Windows-31J,[2] also called MS-Kanji,[3] is Microsoft's extended variant of Shift JIS. It contains standard 7-bit ASCII codes, and Japanese characters are indicated by the high bit of the first byte being set to 1. Some code points in this page require a second byte, so characters use either 8 or 16 bits for encoding. It is a combination of Code page 897 and Code page 941.
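The mix of 8-bit and 16-bit characters can be illustrated with a minimal Python sketch. It assumes Python's built-in `cp932` codec and the commonly documented lead-byte ranges for double-byte characters (0x81 to 0x9F and 0xE0 to 0xFC); the helper name `split_cp932` is hypothetical and is not part of any library.

```python
def split_cp932(data: bytes) -> list:
    """Split CP932-encoded bytes into one byte string per character."""
    chunks = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Assumption: these ranges mark the first byte of a 16-bit character.
        is_lead = 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC
        # A lead byte consumes the following byte as well (if one exists).
        width = 2 if is_lead and i + 1 < len(data) else 1
        chunks.append(data[i:i + width])
        i += width
    return chunks


# ASCII letter, half-width katakana (single byte, high bit set), hiragana (two bytes).
encoded = "Aｱあ".encode("cp932")
pieces = split_cp932(encoded)
widths = [len(p) for p in pieces]
```

Here `widths` records how many bytes each character occupies. The sketch does not validate trail bytes, so it is suitable only for well-formed input.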
The "Windows-31J" name is IANA's and not recognized by Microsoft, which has historically used "shift_jis" instead. In Japanese editions of Windows, this code page is referred to as "ANSI", since it is the operating system's default 8-bit encoding, even though ANSI was not involved in its definition.
Differences from standard Shift JIS
Windows-31J is often mistaken for standard Shift JIS: while similar, the distinction is significant for computer programmers wishing to avoid mojibake. In addition to the standard JIS X 0201:1997 and JIS X 0208:1997 characters, it includes "NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)"[2] as JIS X 0208 extensions. Such "formerly proprietary extensions from IBM and NEC", while not part of the JIS standards, are included in the W3C/WHATWG encoding standard used by HTML5.[4]
Some of these rows were subsequently used by JIS X 0213. For example, compare row 89 in JIS X 0213 (beginning 硃, 硎, 硏…)[5] to row 89 as used by IBM/NEC extensions (beginning 纊, 褜, 鍈…).[6]
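The practical effect of these extensions can be seen by decoding the same bytes with two different codecs. The following sketch assumes CPython's built-in `cp932` and `shift_jis` codecs, where `shift_jis` is taken to cover only the standard JIS characters; the helper `try_decode` is hypothetical, and the byte values are the first cells of rows 13 and 89 under the usual Shift JIS row-to-byte arithmetic.

```python
def try_decode(data: bytes, codec: str):
    """Return the decoded text, or None if the codec rejects the bytes."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


nec_row13 = b"\x87\x40"      # row 13, cell 1 (NEC special characters)
nec_ibm_row89 = b"\xed\x40"  # row 89, cell 1 (NEC selection of IBM extensions)

results = {
    codec: (try_decode(nec_row13, codec), try_decode(nec_ibm_row89, codec))
    for codec in ("cp932", "shift_jis")
}
```

Under these assumptions, the `cp932` entry holds the extension characters while the `shift_jis` entry holds `None` for both, which is the kind of mismatch that produces mojibake or decoding errors when the two encodings are confused.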
Windows-31J includes standard 7-bit ASCII codes for single-byte sequences with the high bit set to 0. Hence, codes 0x5C and 0x7E are mapped to U+005C REVERSE SOLIDUS (\) and U+007E TILDE (~) respectively, as they are in ASCII (ISO-646-US).[7] This is often a source of confusion because in many Japanese fonts, code 0x5C is displayed as a Yen symbol, which would normally be represented as U+00A5 YEN SIGN (¥) in Unicode. This stems from the fact that 0x5C is mapped to U+00A5 in ISO-646-JP and consequently JIS X 0201, of which standard Shift JIS is an extension. However, code 0x5C in Windows-31J behaves as a reverse solidus (backslash) in all respects (e.g. in file paths on Windows systems) other than how it is displayed by some fonts.
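The mapping of 0x5C and 0x7E can be checked with a short sketch, again assuming Python's built-in `cp932` codec; the path used here is purely illustrative.

```python
# Bytes 0x5C and 0x7E decode to the ASCII characters, not to yen sign or overline.
special = b"\x5c\x7e".decode("cp932")
code_points = [f"U+{ord(ch):04X}" for ch in special]

# An illustrative Windows-style path: the separators stay backslashes after
# decoding, whatever glyph a Japanese font chooses to draw for them.
path_bytes = b"C:\\Users\\demo"
decoded_path = path_bytes.decode("cp932")
contains_yen = "\u00a5" in decoded_path
```

Any yen-like appearance of the separator on screen is therefore a property of the font, not of the decoded text.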
References
- "Code Page 943". IBM.
- "Character Sets". IANA.
- "7.2.3. Standard Encodings". Python 3.6 Documentation. Python Software Foundation. Retrieved 19 September 2017.
- "Index jis0208". Encoding Standard. WHATWG.
- "233: Japanese Graphic Character Set for Information Interchange, Plane 1" (PDF). IPSJ.
- "Index jis0208 visualization". Encoding Standard. WHATWG.
- "CP932.TXT". Unicode Consortium.