Code page 932 (Microsoft Windows)

This is an old revision of this page, as edited by HarJIT at 18:46, 20 September 2017 (Differences from standard Shift JIS).

Microsoft Windows code page 932 (Windows-932, or ambiguously CP932) is Microsoft's extension of Shift JIS. IBM knows it as code page 943 (CP943),[1] the IANA knows it as Windows-31J,[2] and it is also called MS-Kanji.[3] Bytes with the high bit set to 0 are standard 7-bit ASCII codes, while bytes with the high bit set to 1 indicate Japanese characters. Some of the latter require a second byte, so each character is encoded in either 8 or 16 bits. The code page is a combination of Code page 897 and Code page 941.
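The 8-bit versus 16-bit split can be observed with a short Python sketch. It assumes Python's built-in `cp932` codec (reachable through its `ms-kanji` alias) is an adequate stand-in for the Windows code page. The helper names `byte_lengths` and `is_lead_byte` are hypothetical, and the lead-byte ranges 0x81 to 0x9F and 0xE0 to 0xFC are the usual Shift JIS first-byte ranges rather than something stated above.

```python
import codecs

# "ms-kanji" is an alias of Python's cp932 codec; lookup() normalizes the name.
CODEC = codecs.lookup("ms-kanji").name


def byte_lengths(text):
    """Pair each character with its encoded byte sequence (hypothetical helper)."""
    return [(ch, ch.encode(CODEC)) for ch in text]


def is_lead_byte(b):
    """True if b opens a two-byte sequence (assumed ranges 0x81-0x9F, 0xE0-0xFC)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


# An ASCII letter, a half-width katakana, and a hiragana character.
for ch, raw in byte_lengths("A\uff71\u3042"):
    kind = "double-byte" if is_lead_byte(raw[0]) else "single-byte"
    print(f"U+{ord(ch):04X} -> {raw.hex()} ({len(raw) * 8} bits, {kind})")
```

Note that the half-width katakana has its high bit set yet still occupies a single byte, so the high bit alone does not tell a decoder whether a second byte follows.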

The "Windows-31J" name is IANA's and not recognized by Microsoft, which has historically used "shift_jis" instead. In Japanese editions of Windows, this code page is referred to as "ANSI", since it is the operating system's default 8-bit encoding, even though ANSI was not involved in its definition.

Differences from standard Shift JIS

Windows-31J is often mistaken for standard Shift JIS. The two are similar, but the distinction matters to programmers who want to avoid mojibake. In addition to the standard JIS X 0201:1997 and JIS X 0208:1997 characters, Windows-31J includes NEC special characters (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119). Such "formerly proprietary extensions from IBM and NEC", while not part of the JIS standards, were included in the W3C/WHATWG encoding standard used by HTML5.[4]
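To make the row numbers concrete, the following Python sketch converts a row/cell pair to its two-byte form using the conventional Shift JIS arithmetic, then checks one NEC special character against Python's `cp932` and `shift_jis` codecs. The helpers `kuten_to_bytes` and `encodable` are hypothetical names. The example also assumes that Python's `shift_jis` codec covers only the JIS standards, which may not hold for every other implementation that uses that label.

```python
def kuten_to_bytes(row, cell):
    """Convert a 1-based row/cell pair to two Shift JIS bytes (hypothetical helper)."""
    # Two rows share each lead byte; rows above 62 skip past the single-byte range.
    lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2:
        # Odd row: trail bytes 0x40-0x7E and 0x80-0x9E (0x7F is skipped).
        trail = cell + (0x3F if cell <= 63 else 0x40)
    else:
        # Even row: trail bytes 0x9F-0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])


def encodable(ch, codec):
    """Report whether codec can encode ch (hypothetical helper)."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


circled_one = "\u2460"  # CIRCLED DIGIT ONE, the first NEC special character (Row 13)
print(kuten_to_bytes(13, 1).hex())          # 8740
print(circled_one.encode("cp932").hex())    # 8740
print(encodable(circled_one, "cp932"))      # True
print(encodable(circled_one, "shift_jis"))  # False

# Lead bytes that the extension rows occupy.
for row in (13, 89, 92, 115, 119):
    print(row, hex(kuten_to_bytes(row, 1)[0]))
```

Under this arithmetic, Rows 89 to 92 fall under lead bytes 0xED and 0xEE, and Rows 115 to 119 under 0xFA to 0xFC, so text that uses these rows will not round-trip through a decoder that implements only the JIS standards.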

Windows-31J includes standard 7-bit ASCII codes for single-byte sequences with the high bit set to 0. Hence, codes 0x5C and 0x7E are mapped to U+005C REVERSE SOLIDUS (\) and U+007E TILDE (~), as they are in ASCII (ISO-646-US).[5] This is often a source of confusion because many Japanese fonts display 0x5C as a yen symbol, which Unicode would normally represent as U+00A5 YEN SIGN (¥). The font behavior stems from the fact that 0x5C is mapped to U+00A5 in ISO-646-JP and consequently in JIS X 0201, of which standard Shift JIS is an extension. In Windows-31J, however, 0x5C behaves as a reverse solidus (backslash) in all respects (e.g. in file paths on Windows systems) other than how some fonts display it.
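The mapping of these two bytes can be checked with a brief Python sketch, again assuming that Python's `cp932` codec follows the same table as the Windows code page. The path `C:\Users\taro` is a made-up example.

```python
import unicodedata

# Decode the two single bytes and report the Unicode character each maps to.
for byte in (0x5C, 0x7E):
    ch = bytes([byte]).decode("cp932")
    print(f"0x{byte:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
# 0x5C -> U+005C REVERSE SOLIDUS
# 0x7E -> U+007E TILDE

# A hypothetical Windows path: each backslash is stored as the single byte 0x5C.
path = r"C:\Users\taro"
raw = path.encode("cp932")
print(raw.count(b"\x5c"))           # 2
print(raw.decode("cp932") == path)  # True
```

Whatever glyph a font draws for 0x5C, the decoded character is U+005C, so path handling and escape sequences treat it as a backslash.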

References

  1. "Code Page 943". IBM.
  2. "Character Sets". IANA.
  3. "7.2.3. Standard Encodings". Python 3.6 Documentation. Python Software Foundation. Retrieved 19 September 2017.
  4. "5. Indexes". Encoding Standard. WHATWG.
  5. "CP932.TXT". Unicode Consortium.