Code page 932 (Microsoft Windows)

{{hatnote|This is Microsoft's Code Page 932 and IBM's Code Page 943. For IBM's Code Page 932, see [[Code page 932]].}}
 
'''Microsoft Windows code page 932''' ('''Windows-932''' or [[Code page 932|ambiguously]] '''CP932'''), known by IBM as '''[[code page]] 943''' ('''CP943''')<ref>{{cite web | url=http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html | title=Code Page 943 | publisher=IBM}}</ref> and known by the [[Internet Assigned Numbers Authority|IANA]] as '''Windows-31J''',<ref>{{cite web | url=https://www.iana.org/assignments/character-sets/character-sets.xhtml | title=Character Sets | publisher=IANA}}</ref> also called '''MS-Kanji''',<ref>{{cite web | url=https://docs.python.org/3.6/library/codecs.html#standard-encodings | title=7.2.3. Standard Encodings | publisher=Python Software Foundation | work=Python 3.6 Documentation | accessdate=19 September 2017}}</ref> is Microsoft's extension of [[Shift JIS]]. It contains the standard 7-bit [[ASCII]] codes, while Japanese characters are indicated by a first byte with the high bit set to 1. Some characters require a second byte, so each character is encoded in either 8 or 16 bits. It is a combination of the single-byte [[Code page 897]] and the double-byte [[Code page 941]].
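
The variable-width layout described above can be observed with Python's built-in <code>cp932</code> codec. The sketch below assumes CPython 3, and the helper <code>byte_lengths</code> is hypothetical, written only for illustration. It reports how many bytes each character occupies and whether the first byte has the high bit set.

```python
from typing import List, Tuple


def byte_lengths(text: str) -> List[Tuple[str, int, bool]]:
    """Hypothetical helper: for each character, return
    (character, encoded length in bytes, high bit of first byte set?)."""
    result = []
    for ch in text:
        # Raises UnicodeEncodeError if ch has no Windows-31J encoding.
        encoded = ch.encode("cp932")
        result.append((ch, len(encoded), encoded[0] >= 0x80))
    return result


for ch, length, high_bit in byte_lengths("Aｱ漢"):
    print(f"{ch}: {length} byte(s), first byte has high bit set: {high_bit}")
```

For the sample string, this reports the following:
* one byte for <code>A</code>, with the high bit clear;
* one byte for the half-width katakana <code>ｱ</code>;
* two bytes for the kanji <code>漢</code>.

The last two have the high bit set on their first byte.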
 
== Differences from standard Shift JIS ==
Windows-31J is often mistaken for standard Shift JIS. The two are similar, but the distinction matters to programmers who want to avoid [[mojibake]]. The name "Windows-31J" is IANA's and is not recognized by Microsoft, which has historically labelled this encoding "shift_jis". Japanese editions of Windows refer to this code page as "ANSI" because it is the operating system's default legacy (non-Unicode) code page, even though [[ANSI]] was not involved in its definition.
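
Because vendors use different labels, the label a program passes to its decoder matters. As one example, assuming CPython 3:
* the label "shift_jis" selects Python's <code>shift_jis</code> codec, which does not include the Microsoft extensions;
* "cp932" and its documented aliases, such as "ms932" and "ms-kanji", select the Windows-31J codec.

The helper <code>canonical_codec</code> is hypothetical.

```python
import codecs


def canonical_codec(label: str) -> str:
    """Hypothetical helper: return the name of the codec Python resolves a label to."""
    return codecs.lookup(label).name


for label in ("shift_jis", "cp932", "ms932", "ms-kanji"):
    print(f"{label!r} -> {canonical_codec(label)!r}")
```

A program that receives Windows-31J text labelled "shift_jis" may therefore pick the stricter codec and fail on, or mis-decode, the extension characters.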
 
In addition to the standard [[JIS X 0201]]:1997 and [[JIS X 0208]]:1997 characters, Windows-31J includes:
* the NEC special characters (Row 13);
* the NEC selection of IBM extensions (Rows 89 to 92);
* the IBM extensions (Rows 115 to 119).

These "formerly proprietary extensions from IBM and NEC" are not part of the JIS standards. They were nevertheless included in the [[W3C]]/[[WHATWG]] Encoding Standard used by [[HTML5]].<ref>{{cite web | url=https://encoding.spec.whatwg.org/#indexes | title=5. Indexes | publisher=WHATWG | work=Encoding Standard}}</ref>
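
The practical effect of these extensions can be sketched with Python's codecs, assuming CPython 3. U+2460 CIRCLED DIGIT ONE (①) is one of the NEC special characters in Row 13. It encodes as the bytes 0x87 0x40 under <code>cp932</code>, but Python's <code>shift_jis</code> codec, which covers only the standard repertoire, rejects it. The helper <code>encodable</code> is hypothetical.

```python
def encodable(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character of text can be encoded with the codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


circled_one = "\u2460"  # CIRCLED DIGIT ONE, an NEC special character (Row 13)
print(circled_one.encode("cp932").hex())     # bytes 0x87 0x40
print(encodable(circled_one, "cp932"))      # accepted by the Windows-31J codec
print(encodable(circled_one, "shift_jis"))  # rejected by the standard Shift JIS codec
```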
 
Windows-31J includes the standard 7-bit [[ASCII]] codes for single-byte sequences with the high bit set to 0. Hence, code 0x5C is mapped to U+005C REVERSE SOLIDUS (<code>\</code>) and 0x7E to U+007E TILDE (<code>~</code>), as they are in ASCII ([[ISO 646|ISO-646]]-US).<ref>{{cite web | url=http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT | title=CP932.TXT | publisher=Unicode Consortium}}</ref> This is often a source of confusion because many Japanese fonts display 0x5C as a [[JPY|Yen]] symbol, which would normally be represented as U+00A5 YEN SIGN (<code>¥</code>) in Unicode. This stems from the fact that 0x5C is mapped to U+00A5 in [[Code page 895|ISO-646-JP]] and consequently in [[JIS X 0201]], of which standard [[Shift JIS]] is an extension. However, code 0x5C in Windows-31J behaves as a reverse solidus (backslash) in all respects other than how some fonts display it. For example, it serves as the separator in [[filename|file paths]] on Windows systems.
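
The mapping can be checked with Python's <code>cp932</code> codec. The sketch below assumes CPython 3 and uses a hypothetical helper, <code>code_points</code>. Whatever glyph a font draws, decoding 0x5C yields U+005C, and each Windows path separator is stored as the single byte 0x5C.

```python
def code_points(raw: bytes, encoding: str = "cp932") -> list:
    """Hypothetical helper: decode raw bytes and list code points as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode(encoding)]


# 0x5C and 0x7E decode to REVERSE SOLIDUS and TILDE, as in ASCII.
print(code_points(b"\x5c\x7e"))

# Each backslash in a Windows path is encoded as the byte 0x5C,
# regardless of whether a particular font renders it as a yen sign.
path = "C:\\Users\\Public"
encoded = path.encode("cp932")
print(encoded, encoded.count(0x5C))
```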
 
==See also==