Code page 932 (Microsoft Windows): Difference between revisions

Line 6:
| mime = Windows-31J
| alias = CP943C
| standard = WHATWG Encoding Standard (as "Shift_JIS")<ref name="encoding_rs">{{cite web |url=https://docs.rs/encoding_rs/latest/encoding_rs/#notable-differences-from-iana-naming |title=Notable Differences from IANA Naming |work=Crate encoding_rs |publisher=docs.rs}}</ref>
| standard = WHATWG Encoding Standard (as "Shift_JIS")
| lang = [[Japanese language|Japanese]]
| status =
Line 20:
IBM offer the same extended double-byte codes in their '''[[code page]] 943''' ('''IBM-943''' or '''CP943'''),<ref name="ibm932v943">{{cite web | url=https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.nlsgdrf/ibm-943_ibm-932.htm | title=IBM-943 and IBM-932 | publisher=IBM | work=IBM Knowledge Center}}</ref> which is a combination of the single-byte [[Code page 897]] and the double-byte '''Code page 941'''.<ref name="ibm943">{{cite web | url=http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html | title=Coded character set identifiers - CCSID 943 | publisher=IBM | work=IBM Globalization | archive-url=https://web.archive.org/web/20160315110642/http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html | archive-date=2016-03-15}}</ref>
 
Windows-31J is the most used non-[[UTF-8]]/Unicode Japanese encoding on the web. However, many people and software packages, including Microsoft libraries,<ref name="msdnlabels"/> declare the {{nowrap|[[Shift JIS]]}} encoding for Windows-31J data, even though Windows-31J includes some additional characters and maps some of the shared characters to [[Unicode]] differently. This has led the WHATWG HTML standard to treat the encoding labels {{code|shift_jis}} and {{code|windows-31j}} interchangeably, and to use the Windows variant for its "Shift_JIS" encoder and decoder.<ref name="encoding_rs"/><!-- Per W3C / WHATWG standards, the labels Shift_JIS and Windows-31J are treated the same; the W3C/WHATWG spec uses the Shift JIS name, but its definition actually matches Windows-31J (not JIS X 0208 Appendix 1). -->
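
The two kinds of difference described above (additional characters, and different Unicode mappings for shared byte sequences) can be illustrated with a minimal Python sketch. It assumes CPython's built-in <code>cp932</code> and <code>shift_jis</code> codecs stand in for Windows-31J and JIS X 0208-based Shift JIS respectively; the helper name <code>describe</code> and the two sample byte sequences are chosen here purely for illustration.

```python
import unicodedata

# Sample byte sequences (chosen for this illustration only).
WAVE_DASH_BYTES = b"\x81\x60"   # assigned in both encodings, mapped differently
NEC_CIRCLED_ONE = b"\x87\x40"   # NEC extension row, absent from plain Shift JIS


def describe(data, codec):
    """Decode data with codec; return code point descriptions, or None on failure."""
    try:
        text = data.decode(codec)
    except UnicodeDecodeError:
        return None
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for codec in ("shift_jis", "cp932"):
    print(codec, describe(WAVE_DASH_BYTES, codec), describe(NEC_CIRCLED_ONE, codec))
```

In this sketch the same two bytes decode to different code points depending on the codec, and the NEC extension bytes decode only under <code>cp932</code>. Other implementations may draw the line differently, so this should be read as a demonstration for one library rather than a statement about every decoder.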
 
==Terminology==
Line 27:
IBM's code page 943 (or "IBM-943") includes the same double-byte codes as Windows code page 932.<ref name="ibm932v943" /> Microsoft's version corresponds closely to the encoding referred to as '''ibm-943_P15A-2003''' (with aliases including '''CP943C''' and '''Windows-932''')<ref name="icuwindows31j" /> in [[International Components for Unicode]] (ICU). There is also a second ICU encoding named '''ibm-943_P130-1999''',<ref name="icuibm943" /> whose single-byte mappings differ and more closely match IBM's code page definitions. (See [[#Single-byte character differences|§ Single-byte character differences]] below for details.)
 
Windows code page 932 is registered with the [[Internet Assigned Numbers Authority|IANA]] as '''Windows-31J'''.<ref name="iana31j">{{cite web | url=https://www.iana.org/assignments/character-sets/character-sets.xhtml | publisher=IANA | title=Character Sets}}</ref> The "Windows-31J" label is IANA's and is not recognised by Microsoft, which has historically used "shift_jis" instead.<ref name="msdnlabels">{{cite web|url=https://msdn.microsoft.com/en-us/library/system.text.encoding.windowscodepage(v=vs.110).aspx |title=Encoding.WindowsCodePage Property - .NET Framework (current version) |work=MSDN |publisher=Microsoft}}</ref> The [[W3C]]/[[WHATWG]] encoding standard used by [[HTML5]] treats the label "'''shift_jis'''" interchangeably with "windows-31j", with the intent of being "compatible with deployed content",<ref>{{cite web | url=https://encoding.spec.whatwg.org/#names-and-labels | title=4.2. Names and labels | publisher=WHATWG | work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> and its definition of the encoding matches Windows code page 932<ref name="encoding_rs"/> (including the "formerly proprietary extensions from IBM and NEC").<ref>{{cite web | url=https://encoding.spec.whatwg.org/#index-jis0208 | title=5. Indexes (§ Index jis0208) | publisher=WHATWG | work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>
 
Windows code page 932 is also called '''MS_Kanji''',<ref name="icuwindows31j" /><ref name="python">{{cite web | url=https://docs.python.org/3.6/library/codecs.html#standard-encodings | title=7.2.3. Standard Encodings | publisher=Python Software Foundation | work=Python 3.6 Documentation | access-date=19 September 2017}}</ref> although IANA treat MS_Kanji as an alias for standard Shift JIS.<ref name="iana31j"/> [[Python (programming language)|Python]], for example, uses the label <code>MS-Kanji</code> (or <code>cp932</code>) for Windows-932 and the label <code>Shift_JIS</code> (or <code>sjis</code>) for JIS X 0208-defined Shift JIS, without recognising the <code>Windows-31J</code> label.<ref name="python" />
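
The Python label handling described above can be checked with a short sketch using the standard library's <code>codecs.lookup</code>. The helper name <code>canonical_codec</code> is hypothetical and introduced only for this example; it returns <code>None</code> for any label the running interpreter does not know, so the same loop can be used to probe other labels on a given Python version.

```python
import codecs


def canonical_codec(label):
    """Return the canonical codec name Python resolves label to, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


# Labels mentioned in the paragraph above.
for label in ("MS-Kanji", "cp932", "Shift_JIS", "sjis"):
    print(label, "->", canonical_codec(label))
```

Lookup is case-insensitive and treats hyphens and underscores alike, which is why <code>MS-Kanji</code> and <code>Shift_JIS</code> can be passed as written.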