Line 18:
'''Microsoft Windows code page 932''' (abbreviated '''MS932''',<ref>{{cite web | url=https://www.w3.org/Bugs/Public/show_bug.cgi?id=27851 | title=Bug 27851 - Add MS932 as a label of Shift_JIS | work=w3.org Bug Tracker | author=Sivonen, Henri}}</ref><ref name="icuwindows31j" /> '''Windows-932'''<ref name="icuwindows31j">{{cite web | url=http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&s=UTR22&s=IBM&s=WINDOWS&s=JAVA&s=IANA&s=MIME&s=- | title=Converter Explorer: ibm-943_P15A-2003 (alias windows-31j) | work=International Components for Unicode: ICU Demonstration}}</ref> or ambiguously '''CP932'''<ref>{{cite web|url=https://www.debian.org/doc/manuals/debian-reference/ch11.en.html|title=Chapter 11. Data conversion|work=Debian Reference|last=Aoki|first=Osamu|publisher=Debian}}</ref>), also called '''Windows-31J''' amongst other names (see [[#Terminology|§ Terminology]] below), is the [[Microsoft Windows]] [[code page]] for the [[Japanese language]], an extended variant of the [[Shift JIS]] Japanese [[character encoding]]. It retains the standard 7-bit [[ASCII]] codes; a first byte with its high bit set to 1 indicates a Japanese character. Some of these code points require a second byte, so each character is encoded in either 8 or 16 bits.
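The 8-bit and 16-bit cases can be observed with a short Python sketch. It assumes that Python's built-in <code>cp932</code> codec is an acceptable stand-in for Windows-932; the sample characters and the helper name <code>encoded_bytes</code> are chosen purely for illustration.

```python
# Illustrative sketch: Python's "cp932" codec is assumed here as a
# stand-in for Windows-932. Sample characters are arbitrary choices.
samples = {
    "A": "ASCII letter",
    "\uff71": "half-width katakana A",
    "\u3042": "hiragana A",
    "\u6f22": "kanji",
}


def encoded_bytes(ch: str) -> bytes:
    """Encode a single character with the cp932 codec (hypothetical helper)."""
    return ch.encode("cp932")


widths = {}
for ch, label in samples.items():
    raw = encoded_bytes(ch)
    widths[ch] = len(raw)
    high_bit = raw[0] >> 7  # 1 if the high bit of the first byte is set
    print(
        f"U+{ord(ch):04X} ({label}): bytes={raw.hex()} "
        f"bits={8 * len(raw)} first-byte high bit={high_bit}"
    )
```

In this sketch the ASCII letter keeps its 7-bit code, while each of the three Japanese samples has the high bit of its first byte set; the half-width katakana occupies one byte and the other two occupy two.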
IBM offer the same extended double-byte codes in their '''[[code page]] 943''' ('''IBM-943''' or '''CP943'''),<ref name="ibm932v943">{{cite web | url=https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.nlsgdrf/ibm-943_ibm-932.htm | title=IBM-943 and IBM-932 | publisher=IBM | work=IBM Knowledge Center}}</ref> which is a combination of the single-byte [[Code page 897]] and the double-byte '''Code page 941'''.<ref name="ibm943">{{cite web | url=http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html | title=
== Terminology ==
Line 50:
However, 0x5C in Windows-932 is nonetheless considered a Yen sign in certain contexts.<ref name="kaplan">{{cite web | title=When is a backslash not a backslash? | date=2005-09-17 | author=Kaplan, Michael S. | url=http://archives.miloush.net/michkap/archive/2005/09/17/469941.html | work=Sorting it all out}}</ref> For this reason, in many Japanese fonts, U+005C is displayed as a Yen symbol, which would normally be represented as U+00A5, rather than as a backslash per Unicode's suggested rendering. U+00A5 is one-way best-fit mapped onto 0x5C in Windows-932. However, code 0x5C in Windows-932 behaves as a reverse solidus (backslash) in all respects (e.g. in [[filename|file paths]] on Windows systems) other than how it is displayed by some fonts,<ref name="kaplan" /> and Microsoft's documentation for Windows-932 displays 0x5C as a backslash.<ref name="msrefrender" /> This mapping<ref name="msmapping" /> corresponds to the encoding named "ibm-943_P15A-2003" in [[International Components for Unicode]] (ICU),<ref name="icuwindows31j" /> except for minor reordering of a few [[C0 control characters]].
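The decoding direction of this behaviour can be sketched in Python, again assuming that the built-in <code>cp932</code> codec approximates Microsoft's mapping. The path bytes below are made up for the example, and the one-way best-fit mapping of U+00A5 is deliberately not exercised, since a given codec may or may not implement it.

```python
# Illustrative sketch: Python's "cp932" codec is assumed as a stand-in
# for Microsoft's Windows-932 mapping. The path is a made-up example.
# Bytes 83 65, 83 58, 83 67 are three double-byte katakana characters.
raw_path = b"C:\x5cUsers\x5c\x83\x65\x83\x58\x83\x67"

decoded = raw_path.decode("cp932")

# Each single byte 0x5C decodes to U+005C (reverse solidus), not U+00A5.
separators = [ch for ch in decoded if ch == "\u005c"]
contains_yen = "\u00a5" in decoded

print([f"U+{ord(ch):04X}" for ch in decoded])
print("backslashes:", len(separators), "yen signs present:", contains_yen)
```

Whether the decoded U+005C is then drawn as a backslash or as a Yen symbol is a font matter and is outside what the sketch can show.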
IBM-943, like [[Code page 932 (IBM)|IBM-932]],<ref name="ibm932v943"/> is a superset of the single-byte [[Code page 897]],<ref name="ibm943"/> which maps 0x5C to the Yen symbol (<code>¥</code>) and 0x7E to the overline (<code>‾</code>);<ref name="cp00897txt">{{cite web | url=ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00897.txt | title=CP00897.txt | publisher=IBM | archive-date=2019-01-12 | dead-url=no | archive-url=https://www.webcitation.org/75NZsweMG}}</ref> this mapping is followed by the encoding named "ibm-943_P130-1999" in ICU.<ref name="icuibm943">{{cite web | url=http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943 | work=International Components for Unicode: ICU Demonstration | title=Converter Explorer: ibm-943_P130-1999}}</ref> Code page 897 (and therefore also IBM-943 and IBM-932) also adds single-byte box-drawing characters in place of certain [[C0 control characters]];<ref name="cp00897txt" /> however, these may still be treated as control characters depending on the context,<ref>{{cite web | url=http://www-01.ibm.com/software/globalization/cp/cp00897.html | title=Code page identifiers - CP 00897 | publisher=IBM | work=IBM Globalization | dead-link=yes | archive-url=https://web.archive.org/web/20160317053427/http://www-01.ibm.com/software/globalization/cp/cp00897.html | archive-date=2016-03-17}}</ref> and are mapped to control characters in ICU.<ref name="icuibm943" />
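The two single-byte code points that differ between the Windows-932 and Code page 897 conventions can be tabulated in a minimal Python sketch. The table names and the <code>interpret</code> helper are hypothetical, and the sketch covers only printable single bytes in the range 0x20 to 0x7E; it is not an IBM-943 converter and ignores the box-drawing and control-character question entirely.

```python
# Illustrative sketch only: these tables record just the two single-byte
# differences described above. All names here are hypothetical.
WINDOWS_932_STYLE = {0x5C: "\\", 0x7E: "~"}  # reverse solidus, tilde
CODE_PAGE_897_STYLE = {0x5C: "\u00a5", 0x7E: "\u203e"}  # Yen sign, overline


def interpret(byte_value: int, table: dict) -> str:
    """Return the character for a printable single byte under one convention."""
    if not 0x20 <= byte_value <= 0x7E:
        raise ValueError("this sketch covers only bytes 0x20 to 0x7E")
    # Every other byte in this range keeps its ASCII meaning.
    return table.get(byte_value, chr(byte_value))


for byte_value in (0x41, 0x5C, 0x7E):
    win = interpret(byte_value, WINDOWS_932_STYLE)
    ibm = interpret(byte_value, CODE_PAGE_897_STYLE)
    print(f"0x{byte_value:02X}: Windows-932 U+{ord(win):04X}, "
          f"Code page 897 U+{ord(ibm):04X}")
```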
==Layout==
Line 71:
=== IBM related ===
*[https://web.archive.org/web/20160315110642/http://www-01.ibm.com/software/globalization/ccsid/ccsid943.html IBM's documentation of Code Page 943]
*[http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943 ICU Code Page 943 (ibm-943_P130-1999) demonstration]
*[http://icu-project.org/repos/icu/data/trunk/charset/data/ucm/ibm-943_P130-1999.ucm ICU mapping for ibm-943_P130-1999 to Unicode]