Code page 932 (Microsoft Windows): Difference between revisions

Revision as of 17:53, 6 January 2024 by HarJIT (talk \| contribs), no edit summary
Latest revision as of 13:38, 14 August 2025 by 2a0e:1d47:9098:3800:2d3f:2be2:c623:63a5 (talk), edit summary: →Double-byte character differences: quote 'because' and 'not' where they're literals
(4 intermediate revisions by 3 users not shown)
Line 1:
{{Short description\|~~Japanese~~ Windows character ~~encoding~~set /for ~~Shift JIS variant.~~Japanese}}
{{About\|Microsoft's Code Page 932 and IBM's Code Page 943\|IBM's Code Page 932\|Code page 932 (IBM)}}
{{Redirect\|Windows-31J\|the operating system version\|Windows 3.1J}}

Line 6:
\| mime = Windows-31J
\| alias = CP943C
\| standard = [[WHATWG Encoding Standard]] (as "Shift_JIS")<ref name="encoding_rs">{{cite web \|url=https://docs.rs/encoding_rs/latest/encoding_rs/#notable-differences-from-iana-naming \|title=Notable Differences from IANA Naming \|work=Crate encoding_rs \|publisher=docs.rs \|author=Mozilla Foundation \|~~author_link~~author-link=Mozilla Foundation}}</ref>
\| lang = [[Japanese language\|Japanese]]
\| status =

Line 40:
In addition to the standard [[JIS X 0201]]:1997 and [[JIS X 0208]]:1997 characters, Windows-31J includes several JIS X 0208 extensions, namely "[[JIS X 0208#0x2D\|NEC special characters]] (Row 13), NEC selection of IBM extensions (Rows 89 to 92), and IBM extensions (Rows 115 to 119)",<ref name="iana31j" /> in addition to setting some encoding space aside for [[Private Use Areas#Private-use characters in other character sets\|end user definition]].<ref>{{cite web \| url=http://archives.miloush.net/michkap/archive/2007/05/26/2901371.html \| title=The PUA outside of Unicode \| author=Kaplan, Michael S \| work=Sorting it all out \| date=2007-05-26}}</ref> This also differs from [[Code page 932 (IBM)\|IBM-932]], which does not include the NEC extensions or NEC selection.<ref name="ibm932v943"/> The IBM extensions were designed to encode characters from the [[Japanese language in EBCDIC#Double-byte codes\|IBM Japanese DBCS-Host]] repertoire which were initially absent in JIS X 0208; the [[because sign\|'because' sign]] ∵ and [[not sign\|'not' sign]] ￢ were later added to JIS X 0208 itself in 1983, and Microsoft includes them at extension locations as well as their 1983 locations.<ref name="lundeE">{{citation\|mode=cs1 \|title=Appendix E: Vendor Character Set Standards \|work=CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing \|last=Lunde \|first=Ken \|author-link=Ken Lunde \|year=2009 \|edition=2nd \|publisher=[[O'Reilly Media\|O'Reilly]] \|___location=[[Sebastopol, CA]] \|isbn=978-0-596-51447-1 \|url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appE.pdf}}</ref> The NEC extensions also encode the entirety of the IBM repertoire, but in a separate extension within the 94×94 JIS X 0208 grid (in rows 89–92, besides the characters already included in [[JIS X 0208#0x2D\|NEC row 13]]), rather than using Shift JIS codes beyond the JIS X 0208 range; Windows code page 932 includes these 388 characters in both locations.<ref name="lundeE"/> As a result, the 'because' and 'not' signs are encoded three times. Some of these representations were subsequently used for different characters by [[JIS X 0213]] and [[Shift JIS-2004]]. For example, compare row 89 in JIS X 0213 (beginning 硃, 硎, 硏…)<ref>{{cite iso-ir \|number=233 \|title=Japanese Graphic Character Set for Information Interchange, Plane 1 \|sponsor=Japanese Industrial Standards Committee \|sponsor-link=Japanese Industrial Standards Committee \|date=2004-04-13}}</ref> to row 89 as used by JIS X 0208 with IBM/NEC extensions (beginning 纊, 褜, 鍈…).<ref>{{cite web \| url=https://encoding.spec.whatwg.org/jis0208.html \| title=Index jis0208 visualization \| publisher=WHATWG \| work=Encoding Standard \|last=van Kesteren \|first=Anne \|author-link=Anne van Kesteren}}</ref> Consequently, Shift JIS-2004 is not compatible with Windows-31J.

Line 51:
However, 0x5C in Windows-932 is nonetheless considered a Yen sign in certain contexts.<ref name="kaplan">{{cite web \| title=When is a backslash not a backslash? \| date=2005-09-17 \| author=Kaplan, Michael S. \| url=http://archives.miloush.net/michkap/archive/2005/09/17/469941.html \| work=Sorting it all out}}</ref> For this reason, in many Japanese fonts, U+005C is displayed as a Yen symbol, which would normally be represented as U+00A5, rather than as a backslash per Unicode's suggested rendering. U+00A5 is one-way best-fit mapped onto 0x5C in Windows-932. However, code 0x5C in Windows-932 behaves as a reverse solidus (backslash) in all respects (e.g. in [[filename\|file paths]] on Windows systems) other than how it is displayed by some fonts,<ref name="kaplan" /> and Microsoft's documentation for Windows-932 displays 0x5C as a backslash.<ref name="msrefrender" /> This mapping<ref name="msmapping" /> corresponds to the encoding named "ibm-943_P15A-2003" in [[International Components for Unicode]] (ICU),<ref name="icuwindows31j" /> except for minor reordering of a few [[C0 control characters]]. [[Code page 943\|IBM-943]], like [[Code page 932 (IBM)\|IBM-932]],<ref name="ibm932v943"/> is a superset of the single-byte [[Code page 897]],<ref name="ibm943"/> which maps 0x5C to the Yen symbol (<code>¥</code>) and 0x7E to the overline (<code>‾</code>);<ref name="cp00897txt">{{cite web \| url=~~ftp~~https://~~ftp~~public.~~software~~dhe.ibm.com/software/globalization/gcoc/attachments/CP00897.txt \| title=CP00897.txt \| publisher=IBM \| archive-date=2019-01-12 \| url-status=live \| archive-url=https://www.webcitation.org/75NZsweMG?url=ftp://ftp.software.ibm.com/software/globalization/gcoc/attachments/CP00897.txt \| access-date=2017-09-24 }}</ref> this is followed by the encoding named "ibm-943_P130-1999" in ICU.<ref name="icuibm943">{{cite web \| url=http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943 \| work=International Components for Unicode: ICU Demonstration \| title=Converter Explorer: ibm-943_P130-1999}}</ref> Code page 897 (and therefore also IBM-943 and IBM-932) also adds single-byte box-drawing characters replacing certain [[C0 control characters]];<ref name="cp00897txt" /> however, these may still be treated as control characters depending on the context,<ref>{{cite web \| url=http://www-01.ibm.com/software/globalization/cp/cp00897.html \| title=Code page identifiers - CP 00897 \| publisher=IBM \| work=IBM Globalization \| url-status=dead \| archive-url=https://web.archive.org/web/20160317053427/http://www-01.ibm.com/software/globalization/cp/cp00897.html \| archive-date=2016-03-17}}</ref> and are mapped to control characters in ICU.<ref name="icuibm943" />

==Layout==
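The triple encoding of the 'because' and 'not' signs described in the "Line 40" text above can be checked with a short Python sketch. It is not part of either revision. It assumes that Python's built-in `cp932` codec follows Microsoft's mapping table, and the byte values are the JIS X 0208, NEC and IBM extension positions as listed in that table. The names `DUPLICATES` and `decode_all` are hypothetical helpers introduced only for this illustration.

```python
# Sketch only: assumes Python's built-in "cp932" codec follows
# Microsoft's Windows-31J mapping table.

# Each character maps to the three byte sequences that represent it.
DUPLICATES = {
    # 'because' sign: JIS X 0208 (1983), NEC row 13, IBM extension
    "\u2235": [b"\x81\xe6", b"\x87\x9a", b"\xfa\x5b"],
    # fullwidth 'not' sign: JIS X 0208 (1983), NEC selection, IBM extension
    "\uffe2": [b"\x81\xca", b"\xee\xf9", b"\xfa\x54"],
}


def decode_all(encodings):
    """Decode every byte sequence as cp932 and return the distinct results."""
    return {raw.decode("cp932") for raw in encodings}


for char, encodings in DUPLICATES.items():
    decoded = decode_all(encodings)
    print(f"U+{ord(char):04X}", [raw.hex() for raw in encodings], "->", decoded)
```

If the assumption holds, each set contains exactly one character, meaning all three byte sequences decode to the same code point. The sketch only decodes; which of the three sequences an encoder emits is a separate, implementation-specific choice.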