Windows code page: Difference between revisions

Reverted 1 edit by 51.36.221.242 (talk)
Microsoft DON'T recommend UTF-8, they still recommend UTF-16 (MANY -W functions of the Windows API, especially those introduced since Vista, have no -A form); they only recommend to use CP_UTF8 over CP_ANSI
Line 44:
 
=== UTF-8, UTF-16 ===
Microsoft adopted a Unicode encoding for all its [[operating system]]s from Windows NT onwards: first the now-obsolete [[UCS-2]], which was then Unicode's only encoding, and later [[UTF-16]]. It now additionally [[Unicode in Microsoft Windows|supports and recommends]] [[UTF-8]] (code page <code>CP_UTF8</code>) in preference to the legacy "ANSI" code pages. Since [[Windows 10 version history#Version 1803 (April 2018 Update)|Windows 10 version 1803]], Windows machines can be configured to use UTF-8 as the "ANSI" and OEM code page.<ref>{{cite web|url=https://srad.jp/story/17/11/14/0640253/|title=Windows 10のInsider PreviewでシステムロケールをUTF-8にするオプションが追加される|trans-title=The option to make UTF-8 the system locale added in Windows 10 Insider Preview|author=hylom|website=スラド|language=ja|date=2017-11-14|access-date=2018-05-10|archive-date=2018-05-11|archive-url=https://web.archive.org/web/20180511012606/https://srad.jp/story/17/11/14/0640253/|url-status=live}}</ref>
UTF-16 encodes every Unicode character in the [[Basic Multilingual Plane]] (BMP) as a single 16-bit code unit, and every remaining character (e.g. most [[emoji]]s) as a pair of 16-bit code units (a surrogate pair, four bytes in total). The rest of the industry ([[Unix-like]] systems and the web) chose [[UTF-8]], which Microsoft now supports as well; it uses one byte for the 7-bit [[ASCII]] character set, two or three bytes for other characters in the BMP, and four bytes for the remainder.
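The byte counts described above can be checked with a short sketch. It is illustrative only: the helper name <code>encoded_sizes</code> and the sample characters are assumptions chosen for this example, and it relies on Python's built-in <code>utf-8</code> and <code>utf-16-le</code> codecs rather than on any Windows API.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character.

    'utf-16-le' is used so that no byte order mark is added to the count.
    """
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# Hypothetical sample characters, one per size class named in the text.
samples = [
    ("ASCII letter A", "A"),                      # 7-bit ASCII
    ("Latin small e with acute", "\u00e9"),       # BMP, two UTF-8 bytes
    ("Euro sign", "\u20ac"),                      # BMP, three UTF-8 bytes
    ("Grinning face emoji", "\U0001F600"),        # outside the BMP
]

for label, ch in samples:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"{label}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

Characters in the BMP always take two bytes in UTF-16, while a character outside the BMP takes four bytes in both encodings.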
 
== List ==