Windows code page

Initially, computer systems and system programming languages did not make a distinction between [[character (computing)|character]]s and [[byte]]s: for the [[List of writing systems#Segmental script|segmental scripts]] used in most of Africa, the Americas, southern and south-east Asia, the Middle East and Europe, a character needs just one byte, but two or more bytes are needed for the [[ideographic]] sets used in the rest of the world. This led to much confusion subsequently. Microsoft software and systems prior to the [[Windows NT]] line are examples of this, because they use the OEM and ANSI code pages that do not make the distinction.
 
Since the late 1990s, software and systems have adopted [[Unicode]] as their preferred storage format; this trend has been reinforced by the widespread adoption of [[XML]], which defaults to [[UTF-8]] but also provides a mechanism for labelling the encoding used.<ref>{{cite web | url = http://www.w3.org/TR/xml11/#charencoding | title = Extensible Markup Language (XML) 1.1 (Second Edition): Character encodings | publisher = [[W3C]] | date = 29 September 2006 | access-date = 5 October 2020 | archive-date = 19 April 2021 | archive-url = https://web.archive.org/web/20210419133700/https://www.w3.org/TR/xml11/#charencoding | url-status = live }}</ref> All current Microsoft products and [[application program interfaces]] use Unicode internally,{{cn|date=October 2020}} but some applications and APIs continue to use the default encoding of the computer's 'locale' when reading and writing text data to files or standard output.{{cn|date=October 2020}} Therefore, files may still be encountered that are legible and intelligible in one part of the world but unintelligible [[mojibake]] in another.
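
The effect can be illustrated with a minimal sketch in Python, assuming a file written under a Western European "ANSI" code page (Windows-1252) and later read on a machine whose default is the Cyrillic code page (Windows-1251). The sample word and the variable names are illustrative only; the codec names are those of Python's standard library.

```python
# The same bytes mean different characters under different code pages.
text = "café"

# Bytes as a locale-dependent application might write them on a
# Western European system (Windows-1252: 'é' is the single byte 0xE9).
raw_1252 = text.encode("cp1252")

# A reader on a Cyrillic system interprets byte 0xE9 as 'й'.
seen_as_1251 = raw_1252.decode("cp1251")

# UTF-8 bytes misread as Windows-1252 produce the familiar two-character debris.
raw_utf8 = text.encode("utf-8")
seen_as_1252 = raw_utf8.decode("cp1252")

print(raw_1252)       # b'caf\xe9'
print(seen_as_1251)   # cafй
print(seen_as_1252)   # cafÃ©
```

In both cases no bytes are lost; only the interpretation differs, which is why labelling the encoding (as XML does) or agreeing on a single encoding avoids the problem.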
 
=== UTF-8, UTF-16 ===
Microsoft adopted a Unicode encoding for all its [[operating system]]s from Windows NT onwards: first the now-obsolete [[UCS-2]], which was then Unicode's only encoding, and later [[UTF-16]]. It now additionally [[Unicode in Microsoft Windows|supports and recommends]] using [[UTF-8]] (code page identifier <code>CP_UTF8</code>). UTF-16 encodes every Unicode character in the [[Basic Multilingual Plane]] (BMP) using 16 bits, while the remaining characters (e.g. [[emoji]]s) are encoded with a 32-bit (four-byte) code. The rest of the industry ([[Unix-like]] systems and the web), and now Microsoft as well, chose [[UTF-8]], which uses one byte for the 7-bit [[ASCII]] character set, two or three bytes for other characters in the BMP, and four bytes for the remainder. Since [[Windows 10 version history#Version 1803 (April 2018 Update)|Windows 10 version 1803]], Windows machines can be configured to use UTF-8 as the "ANSI" and OEM code page.<ref>{{cite web|url=https://srad.jp/story/17/11/14/0640253/|title=Windows 10のInsider PreviewでシステムロケールをUTF-8にするオプションが追加される|trans-title=The option to make UTF-8 the system locale added in Windows 10 Insider Preview|author=hylom|website=スラド|language=ja|date=2017-11-14|access-date=2018-05-10|archive-date=2018-05-11|archive-url=https://web.archive.org/web/20180511012606/https://srad.jp/story/17/11/14/0640253/|url-status=live}}</ref>
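
The byte counts described above can be checked with a short Python sketch. The helper name <code>encoded_sizes</code> and the sample characters are illustrative assumptions, not part of any Windows API; <code>utf-16-le</code> is used so that no byte-order mark is added to the count.

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

samples = {
    "A": "ASCII letter",
    "\u00e9": "Latin small e with acute, in the BMP",
    "\u20ac": "euro sign, in the BMP",
    "\U0001F600": "emoji, outside the BMP",
}

for ch, description in samples.items():
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X} ({description}): UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

For these samples the UTF-8 sizes are 1, 2, 3 and 4 bytes respectively, while UTF-16 uses 2 bytes for each BMP character and 4 bytes for the emoji.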
 
== List ==