Guy Harris (talk | contribs) →String constants: String constants in VS; other compilers may differ. |
(27 intermediate revisions by 10 users not shown) | |||
Line 1:
{{Short description|Overview on Unicode implementation in Microsoft Windows}}
{{more citations needed|date=June 2011}}
[[Microsoft]] was one of the first companies to implement [[Unicode]] in their products. [[Windows NT]] was the first operating system that used "wide characters" in [[system call]]s. It initially used the (now obsolete) [[UCS-2]] encoding scheme; starting with [[Windows 2000]] this was upgraded to the [[variable-width encoding]] [[UTF-16]], which allows characters in the additional planes to be represented with surrogate pairs. However, Microsoft did not support [[UTF-8]] in its API until May 2019.
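As an illustration of how surrogate pairs extend a 16-bit encoding to the additional planes, the following is a minimal sketch in C of encoding one code point as UTF-16. The function name <code>utf16_encode</code> is hypothetical and is not part of any Windows API.

```c
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written,
   or 0 if cp is not a valid scalar value.
   (Hypothetical helper for illustration only.) */
int utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Reject values beyond U+10FFFF and the surrogate range itself. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    /* Basic Multilingual Plane: one code unit, identical to UCS-2. */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Additional planes: split the 20-bit offset into two 10-bit halves. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low (trail) surrogate */
    return 2;
}
```

For example, U+1F600 lies outside the Basic Multilingual Plane and is encoded as the pair 0xD83D, 0xDE00, whereas U+20AC fits in a single code unit.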
Before 2019, Microsoft emphasized UTF-16 (i.e. the -W API), but has since recommended using [[UTF-8]]
A large amount of Microsoft documentation uses the word "Unicode" to refer explicitly to the UTF-16 encoding. Anything else, including UTF-8, is not "Unicode" in Microsoft's outdated language (while UTF-8 and UTF-16 are both Unicode according to
== In various Windows families ==
Line 16 ⟶ 17:
=== Windows CE ===
In (the now discontinued) [[Windows CE]], UTF-16 was used almost exclusively, with the 'A' API mostly missing.<ref>{{cite web|title=Differences Between the Windows CE and Windows NT Implementations of TAPI|url=https://msdn.microsoft.com/en-us/library/aa454022.aspx|website=MSDN|date=28 August 2006 |access-date=7 May 2018|quote=Windows CE is Unicode-based. You might have to recompile source code that was written for a Windows NT-based application.}}</ref> A limited set of ANSI APIs is available in Windows CE 5.0, for use on a reduced set of locales that may be selectively built onto the runtime image.<ref>{{cite web|title=Code Pages (Windows CE 5.0)|url=https://docs.microsoft.com/en-us/previous-versions/windows/embedded/ms903783(v=msdn.10)|website=Microsoft Docs| date=14 September 2012 |access-date=7 May 2018|language=en-us}}</ref>
=== Windows 9x ===
Line 24:
== UTF-8 ==
Microsoft Windows ([[Windows XP]] and later) has a code page designated for [[UTF-8]], code page 65001<ref>{{cite web|title=Code Page Identifiers (Windows)|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx|website=msdn.microsoft.com| date=7 January 2021 |language=en}}</ref> or <code>CP_UTF8</code>. For a long time, it was impossible to set the locale code page to 65001, leaving this code page only available for
Programs that wanted to use UTF-8, in particular code intended to be portable to other operating systems, needed a workaround for this deficiency. The usual workaround was to add new file-opening functions that convert the UTF-8 name to UTF-16 using [[MultiByteToWideChar]] and call the "wide" function instead of <code>fopen</code>.<ref>{{cite web|url=https://stackoverflow.com/questions/166503/utf-8-in-windows|title=UTF-8 in Windows|publisher=[[Stack Overflow]]|access-date=July 1, 2011}}</ref> Dozens of multi-platform libraries added wrapper functions to do this conversion on Windows (and pass UTF-8 through unchanged on other systems); an example is a proposed addition to [[Boost (C++ libraries)|Boost]], {{tt|Boost.Nowide}}.<ref>{{cite web|url=https://github.com/boostorg/nowide|title=Boost.Nowide|website=[[GitHub]]}}</ref> Another popular workaround was to convert the name to its [[8.3 filename]] equivalent; this is necessary if the call to <code>fopen</code> is inside a library. None of these workarounds are considered good, as they require changes to code that already works on non-Windows systems.
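The wrapper approach described above can be sketched roughly as follows. This is a minimal illustration, not the code of any particular library: the names <code>fopen_utf8</code> and <code>utf8_to_wide</code> are hypothetical, and the sketch assumes Microsoft's C runtime provides <code>_wfopen</code> as the "wide" counterpart of <code>fopen</code>.

```c
#include <stdio.h>

#ifdef _WIN32
#include <stdlib.h>
#include <wchar.h>
#include <windows.h>

/* Convert a NUL-terminated UTF-8 string to a newly allocated UTF-16 string.
   Returns NULL on invalid input or allocation failure; the caller frees it.
   (Hypothetical helper for illustration only.) */
static wchar_t *utf8_to_wide(const char *s)
{
    /* First call: ask for the required length, including the terminator. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0)
        return NULL;

    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w == NULL)
        return NULL;

    /* Second call: perform the actual conversion into the buffer. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) == 0) {
        free(w);
        return NULL;
    }
    return w;
}
#endif

/* Open a file whose name is given in UTF-8 (hypothetical wrapper).
   On Windows the name is converted to UTF-16 and the wide function is
   called; elsewhere the UTF-8 name is passed through unchanged. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *f = NULL;
    if (wpath != NULL && wmode != NULL)
        f = _wfopen(wpath, wmode);
    free(wpath);
    free(wmode);
    return f;
#else
    return fopen(path, mode);
#endif
}
```

Portable code would then call <code>fopen_utf8("r\xC3\xA9sum\xC3\xA9.txt", "r")</code> in place of <code>fopen</code>, which shows the drawback noted above: every call site in otherwise working code has to be changed.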
Line 32:
In May 2019, Microsoft added the ability for a program to set the code page to UTF-8 itself,<ref name="Microsoft-UTF-8">{{cite web|title=Use UTF-8 code pages in Windows apps|url=https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page |access-date=2020-06-06 |quote=As of Windows version 1903 (May 2019 update), you can use the ActiveCodePage property in the appxmanifest for packaged apps, or the fusion manifest for unpackaged apps, to force a process to use UTF-8 as the process code page. [...] <code>CP_ACP</code> equates to <code>CP_UTF8</code> only if running on Windows version 1903 (May 2019 update) or above and the ActiveCodePage property described above is set to UTF-8. Otherwise, it honors the legacy system code page. We recommend using <code>CP_UTF8</code> explicitly. |website=learn.microsoft.com |language=en-us}}</ref><ref>{{cite web|url=https://skanthak.homepage.t-online.de/quirks.html#quirk31|title=Windows 10 1903 and later versions finally support UTF-8 with the A forms of the Win32 functions}}</ref> allowing programs written to use UTF-8 to be run by non-expert users.
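A program can check at run time whether its process code page is in fact UTF-8 before relying on the narrow ("A") functions. The following is a minimal sketch assuming the Win32 function <code>GetACP</code> reports the process's active code page; the function name <code>narrow_api_is_utf8</code> is hypothetical, and the value returned on non-Windows systems is an assumption that narrow strings there are already UTF-8.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Returns 1 if strings passed to the narrow API can be treated as UTF-8,
   0 otherwise. (Hypothetical helper for illustration only.) */
int narrow_api_is_utf8(void)
{
#ifdef _WIN32
    /* CP_UTF8 is code page 65001; per the documentation quoted above, the
       active code page equals it only on Windows version 1903 or later
       with the ActiveCodePage property set to UTF-8. */
    return GetACP() == CP_UTF8;
#else
    /* Assumption: other systems are taken to use UTF-8 already. */
    return 1;
#endif
}
```

When this returns 0, a program would still need a conversion to the wide API, or an explicit <code>CP_UTF8</code> argument, as the quoted Microsoft documentation recommends.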
{{As of|2019}}, Microsoft recommends programmers use UTF-8 (e.g. instead of any other 8-bit encoding),<ref name="Microsoft-UTF-8">{{cite web|title=Use UTF-8 code pages in Windows apps|url=https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page |access-date=2020-06-06 |quote=As of Windows version 1903 (May 2019 update), you can use the ActiveCodePage property in the appxmanifest for packaged apps, or the fusion manifest for unpackaged apps, to force a process to use UTF-8 as the process code page. [...] <code>CP_ACP</code> equates to <code>CP_UTF8</code> only if running on Windows version 1903 (May 2019 update) or above and the ActiveCodePage property described above is set to UTF-8. Otherwise, it honors the legacy system code page. We recommend using <code>CP_UTF8</code> explicitly. |website=learn.microsoft.com |language=en-us}}</ref> on Windows and [[Xbox]], and may be recommending
=== String constants in Visual Studio ===
Before 2019 Microsoft's compilers
== See also ==