Unicode in Microsoft Windows: Difference between revisions

Content deleted Content added
Undid revision 1253560523 by Comp.arch (talk) No CRT did not translate UTF-8 to whatever the OS needed and thus has nothing to do with this
Compilers: Remove text that replicates introduction sentence and is fairly obvious anyway, and the "work around" would work anyways.
Line 34:
{{As of|2019}}, Microsoft recommends programmers use UTF-8 (e.g. instead of any other 8-bit encoding),<ref name="Microsoft-UTF-8">{{cite web|title=Use UTF-8 code pages in Windows apps|url=https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page |access-date=2020-06-06 |quote=As of Windows version&nbsp;1903 (May&nbsp;2019 update), you can use the ActiveCodePage property in the appxmanifest for packaged apps, or the fusion manifest for unpackaged apps, to force a process to use UTF-8 as the process code page. [...] <code>CP_ACP</code> equates to <code>CP_UTF8</code> only if running on Windows version&nbsp;1903 (May&nbsp;2019 update) or above and the ActiveCodePage property described above is set to UTF-8. Otherwise, it honors the legacy system code page. We recommend using <code>CP_UTF8</code> explicitly. |website=learn.microsoft.com |language=en-us}}</ref> on Windows and [[Xbox]], and may be recommending its use instead of UTF-16, even stating "UTF-8 is the universal code page for internationalization [and] UTF-16 [..] is a unique burden that Windows places on code that targets multiple platforms."<ref name="Microsoft GDK">{{Cite web |title=UTF-8 support in the Microsoft Game Development Kit (GDK) - Microsoft Game Development Kit |url=https://learn.microsoft.com/en-us/gaming/gdk/_content/gc/system/overviews/utf-8 |access-date=2023-03-05 |website=learn.microsoft.com |date=19 August 2022 |language=en-us |quote=By operating in UTF-8, you can ensure maximum compatibility [..] Windows operates natively in UTF-16 (or WCHAR), which requires code page conversions by using MultiByteToWideChar and WideCharToMultiByte. This is a unique burden that Windows places on code that targets multiple platforms. [..] The Microsoft Game Development Kit (GDK) and Windows in general are moving forward to support UTF-8 to remove this unique burden of Windows on code targeting or interchanging with multiple platforms and the web. Also, this results in fewer internationalization issues in apps and games and reduces the test matrix that's required to get it right.}}</ref> Microsoft does appear to be transitioning to UTF-8, stating it previously emphasized its alternative, and in [[Windows 11]] some system files are required to use UTF-8 and do not require a Byte Order Mark.<ref>{{Cite web|title=Customize the Windows 11 Start menu|url=https://docs.microsoft.com/en-us/windows-hardware/customize/desktop/customize-the-windows-11-start-menu|access-date=2021-06-29|website=docs.microsoft.com|language=en-us|quote=Make sure your LayoutModification.json uses UTF-8 encoding.}}</ref> Notepad can now recognize UTF-8 without the Byte Order Mark, and can be told to write UTF-8 without a Byte Order Mark.{{cn|date=November 2022}} Some other Microsoft products are using UTF-8 internally, including Visual Studio{{cn|date=November 2022}} and their [[SQL Server 2019]], with Microsoft claiming 35% speed increase from use of UTF-8, and "nearly 50% reduction in storage requirements."<ref>{{Cite web|date=2019-07-02|title=Introducing UTF-8 support for SQL Server|url=https://techcommunity.microsoft.com/t5/sql-server/introducing-utf-8-support-for-sql-server/ba-p/734928|quote=For example, changing an existing column data type from NCHAR(10) to CHAR(10) using an UTF-8 enabled collation, translates into nearly 50% reduction in storage requirements. [..] In the ASCII range, when doing intensive read/write I/O on UTF-8<!-- " " in quote, but ok to strip-->, we measured an average 35% performance improvement over UTF-16 using clustered tables with a non-clustered index on the string column, and an average 11% performance improvement over UTF-16 using a heap. |access-date=2021-08-24|website=techcommunity.microsoft.com|language=en}}</ref>
 
=== CompilersString constants ===
Before 2019 Microsoft's compilers could not produce UTF-8 string constants from UTF-8 source files. This is due to them converting all strings to the locale code page (which could not be UTF-8). At one time the only method to work around this was to turn ''off'' {{tt|UNICODE}}, and ''not'' mark the input file as being UTF-8 (i.e. do not use a [[UTF-8#Byte order mark|BOM]]).<ref>[http://utf8everywhere.org/#faq.literal UTF-8 Everywhere FAQ: How do I write UTF-8 string literal in my C++ code?]</ref> This would make the compiler think both the input and outputs were in the same single-byte locale, and leave strings unmolested. On modern systems setting the code page to UTF-8 helps.{{cn|reason=Probably true this fixes it, but need confirmation|date=September 2024}}
 
== See also ==