Unicode in Microsoft Windows: Difference between revisions

Irrelevant content removed: Boost is no part/component of Windows
Microsoft Windows ([[Windows XP]] and later) has a code page designated for [[UTF-8]], code page 65001<ref>{{cite web|title=Code Page Identifiers (Windows)|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx|website=msdn.microsoft.com|language=en}}</ref> or <code>CP_UTF8</code>. For a long time, it was impossible to set the locale code page to 65001, so this code page was available only to (a) explicit conversion functions such as <code>MultiByteToWideChar</code> and (b) the [[Win32 console]] command <code>chcp 65001</code>, which translates stdin/stdout between UTF-8 and UTF-16. As a result, "narrow" functions, in particular <code>[[C file input/output#fopen|fopen]]</code> (which opens files), could not be called with UTF-8 strings. In fact, there was no way to open every possible file using <code>fopen</code>, whatever the locale was set to and whatever bytes were put in the string, because none of the available locales could represent all possible UTF-16 characters. The same problem applied to all other APIs that take or return 8-bit strings, including Windows ones such as <code>SetWindowTextA</code>.
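As a rough illustration of the console case, a program can request code page 65001 for its own console instead of relying on the user running <code>chcp 65001</code>. The sketch below assumes that the Win32 functions <code>SetConsoleCP</code> and <code>SetConsoleOutputCP</code> are an acceptable programmatic stand-in for that command; the helper name <code>use_utf8_console</code> is hypothetical, and the non-Windows branch simply assumes the terminal already expects UTF-8.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: ask the attached console to use code page 65001
   (UTF-8) for both input and output.
   Returns 1 on success and 0 on failure (for example, when the process
   has no console attached). */
int use_utf8_console(void)
{
#ifdef _WIN32
    return SetConsoleCP(CP_UTF8) && SetConsoleOutputCP(CP_UTF8);
#else
    /* Assumption: on other platforms the terminal already expects UTF-8,
       so there is nothing to change. */
    return 1;
#endif
}
```

A caller would typically invoke this once at start-up, before writing UTF-8 bytes with functions such as <code>puts</code>. This affects only console input and output; it does not change how <code>fopen</code> interprets file names.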
 
On all modern non-Windows platforms, the file-name string passed to <code>fopen</code> is effectively UTF-8. This produced an incompatibility between other platforms and Windows. The usual work-around was to add code to convert UTF-8 to UTF-16 using [[MultiByteToWideChar]] and call the "wide" function instead of <code>fopen</code>.<ref>{{cite web|url=https://stackoverflow.com/questions/166503/utf-8-in-windows|title=UTF-8 in Windows|publisher=[[Stack Overflow]]|access-date=July 1, 2011}}</ref> Another popular work-around was to convert the name to its [[8.3 filename]] equivalent. This is necessary if the call to <code>fopen</code> is inside a library function that takes a string filename, so that calling a different function is not possible. There were also proposals to add new APIs to portable libraries such as [[Boost (C++ libraries)|Boost]] to do the necessary conversion, by adding new functions for opening and renaming files. These functions would pass filenames through unchanged on Unix, but translate them to UTF-16 on Windows. Such a library, Boost.Nowide,<ref>{{cite web|url=https://github.com/boostorg/nowide|title=Boost.Nowide|website=[[GitHub]]}}</ref> was accepted into Boost<ref>{{cite web|url=https://lists.boost.org/boost-announce/2017/06/0516.php|title=Boost mailing list}}</ref> and became part of Boost with the 1.73 release. This allowed code to be "portable", but required just as many code changes as calling the wide functions.
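The conversion work-around can be sketched as a small wrapper around <code>fopen</code>. This is a minimal illustration, not the code of Boost.Nowide or any other library: the names <code>fopen_utf8</code> and <code>utf8_to_wide</code> are hypothetical, and the wrapper assumes that callers always pass UTF-8. On Windows it converts the path and mode with <code>MultiByteToWideChar</code> and calls the wide function <code>_wfopen</code>; elsewhere it passes the strings through unchanged.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Convert a NUL-terminated UTF-8 string to a newly allocated UTF-16 string.
   Returns NULL on invalid UTF-8 or allocation failure; the caller frees it. */
static wchar_t *utf8_to_wide(const char *s)
{
    /* First call: ask for the required length, including the terminator. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0)
        return NULL;

    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical portable replacement for fopen that takes UTF-8 names. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *f = NULL;

    if (wpath != NULL && wmode != NULL)
        f = _wfopen(wpath, wmode);  /* "wide" counterpart of fopen */

    free(wpath);
    free(wmode);
    return f;
#else
    /* Assumption: the platform already treats narrow file names as UTF-8. */
    return fopen(path, mode);
#endif
}
```

Every existing call to <code>fopen</code> has to be changed to call the wrapper, which is the same amount of editing as switching to the wide functions directly. It also does not help when the <code>fopen</code> call is buried inside a third-party library.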
 
In April 2018 (or possibly November 2017<ref>{{cite web|title=Windows10 Insider Preview Build 17035 Supports UTF-8 as ANSI|url=https://news.ycombinator.com/item?id=15710685|website=Hacker News|access-date=7 May 2018}}</ref>), with insider build 17035 (nominal build 17134) for Windows 10, a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox appeared for setting the locale code page to UTF-8.{{efn|1=Found under Control Panel, "Region" entry, "Administrative" tab, "Change system locale" button.}} This allows "narrow" functions, including <code>fopen</code> and <code>SetWindowTextA</code>, to be called with UTF-8 strings. However, this is a system-wide setting, and a program cannot assume that it is set.
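Because the setting cannot be assumed, a program that wants to pass UTF-8 to narrow functions can check the active code page at run time. The following is a minimal sketch using the Win32 function <code>GetACP</code>; the helper name <code>narrow_apis_use_utf8</code> is hypothetical, and the non-Windows branch assumes that narrow strings are already treated as UTF-8.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns 1 if "narrow" (8-bit) APIs can be expected
   to interpret strings as UTF-8, and 0 otherwise. */
int narrow_apis_use_utf8(void)
{
#ifdef _WIN32
    /* GetACP reports the active ANSI code page; 65001 is CP_UTF8. */
    return GetACP() == CP_UTF8;
#else
    /* Assumption: other platforms effectively use UTF-8 already. */
    return 1;
#endif
}
```

When the function returns 0, the program still needs a fallback such as converting to UTF-16 and calling the wide functions.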