Content deleted Content added
Undid revision 1142091595 by 93.230.217.137 (talk) cut it out this is EXTREMELY relevant |
COMPLETELY IRRELEVANT Content removed: this article is about UNICODE, not about crap like Boost! |
||
Line 9:
Current Windows versions and all back to [[Windows XP]] and prior [[Windows NT]] (3.x, 4.0) are shipped with [[Windows API|system libraries]] that support string [[character encoding|encoding]] of two types: 16-bit "Unicode" ([[UTF-16]] since [[Windows 2000]]) and a (sometimes multibyte) encoding called the "[[Windows code page|code page]]" (or incorrectly referred to as ''[[American National Standards Institute|ANSI]] code page''). 16-bit functions have names suffixed with 'W' (from [[wide character|"wide"]]) such as <code>SetWindowTextW</code>. Code page oriented functions use the suffix 'A' for "ANSI" such as <code>SetWindowTextA</code> (some other conventions were used for APIs that were copied from other systems, such as <code>_wfopen/fopen</code> or <code>wcslen/strlen</code>). This split was necessary because many languages, including [[C (programming language)|C]], did not provide a clean way to pass both 8-bit and 16-bit strings to the same function.
[[Microsoft]] attempted to support Unicode "portably" by providing a "UNICODE" switch to the compiler, that switches unsuffixed "generic" calls from the 'A' to the 'W' interface and converts all string constants to "wide" UTF-16 versions.<ref>{{cite web|title=Unicode in the Windows API|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd374089
▲[[Microsoft]] attempted to support Unicode "portably" by providing a "UNICODE" switch to the compiler, that switches unsuffixed "generic" calls from the 'A' to the 'W' interface and converts all string constants to "wide" UTF-16 versions.<ref>{{cite web|title=Unicode in the Windows API|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd374089%28v=vs.85%29.aspx|access-date=7 May 2018}}</ref><ref>{{cite web|title=Conventions for Function Prototypes (Windows)|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317766(v=vs.85).aspx|website=MSDN|access-date=7 May 2018|language=en}}</ref> This does not actually work because it does not translate UTF-8 outside of string constants, resulting in code that attempts to open files just not compiling.{{citation needed|date=October 2019}}
Earlier, and independent of the "UNICODE" switch, Windows also provided the Multibyte Character Sets (MBCS) API switch.<ref>{{cite web|title=Support for Multibyte Character Sets (MBCSs)|url=https://docs.microsoft.com/en-us/cpp/text/support-for-multibyte-character-sets-mbcss?view=vs-2019|access-date=2020-06-15|language=en}}</ref> This changes some functions that don't work in MBCS such as <code>strrev</code> to an MBCS-aware one such as <code>_mbsrev</code>.<ref>{{cite web|title=Double-byte Character Sets|url=https://docs.microsoft.com/en-us/windows/win32/intl/double-byte-character-sets|website=MSDN|access-date=2020-06-15|date=2018-05-31|quote=our applications use DBCS Windows code pages with the "A" versions of Windows functions.}}</ref><ref>[https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strrev-wcsrev-mbsrev-mbsrev-l _strrev, _wcsrev, _mbsrev, _mbsrev_l] Microsoft Docs</ref>
Line 26 ⟶ 24:
Microsoft Windows ([[Windows XP]] and later) has a code page designated for [[UTF-8]], code page 65001<ref>{{cite web|title=Code Page Identifiers (Windows)|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx|website=msdn.microsoft.com|language=en}}</ref> or <code>CP_UTF8</code>. For a long time, it was impossible to set the locale code page to 65001, leaving this code page only available for (a) explicit conversion functions such as MultiByteToWideChar and/or (b) the [[Win32 console]] command <code>chcp 65001</code> to translate stdin/out between UTF-8 and UTF-16. This meant that "narrow" functions, in particular <code>[[C file input/output#fopen|fopen]]</code> (which opens files), couldn't be called with UTF-8 strings, and in fact there was no way to open all possible files using <code>fopen</code> no matter what the locale was set to and/or what bytes were put in the string, as none of the available locales could produce all possible UTF-16 characters. This problem also applied to all other APIs that take or return 8-bit strings, including Windows ones such as <code>SetWindowText</code>.
On all modern non-Windows platforms, the file-name string passed to <code>fopen</code> is effectively UTF-8. This produced an incompatibility between other platforms and Windows. The usual work-around was to add code to convert UTF-8 to UTF-16 using [[MultiByteToWideChar]] and call the "wide" function instead of <code>fopen</code>.<ref>{{cite web|url=https://stackoverflow.com/questions/166503/utf-8-in-windows|title=UTF-8 in Windows|publisher=[[Stack Overflow]]|access-date=July 1, 2011}}</ref> Another popular work-around was to convert the name to the [[8.3 filename]] equivalent, this is necessary if the <code>fopen</code> is inside a library function that takes a string filename and thus calling another function is not possible
In April 2018 (or possibly November 2017<ref>{{cite web|title=Windows10 Insider Preview Build 17035 Supports UTF-8 as ANSI|url=https://news.ycombinator.com/item?id=15710685|website=Hacker News|access-date=7 May 2018}}</ref>), with insider build 17035 (nominal build 17134) for Windows 10, a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox appeared for setting the locale code page to UTF-8.{{efn|1=Found under control panel, "Region" entry, "Administrative" tab, "Change system locale" button.}} This allows for calling "narrow" functions, including <code>fopen</code> and <code>SetWindowTextA</code>, with UTF-8 strings. However this is a system-wide setting and a program cannot assume it is set.
Line 47 ⟶ 45:
== External links ==
* {{cite web |url=http://msdn.microsoft.com/en-us/library/dd374081
[[Category:Windows technology|Unicode]]
|