Revision as of 13:36, 28 February 2023 by Spitzak (extended confirmed user, 10,503 edits). Edit summary: Undid revision 1142091595 by 93.230.217.137: "cut it out this is EXTREMELY relevant". Tags: Undo, Reverted.
Revision as of 12:24, 3 March 2023 by 93.230.217.137. Edit summary: "COMPLETELY IRRELEVANT Content removed: this article is about UNICODE, not about crap like Boost!". Tags: Undo, Reverted, references removed.
Line 9: Current Windows versions, and all earlier ones back to [[Windows XP]] and [[Windows NT]] (3.x, 4.0), ship with [[Windows API|system libraries]] that support two types of string [[character encoding|encoding]]: 16-bit "Unicode" ([[UTF-16]] since [[Windows 2000]]) and a (sometimes multibyte) encoding called the "[[Windows code page|code page]]" (often incorrectly called the ''[[American National Standards Institute|ANSI]] code page''). 16-bit functions have names suffixed with 'W' (for [[wide character|"wide"]]), such as <code>SetWindowTextW</code>. Code-page-oriented functions use the suffix 'A' (for "ANSI"), such as <code>SetWindowTextA</code>; APIs that were copied from other systems follow other conventions, such as <code>_wfopen/fopen</code> or <code>wcslen/strlen</code>. This split was necessary because many languages, including [[C (programming language)|C]], did not provide a clean way to pass both 8-bit and 16-bit strings to the same function. [[Microsoft]] attempted to support Unicode "portably" by providing a "UNICODE" compiler switch that redirects unsuffixed "generic" calls from the 'A' to the 'W' interface and converts all string constants to "wide" UTF-16 versions.<ref>{{cite web|title=Unicode in the Windows API|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd374089%28v=vs.85%29.aspx|access-date=7 May 2018}}</ref><ref>{{cite web|title=Conventions for Function Prototypes (Windows)|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317766(v=vs.85).aspx|website=MSDN|access-date=7 May 2018|language=en}}</ref> This does not actually work because it does not translate UTF-8 outside of string constants, so code that attempts to open files fails to compile.{{citation needed|date=October 2019}}▼ Most 'A' functions are implemented as [[wrapper function|wrappers]] that translate the text from the current code page to UTF-16 and then call the corresponding 'W'
functions.{{citation needed|date=June 2020}} 'A' functions that return strings do the opposite conversion, turning characters that cannot be represented in the current code page into '?'. ▲[[Microsoft]] attempted to support Unicode "portably" by providing a "UNICODE" compiler switch that redirects unsuffixed "generic" calls from the 'A' to the 'W' interface and converts all string constants to "wide" UTF-16 versions.<ref>{{cite web|title=Unicode in the Windows API|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd374089%28v=vs.85%29.aspx|access-date=7 May 2018}}</ref><ref>{{cite web|title=Conventions for Function Prototypes (Windows)|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317766(v=vs.85).aspx|website=MSDN|access-date=7 May 2018|language=en}}</ref> This does not actually work because it does not translate UTF-8 outside of string constants, so code that attempts to open files fails to compile.{{citation needed|date=October 2019}} Earlier, and independently of the "UNICODE" switch, Windows also provided the Multibyte Character Sets (MBCS) API switch.<ref>{{cite web|title=Support for Multibyte Character Sets (MBCSs)|url=https://docs.microsoft.com/en-us/cpp/text/support-for-multibyte-character-sets-mbcss?view=vs-2019|access-date=2020-06-15|language=en}}</ref> This switch replaces some functions that do not work with multibyte strings, such as <code>strrev</code>, with MBCS-aware equivalents such as <code>_mbsrev</code>.<ref>{{cite web|title=Double-byte Character Sets|url=https://docs.microsoft.com/en-us/windows/win32/intl/double-byte-character-sets|website=MSDN|access-date=2020-06-15|date=2018-05-31|quote=our applications use DBCS Windows code pages with the "A" versions of Windows functions.}}</ref><ref>[https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strrev-wcsrev-mbsrev-mbsrev-l _strrev, _wcsrev, _mbsrev, _mbsrev_l] Microsoft Docs</ref>
Line 26 ⟶ 24: Microsoft Windows ([[Windows XP]] and later) has a code page designated
for [[UTF-8]], code page 65001<ref>{{cite web|title=Code Page Identifiers (Windows)|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx|website=msdn.microsoft.com|language=en}}</ref> or <code>CP_UTF8</code>. For a long time, it was impossible to set the locale code page to 65001, leaving this code page available only for (a) explicit conversion functions such as <code>MultiByteToWideChar</code> and (b) the [[Win32 console]] command <code>chcp 65001</code>, which translates stdin/stdout between UTF-8 and UTF-16. This meant that "narrow" functions, in particular <code>[[C file input/output#fopen|fopen]]</code> (which opens files), could not be called with UTF-8 strings. In fact, there was no way to open all possible files using <code>fopen</code>, no matter what the locale was set to or what bytes were put in the string, because none of the available locales could produce all possible UTF-16 characters. This problem also applied to all other APIs that take or return 8-bit strings, including Windows ones such as <code>SetWindowText</code>. On all modern non-Windows platforms, the file-name string passed to <code>fopen</code> is effectively UTF-8, so this produced an incompatibility between Windows and other platforms. The usual work-around was to add code that converts UTF-8 to UTF-16 using [[MultiByteToWideChar]] and calls the "wide" function instead of <code>fopen</code>.<ref>{{cite web|url=https://stackoverflow.com/questions/166503/utf-8-in-windows|title=UTF-8 in Windows|publisher=[[Stack Overflow]]|access-date=July 1, 2011}}</ref> Another popular work-around was to convert the name to its [[8.3 filename]] equivalent; this is necessary when the <code>fopen</code> call is inside a library function that takes a string filename, so that calling a different function is not possible.
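The usual work-around described above can be sketched in C roughly as follows. This is a minimal illustration rather than code from any cited source: the names <code>fopen_utf8</code> and <code>utf8_to_wide</code> are hypothetical, and the sketch assumes the caller always supplies a valid, NUL-terminated UTF-8 path. On Windows it converts the path with <code>MultiByteToWideChar</code> and calls <code>_wfopen</code>; elsewhere it passes the bytes straight to <code>fopen</code>.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
   allocated UTF-16 string. Returns NULL on invalid input or allocation
   failure; the caller frees the result. */
static wchar_t *utf8_to_wide(const char *s)
{
    /* First call: ask for the required length, including the NUL. */
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0)
        return NULL;

    wchar_t *w = malloc((size_t)n * sizeof *w);
    /* Second call: perform the actual conversion into the buffer. */
    if (w != NULL &&
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical helper: open a file whose name is given in UTF-8,
   regardless of the current Windows code page. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *f = NULL;
    if (wpath != NULL && wmode != NULL)
        f = _wfopen(wpath, wmode);   /* "wide" counterpart of fopen */
    free(wpath);
    free(wmode);
    return f;
#else
    /* Non-Windows platforms treat the bytes as the file name directly. */
    return fopen(path, mode);
#endif
}
```

A caller would replace <code>fopen(name, "r")</code> with <code>fopen_utf8(name, "r")</code>. This is the kind of code change the work-around requires at every call site.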
There were also proposals to add new APIs to portable libraries such as [[Boost (C++ libraries)|Boost]] to do the necessary conversion, by adding new functions for opening and renaming files. These functions would pass filenames through unchanged on Unix, but translate them to UTF-16 on Windows. Such a library, Boost.Nowide,<ref>{{cite web|url=https://github.com/boostorg/nowide|title=Boost.Nowide|website=[[GitHub]]}}</ref> was accepted into Boost<ref>{{cite web|url=https://lists.boost.org/boost-announce/2017/06/0516.php|title=Boost mailing list}}</ref> and has been included since the 1.73 release. This allows code to be "portable", but requires just as many code changes as calling the wide functions.

In April 2018 (or possibly November 2017<ref>{{cite web|title=Windows10 Insider Preview Build 17035 Supports UTF-8 as ANSI|url=https://news.ycombinator.com/item?id=15710685|website=Hacker News|access-date=7 May 2018}}</ref>), with insider build 17035 (nominal build 17134) for Windows 10, a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox appeared for setting the locale code page to UTF-8.{{efn|1=Found in Control Panel under the "Region" entry, "Administrative" tab, "Change system locale" button.}} This allows "narrow" functions, including <code>fopen</code> and <code>SetWindowTextA</code>, to be called with UTF-8 strings. However, this is a system-wide setting, and a program cannot assume it is set.

Line 47 ⟶ 45:
== External links ==
* {{cite web |url=http://msdn.microsoft.com/en-us/library/dd374081(v=vs.85).aspx |title=Unicode |work=[[MSDN]] |publisher=[[Microsoft]] |access-date=November 10, 2016}}

[[Category:Windows technology|Unicode]]

Unicode in Microsoft Windows: Difference between revisions