Revision as of 14:58, 7 May 2018, Artoria2e5 (m): →Windows CE
Revision as of 18:18, 7 May 2018, Spitzak: →Windows NT based systems: This seems really questionable, removed non-existent "lstrlenW" (the l indicates wide!). Not sure of name but "fopenW" would be a better example
Line 5:
=== Windows NT based systems ===
{{issues}}
Modern Windows versions like [[Windows XP]] and [[Windows Server 2003]], and prior to them [[Windows NT]] (3.x, 4.0) and Windows 2000, are shipped with [[Windows API|system libraries]] which support string [[character encoding|encoding]] of two types: UTF-16 (often called "Unicode" in Windows documentation) and a local (sometimes multibyte) encoding called the "[[Windows code page|code page]]" (or incorrectly referred to as the ''ANSI code page''). 16-bit functions have names suffixed with -W (from [[wide character|"wide"]]~~), for example, lstrlenW(~~). Code page oriented functions use the suffix -A~~, e.g., lstrlenA(),~~ for "ANSI". This split was necessary because many languages, including C, did not provide a clean way to pass both 8-bit and 16-bit strings to the same function. For the C/C++ languages, however, Windows uses [[C preprocessor]] macros to define an unsuffixed "generic" version that switches between 'A' and 'W' depending on a <code>UNICODE</code> macro.<ref>{{cite web|title=Unicode in the Windows API|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd374089%28v=vs.85%29.aspx|accessdate=7 May 2018}}</ref><ref>{{cite web|title=Conventions for Function Prototypes (Windows)|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317766(v=vs.85).aspx|website=MSDN|accessdate=7 May 2018|language=en}}</ref> Most such 'A' functions are implemented as a [[Wrapper function|wrapper]] that translates the code page to UTF-16 and calls the 'W' function.
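The 'A'-over-'W' wrapper pattern and the generic-name switch described above can be sketched in a few lines. This is only an illustration in Python rather than C: <code>text_length_a</code>, <code>text_length_w</code>, and the module-level <code>UNICODE</code> flag are hypothetical names invented for this sketch, not Windows API functions, and the flag merely imitates what the C preprocessor macro does at compile time.

```python
# Hypothetical sketch: none of these names exist in the Windows API.

def text_length_w(utf16: bytes) -> int:
    """Stand-in for a 'W' function: counts UTF-16 code units."""
    return len(utf16) // 2


def text_length_a(text: bytes, code_page: str = "cp1252") -> int:
    """Stand-in for an 'A' function: a thin wrapper that translates
    the code-page bytes to UTF-16 and delegates to the 'W' version."""
    utf16 = text.decode(code_page).encode("utf-16-le")
    return text_length_w(utf16)


# Imitates the UNICODE macro: the unsuffixed "generic" name is bound
# to either the 'W' or the 'A' implementation.
UNICODE = True
text_length = text_length_w if UNICODE else text_length_a

# The same wrapper handles single-byte and multibyte code pages.
latin = "naïve".encode("cp1252")    # 5 bytes in a single-byte code page
kanji = "日本語".encode("cp932")     # 6 bytes in a double-byte code page
print(text_length_a(latin))           # 5
print(text_length_a(kanji, "cp932"))  # 3
```

In real C code the selection happens before compilation, so only one of the two suffixed names ever appears in the compiled program.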
Microsoft attempted to support Unicode "portably" by providing a "UNICODE" switch to the compiler that switches unsuffixed "generic" calls from the 'A' to the 'W' interface and converts all string constants to "wide" UTF-16 versions.<ref>{{cite web|title=Unicode in the Windows API|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd374089%28v=vs.85%29.aspx|accessdate=7 May 2018}}</ref><ref>{{cite web|title=Conventions for Function Prototypes (Windows)|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317766(v=vs.85).aspx|website=MSDN|accessdate=7 May 2018|language=en}}</ref>

~~Independent~~Earlier, and independent of the "UNICODE" switch, Windows also provides the "MBCS" API switch.<ref>{{cite web|title=Support for Multibyte Character Sets (MBCSs)|url=https://msdn.microsoft.com/en-us/library/5z097dxa.aspx|language=en}}</ref> This switch turns on some C functions prefixed with <code>_mbs</code>, and selects the 'A' functions for the current locale.<ref>{{cite web|title=Double-byte Character Sets|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317794(v=vs.85).aspx|website=MSDN|accessdate=7 May 2018|quote=our applications use DBCS Windows code pages with the "A" versions of Windows functions.}}</ref>

The <code>IsTextUnicode</code> function uses a [[heuristic algorithm]] on a [[byte string]] passed to it to detect whether this string represents UTF-16 text. For very short texts, this function, used by some applications like [[Microsoft Notepad|Notepad]], often gives incorrect results. This gave rise to legends about the existence of [[Easter egg (computing)|"Easter eggs"]] like [[Bush hid the facts]].<ref>{{cite web|url=http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx|title=Some files come up strange in Notepad - The Old New Thing|date=March 24, 2007|first=Raymond|last=Chen|website=blogs.msdn.com}}</ref>
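Why short texts are hard to classify can be seen by decoding the same bytes two ways. The sketch below does not reproduce the logic of <code>IsTextUnicode</code>, which is not described here; it only shows, using Python's standard codecs, that the 18 ASCII bytes of "Bush hid the facts" are also a well-formed UTF-16 little-endian string, so any heuristic has very little evidence to go on.

```python
# The same 18 bytes, interpreted under two encodings.
data = b"Bush hid the facts"

as_ascii = data.decode("ascii")      # the intended reading
as_utf16 = data.decode("utf-16-le")  # also decodes without error

# Each pair of ASCII bytes forms one 16-bit code unit; here every
# unit happens to fall in the CJK Unified Ideographs block.
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in as_utf16)

print(as_ascii)              # Bush hid the facts
print(len(as_utf16))         # 9
print(all_cjk)               # True
print(hex(ord(as_utf16[0]))) # 0x7542, from the bytes 'B' (0x42) and 'u' (0x75)
```

With longer input, a detector would usually see byte patterns that rule out one of the two readings, which is consistent with the misdetection mainly affecting very short texts.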

Unicode in Microsoft Windows: Difference between revisions