Unicode in Microsoft Windows: Difference between revisions

m clean up; http->https (see this RfC) using AWB
Line 3:
 
== In various Windows families ==
 
=== Windows NT based systems ===
Modern Windows versions such as [[Windows XP]] and [[Windows Server 2003]], and before them [[Windows NT]] (3.x, 4.0) and Windows 2000, ship with [[Windows API|system libraries]] that support two kinds of string [[character encoding|encoding]]: UTF-16 (often called "Unicode" in Windows documentation) and an 8-bit encoding called the "[[Windows code page|code page]]" (often incorrectly referred to as the ''ANSI code page''). 16-bit functions have names suffixed with -W (for [[wide character|"wide"]]), for example lstrlenW(). Code-page-oriented functions use the suffix -A (for "ANSI"), for example lstrlenA(). This split was necessary because many languages, including C, do not provide a clean way to pass both 8-bit and 16-bit strings to the same API or to put them in the same structure. Windows also provides the 'M' API, which in some locales provides multi-byte encodings but in most locales is the same as 'A'. Most 'A' and 'M' functions are implemented as a [[Wrapper function|wrapper]] that translates the string from the code page to UTF-16 and calls the corresponding 'W' function.
Line 13 ⟶ 14:
 
=== Windows 9x ===
{{main article|Microsoft Layer for Unicode}}
In 2001, Microsoft released a special supplement for its older [[Windows 9x]] systems. It includes a dynamic link library, unicows.dll (only 240 KB), containing the 16-bit flavor (the ones with the letter W on the end) of all the basic functions of the Windows API.
 
Line 21 ⟶ 22:
There are proposals to add an API to portable libraries such as [[Boost (C++ libraries)|Boost]] to do the necessary conversion, by adding new functions for opening and renaming files. These functions would pass filenames through unchanged on Unix, but translate them to UTF-16 on Windows.<ref>{{cite web|url=http://cppcms.com/files/nowide/html/|title=Boost.Nowide}}</ref>
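The filename handling such a library would perform can be sketched roughly as follows. This is an illustration of the idea only, not the Boost.Nowide interface: the function name <code>to_native_filename</code> and its <code>windows</code> parameter are hypothetical, and the sketch assumes the caller holds filenames as UTF-8 bytes.

```python
import os


def to_native_filename(name_utf8, windows=(os.name == "nt")):
    """Convert a UTF-8 filename to the form the platform API expects.

    On Unix the bytes pass through unchanged. On Windows the name is
    translated to UTF-16LE, the form a 'W' function would take.
    The `windows` flag is a hypothetical switch so both paths can be tried
    on any platform.
    """
    if not windows:
        return name_utf8
    # Strict decoding: malformed UTF-8 raises UnicodeDecodeError.
    return name_utf8.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    name = "r\u00e9sum\u00e9.txt".encode("utf-8")
    print(to_native_filename(name, windows=False) == name)
    print(len(to_native_filename(name, windows=True)))
```

Application code would then keep UTF-8 everywhere and convert only at the boundary where a file is opened or renamed.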
 
Many applications inherently have to support UTF-8 because it is the most-used Unicode encoding scheme in various [[network protocol]]s, including the [[Internet Protocol Suite]]. An application that has to pass UTF-8 to or from a 'W' [[Windows API]] function should call [[MultiByteToWideChar]] and WideCharToMultiByte.<ref>{{cite web|url=https://stackoverflow.com/questions/166503/utf-8-in-windows|title=UTF-8 in Windows|publisher=[[Stack Overflow]]|accessdate=July 1, 2011}}</ref> To get predictable handling of errors and surrogate halves, it is more common for software to implement its own versions of these functions.
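A self-written conversion of the kind mentioned above might look like the following sketch. It is not the behavior of MultiByteToWideChar or WideCharToMultiByte: the function names are hypothetical, and the policy shown (reject malformed input and lone surrogate halves with an exception) is only one possible choice. UTF-16 strings are represented as lists of 16-bit integers.

```python
def utf8_to_utf16_units(data):
    """Convert UTF-8 bytes to a list of UTF-16 code units (hypothetical helper).

    Strict decoding raises UnicodeDecodeError on malformed input, including
    surrogate halves encoded as UTF-8 (for example ED A0 80).
    """
    units = []
    for ch in data.decode("utf-8", errors="strict"):
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)
        else:
            # Code points above the BMP become a high/low surrogate pair.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
    return units


def utf16_units_to_utf8(units):
    """Convert UTF-16 code units to UTF-8 bytes, rejecting lone surrogates."""
    chars = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError("unpaired high surrogate at index %d" % i)
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate at index %d" % i)
        else:
            cp = u
            i += 1
        chars.append(chr(cp))
    return "".join(chars).encode("utf-8")


if __name__ == "__main__":
    units = utf8_to_utf16_units("a\u20ac\U0001F600".encode("utf-8"))
    print([hex(u) for u in units])
```

Raising an exception on every malformed sequence keeps the behavior predictable; software that must tolerate bad input could instead substitute U+FFFD, which is a different policy from the one sketched here.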
 
== References ==