Microsoft began implementing Unicode consistently in its products quite early. Windows NT was the first operating system to use Unicode in system calls. It initially used the UCS-2 encoding scheme and was upgraded to UTF-16 starting with Windows 2000, which allows characters in the supplementary planes to be represented with surrogate pairs.
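The practical difference between UCS-2 and UTF-16 can be illustrated with a short sketch. Python is used here purely for illustration and is not part of the Windows API; the helper name utf16_code_units is hypothetical.

```python
import struct

def utf16_code_units(text):
    """Return the 16-bit code units of text encoded as UTF-16LE."""
    raw = text.encode("utf-16-le")
    # Each code unit is an unsigned 16-bit little-endian integer.
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

# A character in the Basic Multilingual Plane fits in one code unit,
# so UCS-2 and UTF-16 encode it identically.
bmp_units = utf16_code_units("\u20ac")         # EURO SIGN

# A character in a supplementary plane needs two code units
# (a surrogate pair), which plain UCS-2 cannot express.
astral_units = utf16_code_units("\U0001F600")  # GRINNING FACE

print([hex(u) for u in bmp_units])
print([hex(u) for u in astral_units])
```

The second list contains a high surrogate followed by a low surrogate, which is why code that assumes one 16-bit unit per character miscounts such text.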
In various Windows families
Windows NT based systems
Windows XP and Windows Server 2003, and before them Windows NT 4 and Windows 2000, ship with system libraries that support two kinds of string encoding: UTF-16 (often called "Unicode" in Windows documentation) and an 8-bit encoding called the "code page" (also referred to, incorrectly, as the "ANSI code page"). 16-bit functions have names suffixed with -W (for "wide"), for example lstrlenW(). Code-page-oriented functions use the suffix -A (for "ANSI"), for example lstrlenA(). This allows the Windows NT family to run programs that use Unicode through the UTF-16 API alongside older programs that use an 8-bit encoding. Most of the "A" functions are implemented as wrappers that translate the code page text to UTF-16 and call the corresponding "W" function.
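The wrapper pattern described above can be sketched as follows. This is a minimal illustration in Python, not Microsoft's implementation: the functions show_text_w and show_text_a are hypothetical, and code page 1252 is assumed as the 8-bit encoding.

```python
def show_text_w(text_utf16):
    """Hypothetical "W" function: accepts UTF-16LE bytes and does the real work."""
    return "shown: " + text_utf16.decode("utf-16-le")

def show_text_a(text_ansi, code_page="cp1252"):
    """Hypothetical "A" function: a thin wrapper around the "W" version."""
    # Translate from the 8-bit code page to UTF-16, then delegate.
    wide = text_ansi.decode(code_page).encode("utf-16-le")
    return show_text_w(wide)

# The same text reaches the "W" function by either route.
result_a = show_text_a(b"caf\xe9")                    # 0xE9 is "é" in code page 1252
result_w = show_text_w("caf\u00e9".encode("utf-16-le"))
print(result_a == result_w)
```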
Although the locale can be set so that the "A" functions handle some multi-byte encodings, it cannot be set to make them support UTF-8. Because many libraries, including the standard C and C++ libraries, only allow access to files through the "A" API, it is not possible to open all Unicode-named files with them. This could be fixed either by making these libraries convert UTF-8 to UTF-16 or by improving the "A" API to accept UTF-8, but Microsoft has so far done neither.
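Why some file names are unreachable through an 8-bit API can be shown with a small sketch. It assumes code page 1252 as the active code page, and the helper representable_in_code_page is hypothetical; Python's codecs stand in for the conversion Windows would perform.

```python
def representable_in_code_page(name, code_page="cp1252"):
    """Return True if every character of name exists in the given code page."""
    try:
        name.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False

latin_name = "report.txt"
cjk_name = "\u65e5\u672c\u8a9e.txt"   # a Japanese file name

print(representable_in_code_page(latin_name))
print(representable_in_code_page(cjk_name))

# UTF-16 can always express the name, which is what the "W" API accepts.
wide_name = cjk_name.encode("utf-16-le")
```

A name that fails this check has no byte string in the active code page that refers to it, so a library limited to the "A" functions cannot pass it to the system.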
The IsTextUnicode function applies a heuristic algorithm to a byte string passed to it to detect whether the string represents UTF-16 text. For very short texts this function, which is used by some applications such as Notepad, often gives incorrect results. This gave rise to legends about "Easter eggs" such as Bush hid the facts.
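The ambiguity behind such misdetections can be seen by reinterpreting the bytes directly. The sketch below does not reproduce the heuristic used by IsTextUnicode; it only shows, using Python's codecs as a stand-in, that the ASCII bytes of the phrase also form well-formed UTF-16LE text.

```python
ascii_bytes = b"Bush hid the facts"   # 18 bytes of plain ASCII

# Reinterpret every pair of bytes as one little-endian 16-bit code unit.
as_utf16 = ascii_bytes.decode("utf-16-le")

# Check whether every resulting character lies in the CJK Unified
# Ideographs block (U+4E00..U+9FFF).
all_cjk = all(0x4E00 <= ord(ch) <= 0x9FFF for ch in as_utf16)

print(len(as_utf16), all_cjk)
print([hex(ord(ch)) for ch in as_utf16])
```

Because both readings are valid, a detector has to guess from statistics, and a text this short offers very little evidence either way.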
Windows CE
In Windows CE UTF-16 was used almost exclusively.
Windows 9x
In 2001, Microsoft released a special supplement for its older Windows 9x systems. It includes a dynamic link library, unicows.dll (only 240 KB), containing the 16-bit versions (those with the letter W at the end) of all the basic Windows API functions.
Various encoding schemes
Although Windows uses the UTF-16LE encoding scheme internally, in the NTFS file system, in executables and sometimes in text files, Unicode's byte-oriented encodings UTF-8 and even UTF-7 are supported as well. An application that has to support UTF-8 or UTF-7 through the Windows API should call MultiByteToWideChar and WideCharToMultiByte, the same functions used to support "legacy" (i.e. pre-Unicode) code pages.[1] Many applications inevitably have to support UTF-8 because it is the most widely used Unicode encoding scheme in network protocols, including those of the Internet Protocol Suite.
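The round trip between a byte-oriented encoding and UTF-16 can be sketched portably. The functions below are hypothetical Python stand-ins that only mirror the roles of MultiByteToWideChar and WideCharToMultiByte; they do not call the Windows API and do not reproduce its signatures. The identifiers 65000 and 65001 are the Windows code page numbers for UTF-7 and UTF-8, and the mapping to Python codec names is an assumption made for this example.

```python
CP_UTF7 = 65000   # Windows code page identifier for UTF-7
CP_UTF8 = 65001   # Windows code page identifier for UTF-8

# Assumed mapping from code page identifiers to Python codec names.
CODECS = {CP_UTF7: "utf-7", CP_UTF8: "utf-8", 1252: "cp1252"}

def multi_byte_to_wide(code_page, data):
    """Convert bytes in the given code page to UTF-16LE bytes."""
    return data.decode(CODECS[code_page]).encode("utf-16-le")

def wide_to_multi_byte(code_page, wide):
    """Convert UTF-16LE bytes to bytes in the given code page."""
    return wide.decode("utf-16-le").encode(CODECS[code_page])

text = "h\u00e9llo \u20ac"
utf8_bytes = text.encode("utf-8")

# UTF-8 in, UTF-16 for internal use, UTF-8 back out.
wide = multi_byte_to_wide(CP_UTF8, utf8_bytes)
round_trip = wide_to_multi_byte(CP_UTF8, wide)
print(round_trip == utf8_bytes)
```

The same pair of functions handles a legacy code page by changing only the identifier, which reflects the point that UTF-8 is treated as one more code page at this interface.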
1. "UTF-8 in Windows". Stack Overflow. Retrieved July 1, 2011.