Revision as of 09:23, 27 September 2015 by JulianRDWinter (m: Small edits to improve grammar and flow of English - no change of meaning.)
Revision as of 21:04, 19 February 2016 by Gpvos (copyedit; not valid for other modern operating systems such as Linux)
{{refimprove|date=June 2011}}
Microsoft started to consistently implement [[Unicode]] in their products quite early.{{clarify|date=July 2012}} [[Windows NT]] was the first operating system that used "wide characters" in [[system call]]s. Using the [[UCS-2]] encoding scheme at first, it was upgraded to [[UTF-16]] starting with [[Windows 2000]], allowing a representation of additional planes with surrogate pairs.

== In various Windows families ==

=== Windows NT based systems ===
Modern Windows versions like [[Windows XP]] and [[Windows Server 2003]], and prior to them [[Windows NT]] (3.x, 4.0) and Windows 2000, are shipped with [[Windows API|system libraries]] which support string [[character encoding|encoding]] of two types: UTF-16 (often called "Unicode" in Windows documentation) and an 8-bit encoding called the "[[Windows code page|code page]]" (or incorrectly referred to as ''ANSI code page''). 16-bit functions have names suffixed with -W (from [[wide character|"wide"]]), for example, lstrlenW(). Code page oriented functions use the suffix -A, e.g., lstrlenA(), for "ANSI". This split was necessary because many languages, including C, do not provide a clean way to pass both 8-bit and 16-bit strings to the same API or put them in the same structure. Windows also provides the 'M' API, which in some locales provides multi-byte encodings but in most locales is the same as 'A'. Most such 'A' and 'M' functions are implemented as a [[Wrapper function|wrapper]] that translates the string from the code page to UTF-16 and calls the 'W' function.

The <code>IsTextUnicode</code> function uses a [[heuristic algorithm]] on a [[byte string]] passed to it to detect whether this string represents UTF-16 text. For very short texts, this function, used by some applications like [[Notepad (software)|Notepad]], often gives incorrect results.
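The ambiguity behind such misdetections can be illustrated with a short sketch. It does not reproduce the heuristic used by <code>IsTextUnicode</code>, which is not described here; it only shows that the 18 ASCII bytes of the phrase "Bush hid the facts" are also a well-formed sequence of nine UTF-16 little-endian code units, so the bytes alone do not rule out the UTF-16 reading. The helper name <code>reinterpret_as_utf16le</code> is hypothetical and chosen for this example.

```python
# Illustration only: this is NOT the IsTextUnicode heuristic.
# It shows that a short ASCII byte string can also be read as valid UTF-16LE.

def reinterpret_as_utf16le(ascii_text: str) -> str:
    """Encode text as ASCII bytes, then decode the same bytes as UTF-16LE."""
    raw = ascii_text.encode("ascii")
    if len(raw) % 2 != 0:
        raise ValueError("UTF-16 needs an even number of bytes")
    return raw.decode("utf-16-le")

sample = "Bush hid the facts"  # 18 ASCII bytes
misread = reinterpret_as_utf16le(sample)
code_units = [ord(ch) for ch in misread]

# Each pair of ASCII bytes becomes one 16-bit unit, low byte first,
# e.g. "B" (0x42) followed by "u" (0x75) becomes U+7542.
all_cjk = all(0x4E00 <= cu <= 0x9FFF for cu in code_units)
no_surrogates = not any(0xD800 <= cu <= 0xDFFF for cu in code_units)

print(len(sample), len(misread), all_cjk, no_surrogates)
```

Every resulting code unit falls in the CJK Unified Ideographs block and none is a surrogate, so a detector that sees only the raw bytes has to guess between two plausible readings.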
These misdetections gave rise to legends about the existence of [[Easter egg (computing)|"Easter eggs"]] like [[Bush hid the facts]].

=== Windows CE ===
In [[Windows CE]] UTF-16 was used almost exclusively, with the 'A' API mostly missing.
{{expand section|date=June 2011}}

== UTF-8 ==
Although the locale can be set so that the 'M' encodings handle ''some'' multi-byte encodings, it is not possible to set them to support [[UTF-8]]: attempts to select [[code page 65001]], the identifier passed to MultiByteToWideChar for UTF-8, as the locale's code page are ignored. As many libraries, including the standard C and C++ library, only allow access to files using the 'M' API, it is not possible to open all Unicode-named files with them. Thus software that relies only on such a portable API cannot fully support Unicode on Windows.

There are proposals to add an API to portable libraries such as [[Boost (C++ libraries)|Boost]] to do the necessary conversion, by adding new functions for opening and renaming files. These functions would pass filenames through unchanged on Unix, but translate them to UTF-16 on Windows.<ref>{{cite web|url=http://cppcms.com/files/nowide/html/|title=Boost.Nowide}}</ref>

Many applications inevitably have to support UTF-8 because it is the most widely used Unicode encoding scheme in various [[network protocol]]s, including the [[Internet Protocol Suite]]. An application which has to pass UTF-8 to or from a 'W' [[Windows API]] should call the functions [[MultiByteToWideChar]] and WideCharToMultiByte.<ref>{{cite web |url=http://stackoverflow.com/questions/166503/utf-8-in-windows |title=UTF-8 in Windows |publisher=[[Stack Overflow]] |accessdate=July 1, 2011}}</ref> To get predictable handling of errors and surrogate halves, it is more common for software to implement its own versions of these functions.

== References ==

Unicode in Microsoft Windows: Difference between revisions