→UTF-8: No, that DOES NOT WORK for UTF-8!!!!! Please read the previous sentence. |
Artoria2e5 (talk | contribs) The 'M’ API set is not a thing, so rewrite according to MBCS docs (It's not Unicode, but it still explains the old UTF-8 rejection). Rewrote first paragraph of UTF-8, since Windows 10 1803 apparently has that option now. Also, chcp does work for UTF-8 even before Nov 2017; it was there when WSL came out. Just get a copy of Windows 10 ffs. |
Line 5:
=== Windows NT based systems ===
Modern Windows versions like [[Windows XP]] and [[Windows Server 2003]], and prior to them [[Windows NT]] (3.x, 4.0) and Windows 2000 are shipped with [[Windows API|system libraries]] which support string [[character encoding|encoding]] of two types: UTF-16 (often called "Unicode" in Windows documentation) and an
Independently of the "UNICODE" switch, Windows also provides the "MBCS" API switch.<ref>{{cite web|title=Support for Multibyte Character Sets (MBCSs)|url=https://msdn.microsoft.com/en-us/library/5z097dxa.aspx|language=en}}</ref> This switch turns on some C functions prefixed with <code>_mbs</code>, and selects the 'A' functions for the current locale.<ref>{{cite web|title=Double-byte Character Sets|url=https://msdn.microsoft.com/en-us/library/windows/desktop/dd317794(v=vs.85).aspx|website=MSDN|accessdate=7 May 2018|quote=our applications use DBCS Windows code pages with the "A" versions of Windows functions.}}</ref>
The <code>IsTextUnicode</code> function uses a [[heuristic algorithm]] on a [[byte string]] passed to it to detect whether this string represents UTF-16 text. For very short texts, this function, used by some applications like [[Microsoft Notepad|Notepad]], often gives incorrect results. This gave rise to legends about the existence of [[Easter egg (computing)|"Easter eggs"]] like [[Bush hid the facts]].<ref>{{cite web|url=http://blogs.msdn.com/b/oldnewthing/archive/2004/03/24/95235.aspx|title=Some files come up strange in Notepad - The Old New Thing|date=March 24, 2004|first=Raymond|last=Chen|website=blogs.msdn.com}}</ref>
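The misreading behind "Bush hid the facts" can be reproduced without Windows. The sketch below does not call <code>IsTextUnicode</code> and does not model its heuristic; it only shows, using a hypothetical helper named <code>reinterpret_as_utf16le</code>, what the same bytes look like once they are assumed to be UTF-16 (little-endian).

```python
def reinterpret_as_utf16le(data: bytes) -> str:
    """Decode bytes as UTF-16LE, i.e. treat each byte pair as one code unit."""
    return data.decode("utf-16-le")


sample = b"Bush hid the facts"  # 18 ASCII bytes: an even length, so 9 byte pairs
misread = reinterpret_as_utf16le(sample)

print(len(sample), len(misread))  # 18 9
# Every resulting code unit lands in the CJK Unified Ideographs block.
print(all(0x4E00 <= ord(ch) <= 0x9FFF for ch in misread))  # True
```

Because two lowercase ASCII letters form a byte pair that is also a plausible CJK code unit, a short ASCII text offers little evidence to tell the two readings apart.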
=== Windows CE ===
Line 18 ⟶ 20:
== UTF-8 ==
Microsoft Windows has a code page designated for [[UTF-8]], [[code page 65001]]. Prior to Windows 10 Insider build 17035 (November 2017),<ref>{{cite web|title=Windows10 Insider Preview Build 17035 Supports UTF-8 as ANSI|url=https://news.ycombinator.com/item?id=15710685|website=Hacker News|accessdate=7 May 2018}}</ref> it was impossible to set the locale code page to 65001, leaving this code page available only for:
* Explicit conversion functions such as MultiByteToWideChar
* A manual "chcp" command that only changes the code page for the current program's context. This is used for [[conhost.exe]] windows running [[Windows Subsystem for Linux]].
Since Insider build 17035 and the April 2018 update (nominal build 17134) for Windows 10, a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox is available for setting the locale code page to UTF-8.{{efn|1=Found under Control Panel, "Region" entry, "Administrative" tab, "Change system locale" button.}} However, this option can break legacy applications that internally call old "[[DBCS]]" APIs such as IsDBCSLeadByte, which support a maximum of 2 bytes per character.
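The 2-byte assumption can be checked by counting encoded bytes per character. This is an illustrative sketch, not Windows code: <code>encoded_lengths</code> is a hypothetical helper, and Python's <code>cp932</code> codec stands in for a typical DBCS code page.

```python
def encoded_lengths(text: str, encoding: str) -> list:
    """Return the number of bytes each character occupies in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]


# "A", U+00E9 (e with acute), U+65E5 (a CJK ideograph), U+1F600 (an emoji)
print(encoded_lengths("A\u00e9\u65e5\U0001F600", "utf-8"))  # [1, 2, 3, 4]

# In a DBCS code page a character is one byte, or a lead byte plus one trail byte.
print(encoded_lengths("A\u65e5", "cp932"))  # [1, 2]
```

Code that walks a string as "lead byte plus exactly one trail byte" therefore splits 3-byte and 4-byte UTF-8 sequences in the wrong place.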
There are proposals to add an API to portable libraries such as [[Boost (C++ libraries)|Boost]] to do the necessary conversion, by adding new functions for opening and renaming files. These functions would pass filenames through unchanged on Unix, but translate them to UTF-16 on Windows.<ref>{{cite web|url=http://cppcms.com/files/nowide/html/|title=Boost.Nowide}}</ref>
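The idea of passing names through on Unix and translating them on Windows can be sketched as follows. This is not the Boost.Nowide API: <code>to_native_filename</code> and its <code>windows</code> parameter are hypothetical names used only for illustration, and the sketch assumes the caller always supplies the name as UTF-8 bytes.

```python
import sys


def to_native_filename(name_utf8: bytes, windows: bool = sys.platform == "win32") -> bytes:
    """Hypothetical helper: return the bytes a platform's file API would expect.

    On Windows the UTF-8 name is re-encoded as UTF-16 (little-endian) for the
    'W' functions; elsewhere the bytes are passed through unchanged.
    """
    if windows:
        # Strict decoding: malformed UTF-8 raises UnicodeDecodeError.
        return name_utf8.decode("utf-8").encode("utf-16-le")
    return name_utf8


name = "caf\u00e9.txt".encode("utf-8")
print(to_native_filename(name, windows=False))  # unchanged UTF-8 bytes
print(to_native_filename(name, windows=True))   # UTF-16LE bytes
```

The platform flag is a parameter here only so that both branches can be exercised on one machine.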
Many applications inevitably have to support UTF-8 because it is the most-used Unicode encoding scheme in various [[network protocol]]s, including the [[Internet Protocol Suite]]. An application which has to pass UTF-8 to or from a 'W' [[Windows API]] should call the functions [[MultiByteToWideChar]] and WideCharToMultiByte.<ref>{{cite web|url=https://stackoverflow.com/questions/166503/utf-8-in-windows|title=UTF-8 in Windows|publisher=[[Stack Overflow]]|accessdate=July 1, 2011}}</ref> To get predictable handling of errors and surrogate halves, it is more common for software to implement its own versions of these functions.
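What "predictable handling" means can be illustrated with a pair of strict converters. The sketch uses Python's built-in codecs as a stand-in; it does not call or reproduce MultiByteToWideChar or WideCharToMultiByte, and the function names are hypothetical. The one policy it assumes is that any malformed input, including a lone surrogate half, is rejected rather than replaced.

```python
def utf8_to_utf16le(data: bytes) -> bytes:
    """Convert UTF-8 to UTF-16LE, raising UnicodeDecodeError on malformed input."""
    return data.decode("utf-8", errors="strict").encode("utf-16-le")


def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16LE to UTF-8, raising UnicodeDecodeError on lone surrogates."""
    return data.decode("utf-16-le", errors="strict").encode("utf-8")


emoji_utf8 = "\U0001F600".encode("utf-8")    # 4 bytes in UTF-8
emoji_utf16 = utf8_to_utf16le(emoji_utf8)    # a surrogate pair: 2 code units, 4 bytes
print(len(emoji_utf8), len(emoji_utf16))     # 4 4
print(utf16le_to_utf8(emoji_utf16) == emoji_utf8)  # True

try:
    utf16le_to_utf8(b"\x00\xd8")  # U+D800 alone: a high surrogate with no partner
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

Rejecting bad input is only one possible policy; software that must round-trip arbitrary 16-bit file names may instead choose to preserve surrogate halves.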
==Notes==
{{notefoot}}
== References ==