Talk:Unicode/Archive 6: Difference between revisions

m Archiving 1 discussion(s) from Talk:Unicode) (bot
Roeschter (talk | contribs)
: -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 22:05, 2 December 2014 (UTC)
* '''Updated'''. We could use a template that marks Unicode articles. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 23:43, 25 May 2016 (UTC)
 
==Writing Systems still unable to be viewed properly in Unicode==
As of September 2016, however, Unicode is unable to properly display the fonts by default for the following Unicode writing systems on most browsers (namely, [[Microsoft Edge]], [[Internet Explorer]], [[Google Chrome]] and [[Mozilla Firefox]]):
 
{{large|
*[[Balinese alphabet]] (ᬅᬓ᭄ᬱᬭᬩᬮᬶ)
*[[Batak alphabet]] (ᯘᯮᯮᯒᯖ᯲ ᯅᯖᯂ᯲, also used for the Karo, Simalungun, Pakpak and Angkola-Mandailing languages)
*[[Baybayin script]] (ᜊᜌ᜔ᜊᜌᜒᜈ᜔)
*[[Chakma script]] (𑄇𑄳𑄡𑄈𑄳𑄡 𑄉𑄳𑄡)
*[[Hanunó'o alphabet]] (ᜱᜨᜳᜨᜳᜢ)
*[[Limbu script]] (ᤔᤠᤱᤜᤢᤵ)
*[[Pollard script]] (𖼀𖼁𖼂𖼃𖼄𖼅𖼆𖼇)
*[[Saurashtra script]] (ꢱꣃꢬꢯ꣄ꢡ꣄ꢬ)
*[[Sharada script]] (𑆐𑆑𑆒𑆓𑆔𑆕𑆖𑆗𑆘)
*[[Sundanese script]] (ᮃᮊ᮪ᮞᮛ ᮞᮥᮔ᮪ᮓ)
*[[Sylheti Nagari]] (ꠍꠤꠟꠐꠤ ꠘꠣꠉꠞꠤ)
*[[Tai Tham alphabet]] (ᨲ᩠ᩅᩫᨾᩮᩥᩬᨦ)
}}
 
Prior to Windows 7, scripts such as Burmese (မြန်မာဘာသာ), Khmer (ភាសាខ្មែរ), Lontara (ᨒᨚᨈᨑ), Cherokee (ᎠᏂᏴᏫᏯ), Coptic (ϯⲙⲉⲧⲣⲉⲙⲛ̀ⲭⲏⲙⲓ), Glagolitic (Ⰳⰾⰰⰳⱁⰾⰻⱌⰰ), Gothic (𐌲𐌿𐍄𐌹𐍃𐌺), Cuneiform (𐎨𐎡𐏁𐎱𐎡𐏁), Phags-pa (ꡖꡍꡂꡛ ꡌ), Traditional Mongolian (ᠮᠣᠨᠭᠭᠣᠯ ), Tibetan (ལྷ་སའི་སྐད་), Odia alphabet (ଓଡ଼ିଆ ) also had this font display issue, but it has since been resolved (i.e. they can now be 'seen' on most browsers).
 
Could someone also enable these fonts to be visible on Wikipedia browsers? --[[User:Sechlainn|Sechlainn]] ([[User talk:Sechlainn|talk]]) 02:23, 29 September 2016 (UTC)
 
: I don't know what you mean. Unicode is the underlying standard that makes it possible to use those scripts at all. Properly showing the texts is a matter of operating system, fonts and web browser. Even just OS and browser isn't good enough; what language packs and fonts are installed are important. There's nothing that anyone can in general do here.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 02:49, 29 September 2016 (UTC)
 
:: {{ping|Sechlainn}} 1.&nbsp;Please [[Wikipedia:No original research|'''do not engage in original research''']]. — 2.&nbsp;Unicode is not intended to “display the fonts.” — 3.&nbsp;These are Unicode scripts, not writing systems. — 4.&nbsp;I can view all of the above except Sharada on my Firefox. — 5.&nbsp;There is no such thing as “Wikipedia browsers.” <small>[[Wikipedia:WikiLove|Love]]</small>&nbsp;—[[:commons:User:LiliCharlie|LiliCharlie]]&nbsp;<small>([[User talk:LiliCharlie|talk]])</small> 03:02, 29 September 2016 (UTC)
 
== Unicode 10.0 ==
 
This version has just been released today, can you add information for this into the article? Proof from Emojipedia [[Special:Contributions/86.22.8.235|86.22.8.235]] ([[User talk:86.22.8.235|talk]]) 12:03, 20 June 2017 (UTC)
:I haven't seen anything on the Unicode site (http://www.unicode.org/) but will keep an eye out for an official announcement that 10.0 has been released. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 18:01, 20 June 2017 (UTC)
::Version 10.0 now shows up as the latest version at http://www.unicode.org/standard/standard.html [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 18:44, 20 June 2017 (UTC)
:::And the [http://unicode.org/Public/UNIDATA/ data files] have been updated, so I think we can start updating Wikipedia now. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 19:23, 20 June 2017 (UTC)
 
== "Presentation forms" ==
 
Can someone explain to me what a "presentation form" is? I can't find an answer anywhere. [[User:Pariah24|Pariah24]] ([[User talk:Pariah24|talk]]) 11:19, 10 September 2017 (UTC)
:Nevermind; I found [http://unicode.org/faq/ligature_digraph.html this] [[User:Pariah24|Pariah24]] ([[User talk:Pariah24|talk]]) 11:23, 10 September 2017 (UTC)
 
== Is there a unicode symbol for "still mode"? ==
 
I mean this symbol: https://www.iso.org/obp/ui#iec:grs:60417:5554 [[User:Seelentau|Seelentau]] ([[User talk:Seelentau|talk]]) 18:16, 12 January 2018 (UTC)
:It seems not. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 19:03, 12 January 2018 (UTC)
 
== Two things this STILL does poorly ==
 
First, it still reads like a technical manual written by experts for experts. It still refuses to explain, upfront, what a codepoint is. The related concepts of character and glyph, as well as the fonts involved, all need to be discussed, imho. It should be made clear in the lead that Unicode has numerous failures: it is unable to correct past mistakes, and is (and will almost certainly continue to be) limited by political pressure (including by sovereign states such as China and N. Korea). Some of what is in the Unicode standard is there due to political concession, and of course all of it is there due to decisions made by committee(s). That's one thing. The other is the article's virtually complete failure to tackle the Windows operating system, which is far and away the dominant OS in the world. Windows does not handle Unicode. In order for an application, be it a web browser or a spell-checker or a chat app, to handle Unicode, it has to work around the Windows character tables. (Of course, if the article doesn't explain what the difference is between a codepoint and a character (or "wide-character"), then you've failed before you begin.) I think, and propose, that at the LEAST, a section under "Issues" should be created to simply state that, despite Microsoft's continued deceptive and misleading claims about its support for Unicode, its Windows OS does not directly support Unicode. (Microsoft's Word has impressive support, but still contains large omissions of the 136,000 codepoints.)[[Special:Contributions/75.90.36.201|75.90.36.201]] ([[User talk:75.90.36.201|talk]]) 20:22, 9 April 2018 (UTC)
:Bizarre and totally incorrect statement about Microsoft Windows not directly supporting Unicode. Of course Windows (excluding obsolete W95, W98 and ME) directly and natively supports Unicode, and no Unicode-aware application running on Windows needs to "work around the Windows character tables". [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 10:26, 10 April 2018 (UTC)
::Windows does not support UTF-8 or any other coverage of Unicode in the 8-bit api, which means standard functions to open or list files do not work for filenames with Unicode in them. This makes it impossible to write portable software using the standard functions that works with Unicode filenames, therefore Windows does not support Unicode.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 21:34, 10 April 2018 (UTC)
::: Does C# even support that "8-bit API"? What do you mean by "portable software"? I would note the POSIX standard doesn't support Unicode in file names either; only A-Za-z0-9, hyphen, period and underscore can be used in portable POSIX filenames. And non-POSIX MacOS/Plan 9/BeOS programs aren't portable, so I believe it is impossible to write portable software using Unicode.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 23:18, 10 April 2018 (UTC)
::::By "portable" I mean "source code that works on more than one platform", stop trying to redefine it as "every computer ever invented in history". Modern C/C++ compilers will preserve the 8-bit values in quoted strings and thus preserve UTF-8. Only VC++ is broken here, though you can outwit it by claiming that the source code is *not* Unicode (???!). POSIX allows all byte values other than '/' and null in a filename and thus allows UTF-8. POSIX does go way off course when discussing shell quoting syntax and you are right it disallows some byte values.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 01:32, 11 April 2018 (UTC)
::::Oh and OS/X works exactly as I have stated, it in fact has some of the best Unicode support, though their insistence on normalizing the filenames rather than just preserving the byte sequence is a bit problematic. But at least all the software knows the filenames are UTF-8.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 01:34, 11 April 2018 (UTC)
::::: What do you mean by more than one platform? I have no reason to believe that C# doesn't support Unicode filenames, and thus every version of Windows NT since 4.0 (and thus every version of Windows since XP) supports portable code using Unicode filenames. If any non-POSIX MacOS X program is "portable", then so is a C# program targeting .NET 1.0.
::::: POSIX does not allow "all byte values other than '/' and null in a filename"; to quote David Wheeler here, https://www.dwheeler.com/essays/fixing-unix-linux-filenames.html says
::::::: For a filename to be portable across implementations conforming to POSIX.1-2008, it shall consist only of the portable filename character set as defined in Portable Filename Character Set. Portable filenames shall not have the <hyphen> character as the first character since this may cause problems when filenames are passed as command line arguments.
:::::: I then examined the Portable Filename Character Set, defined in 3.276 (“Portable Filename Character Set”); this turns out to be just A-Z, a-z, 0-9, <period>, <underscore>, and <hyphen> (aka the dash character). So it’s perfectly okay for a POSIX system to reject a non-portable filename due to it having “odd” characters or a leading hyphen.
::::: If strictly following IEEE 1003, the only major operating system standard, is important to you, filenames shall come only from that set of 65 characters. In practice it's better, but a program strictly conforming to the standard is so limited.
::::: So as far as I can tell, Windows is in the same boat as everyone else.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 22:30, 11 April 2018 (UTC)
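
The Portable Filename Character Set quoted above can be checked mechanically. The following is a minimal Python sketch, assuming only what the quotation states (A-Z, a-z, 0-9, period, underscore, hyphen, and no leading hyphen); the helper name <code>is_portable_filename</code> and the decision to reject empty names are illustrative choices, not part of POSIX.

```python
import string

# The 65 characters named in the quoted definition:
# 26 upper-case + 26 lower-case letters, 10 digits, and ". _ -".
PORTABLE_CHARS = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_filename(name: str) -> bool:
    """Hypothetical helper: True if every character is in the portable set
    and the name does not start with a hyphen.

    Rejecting the empty string is an assumption made for this sketch.
    """
    if not name or name.startswith("-"):
        return False
    return all(ch in PORTABLE_CHARS for ch in name)


if __name__ == "__main__":
    for candidate in ("report_2018.txt", "-rf", "stringWithUnicodeInIt.æ"):
        print(candidate, is_portable_filename(candidate))
```

Under this strict reading, the filename from the <code>fopen</code> example in this thread is not "portable", which is the sense of the word used in the quotation; it says nothing about whether a given system will accept the name in practice.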
::::::You are continuing to insist that "portable" means "it works exactly the same on every single computer ever made", while I am going by the more popular definition of "it works on more than one computer". If you insist on such silly impossible requirements it is obvious you are refusing to admit you are wrong.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 00:55, 12 April 2018 (UTC)
::::::: Portable, as strictly conforming to the POSIX standard, would be nice. Portable, as in running on multiple operating systems, is more realistic. Portable, as in running on multiple versions of the same OS, is barely passable. "It works on more than one computer" is not the "more popular" definition, as short of being tied into specialized one-off hardware like Deep Blue, you can always image the drive and load it into an emulator on another system.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 19:08, 12 April 2018 (UTC)
::::::::As I note below, your "POSIX" complaint is actually entirely backwards. It *REDUCES* the number of filenames possible on some systems, therefore it has no effect on the fact that fopen() on Unix can open all files, but cannot on Windows.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 17:07, 13 April 2018 (UTC)
: I think you have a point about the way we mention codepoints in the opening.
: As for numerous failures, "Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems." If you understand what that says, it tells you that this is a committee project that pays a price for backward compatibility and works with the user community. To compare and contrast, the TRON character encoding doesn't work with sovereign states like China; devoid of such political pressure, it doesn't support Zhuang or Cantonese written in Han characters. Without multinational committees and political pressure, its support of anything that's not Japanese is half-assed and generally copied from Unicode.
: There are computer projects that work on the benevolent dictator standard, like the Linux kernel and Python. But I don't know of any that don't center around one chunk of source code, that involve an abstract standard with multiple equal implementations. If you have a seriously complex project, like encoding all of human writing, and it's going to be core for Microsoft and Apple and Google and Oracle, it's going to be a committee that responds to political needs. And standards are really interesting only if Microsoft and Apple and Google and Oracle care; stuff like Dart and C# may technically be standards, but users use the Google tools for Dart and follow what Google puts out, and likewise for Microsoft and C#. (Or SQL, where there is a standard with multiple implementations, but one still has to learn MSSQL and Oracle Database and MySQL separately. I don't know if that's an under-specified standard, or companies just ignoring it, but certainly the solution is not listening to the companies less.)--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 23:18, 10 April 2018 (UTC)
:::fopen("stringWithUnicodeInIt.æ") does not do what anybody wants on Windows. On Linux it works. Therefore support for Unicode is better on Linux than Windows, which is not very impressive for Windows...[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 01:25, 11 April 2018 (UTC)
:::: On Linux it may work; but "filenames" in Linux aren't names, they aren't strings, they're arbitrary byte-sequences that don't include 00h or 2Fh, and the most reasonable interpretation of the filename as a string may require choosing a character set on a per-filename basis; in worst-case scenarios, say a user in locale zh-TW mass renamed a bunch of files to start with 檔案 (archive), without paying attention to the fact they were named by a user in locale fr-FR.ISO8859-1 ("archivé"), you can end up with a byte string that makes no sense under any single character set.
:::: To put it shorter, that may work in Linux, but it may also fail to open a file with a user-visible name of "stringWithUnicodeInIt.æ", depending on locale settings.
:::::No, that æ is the UTF-8 byte sequence for that character and it is unaffected by the "locale" and therefore it always works.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 17:07, 13 April 2018 (UTC)
:::: Not to mention that judging Windows by C alone is unfair and silly; why not C# or Python or other languages?--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 22:30, 11 April 2018 (UTC)
:::::Because C# api is a Microsoft development and they wrote the Linux version, and thus any failings are their fault.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 17:07, 13 April 2018 (UTC)
:::::Oddly enough while you insist that C code work on EVERY SINGLE COMPUTER EVER MADE, you seem to think "limit programming languages to C#" is A-OK. You are weird. And that file will open on Linux no matter what the "locale" is set to, that is the point. The filename is a string of bytes, just like you describe, and never ever should be dependent on "locale". Linux gets this right, Windows does not. Sorry.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 00:55, 12 April 2018 (UTC)
:::::: You chose the example. "stringWithUnicodeInIt.æ" is not a string of bytes; it is a string of characters. Said characters map to bytes in various ways, and provided (and this is far from guaranteed) the compiler and the system and the user that created the file all agree on UTF-8, fopen will work. If one of them is thinking in Latin-1, and the charset of the system, filename, and compiler can all be changed independently, then it will not work.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 19:08, 12 April 2018 (UTC)
:::::::No, that string has 2 bytes at the end that are the UTF-8 encoding. Any system (such as C# I guess) that turns it into a different byte string is broken.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 17:07, 13 April 2018 (UTC)
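
The disagreement above turns on which bytes the final "æ" becomes. As a small illustration (a Python sketch, not code from either participant), the same character string encodes to different byte strings under UTF-8 and Latin-1, so a byte-oriented open call only matches the stored filename when both sides used the same encoding.

```python
# The filename from the discussion, as a string of characters.
name = "stringWithUnicodeInIt.æ"

# The same characters mapped to bytes under two different encodings.
utf8_bytes = name.encode("utf-8")      # "æ" (U+00E6) becomes two bytes: C3 A6
latin1_bytes = name.encode("latin-1")  # "æ" (U+00E6) becomes one byte:  E6

# Writing the bytes explicitly avoids any dependence on how the
# source file itself is encoded.
explicit_utf8 = b"stringWithUnicodeInIt.\xc3\xa6"

if __name__ == "__main__":
    print(utf8_bytes[-2:].hex())    # c3a6
    print(latin1_bytes[-1:].hex())  # e6
    print(utf8_bytes == explicit_utf8)
    print(utf8_bytes == latin1_bytes)
```

Both views in the thread are visible here: the UTF-8 form always ends in the two bytes C3 A6, and a file created under a Latin-1 assumption has a different byte name that the UTF-8 bytes will not match.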
 
The problem with Windows is not being understood correctly. On Windows filesystems such as NTFS, filenames are arbitrary sequences of 16-bit words (technically UTF-16 but they allow invalid UTF-16 with unpaired surrogates and they disallow some valid UTF-16). The problem is that the equivalent of open(char* filename) on Windows, which is used by virtually every portable library including the C and C++ libraries written by Microsoft, *cannot* open every possible file, as there are patterns of 16-bit words that cannot be achieved by any 8-bit string. This makes it impossible, for instance, to write a piece of software that opens an arbitrary file chosen by the user.
 
An obvious fix is to make open(char*) use a translator that *can* produce all valid sequences of 16-bit words, and have readdir and similar functions do the opposite translation. And Windows already provides a way to change this translation, yet it refuses to allow a setting that will work (a correct setting would be UTF-8 but also allow unpaired surrogates). The end result, which is quite obvious to anybody working with large multi-platform setups, is that you are restricted to ASCII-only filenames everywhere. This is a convincing argument for many that Windows does not support Unicode.
 
On Unix filenames are sequences of 8-bit bytes, and it is a POSIX requirement that the 8-bit sequence passed to open(char*) be used unchanged to match the filename, so you can in fact name all possible files. POSIX requirements that a certain subset of ASCII is required to work in filenames only reduces the set of filenames if a system chooses to disallow bytes outside that set, you can still name all possible files and quite a few disallowed files using open(char*), so mentioning that is a red herring.
 
C# on Unix is almost certainly using UTF-16 strings in their open() api, and are using a brain-dead converter to 8-bit strings. Their converter likely is using the "locale" to convert some unpredictable 256-code-point subset to bytes and ignoring the others. As they are in charge of the reverse converter to UTF-16 there is no reason at all to do this, they should use some fixed loss-less variation of UTF-8 in both directions. Python-3 and Qt do this which makes it work much better (but far from perfect as they botch up filenames containing invalid UTF-8, these complications are why use of UTF-16 is strongly discouraged by many).
 
Spitzak (talk) 18:35, 12 April 2018 (UTC)
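
The point about unpaired surrogates in the comment above can be illustrated with a short Python sketch. It uses Python's <code>surrogatepass</code> error handler as a stand-in for "UTF-8 that also allows unpaired surrogates"; this is an assumption made for illustration and does not describe what any Windows API actually does.

```python
# A name whose last 16-bit unit is an unpaired high surrogate (U+D800).
# Such a sequence is not valid UTF-16, but the comment above notes that
# NTFS filenames may contain it.
name_with_lone_surrogate = "file_" + "\ud800"

# Strict UTF-8 refuses to encode a lone surrogate.
try:
    name_with_lone_surrogate.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# A relaxed variant that passes surrogates through can represent it
# and convert it back without loss.
relaxed_bytes = name_with_lone_surrogate.encode("utf-8", "surrogatepass")
round_trip = relaxed_bytes.decode("utf-8", "surrogatepass")

if __name__ == "__main__":
    print(strict_ok)
    print(relaxed_bytes.hex())
    print(round_trip == name_with_lone_surrogate)
```

With strict UTF-8 no 8-bit string maps to this name, while the relaxed variant reaches it and round-trips, which is the kind of lossless translation the comment proposes.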
 
: Unix disallows some valid UTF-8 strings, like any including '/' or '\0'. So what? I gave you chapter and verse above where it is not a POSIX requirement to support any 8-bit sequence passed to open; that in fact the only sequences that POSIX requires a system to handle come from a 65-character subset of ASCII (and even then, no hyphens at the start of filenames).
::Okay, I am going to try to make this clear, as your convoluted arguments got me confused as well as you. Let's say there is a system that does not allow 'Z' in a filename. Does it somehow mean that fopen() cannot open all files? NO!!!! You can still send a 'Z' to fopen and it will cause an error. You can also still send all the valid strings that don't contain 'Z'. Now let's say that there is an fopen() call that *removes* any 'Z' from the string, despite the fact that 'Z' is allowed in a filename. Now you can no longer open all possible files, a very serious problem! Your complaint about POSIX is the first thing I describe. The problem with Windows is the second one.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 17:07, 13 April 2018 (UTC)
: "there are patterns of 16-bit words that cannot be achieved by any 8-bit string." I have no idea what you're getting at here. Taken literally, that's false; a 16-bit word is two 8-bit bytes. If you were talking about null-terminated strings, then you couldn't use ASCII at all. I know of glitches in NTFS and NTFS support where you can create filenames that can't be handled by normal programs, but that's not really a Unicode support issue. What you're talking about is unclear.
::Obviously it is technically possible to make a mapping from 8-bit to 16-bit strings that can produce all possible 16-bit strings. DUH! The problem is that the set of translators Windows provides for the fopen() call does not include one that can do it, despite an obvious candidate (UTF-8 with support for unpaired surrogates).[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 17:07, 13 April 2018 (UTC)
: "Python-3 and Qt do this" ... and Python 3 rejects some valid filenames on Linux that can't be treated as UTF-8.
::Again, holy crap. Here is a direct quote from my text: " (but far from perfect as they botch up filenames containing invalid UTF-8, these complications are why use of UTF-16 is strongly discouraged by many)." Did you even read before typing?[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 17:07, 13 April 2018 (UTC)
: Millions of people can and do use Unicode filenames on Windows every day. If you really want to avoid all compatibility issues over multiple systems, differences in case sensitivity and normalization are going to bite you faster than any problem with Unicode names on Windows.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 19:32, 12 April 2018 (UTC)
::And they are using software that was not written with portable api's. If you worked in an industry that uses source code from many sources you would know that we have to give up on any filenames that are not ASCII. A *single* program that uses a C++ library that takes a filename as a string (rather than an open file descriptor) will force your entire operation to ASCII-only filenames instantly. This is not a joke and it is a real problem.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 17:07, 13 April 2018 (UTC)
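
The converse problem raised in this exchange, byte filenames on Unix that are not valid UTF-8, can be sketched in Python as well. The example below uses the <code>surrogateescape</code> error handler (described in PEP 383) on a made-up byte string; it shows one way such bytes can be carried through a text string and back, and is not a claim about how any particular program in this discussion behaves.

```python
# A hypothetical Unix filename stored as Latin-1 bytes: the 0xE9 byte ("é"
# in Latin-1) is followed by ".", so the sequence is not valid UTF-8.
raw_name = b"archiv\xe9.txt"

# A strict UTF-8 decode rejects the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape, each undecodable byte is mapped to a lone surrogate
# (0xE9 becomes U+DCE9), so the original bytes can be recovered exactly.
as_text = raw_name.decode("utf-8", "surrogateescape")
recovered = as_text.encode("utf-8", "surrogateescape")

if __name__ == "__main__":
    print(strict_ok)
    print(ascii(as_text))
    print(recovered == raw_name)
```

The resulting string is not valid Unicode text (it contains a lone surrogate), so code that later encodes it strictly, for example to display it, can still fail; the round trip only guarantees that the file can be named again with the same bytes.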
 
: {{ping|Spitzak}} I have no idea what you're talking about. "The problem is that the set of translators Windows provides for the fopen() call does not include one that can do it, despite an obvious candidate (UTF-8 with support for unpaired surrogates)" goes right into my "axe-grinding developer; not a real problem" pile. What's the problem here?
::They provide an API that allows some variable-width encodings, but refuse to support the one encoding everyone needs.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 18:17, 16 April 2018 (UTC)
: "No, that string has 2 bytes at the end that are the UTF-8 encoding. Any system (such as C# I guess) that turns it into a different byte string is broken." If it's text, then you don't know and shouldn't care how it's encoded, whether it's UTF-1, SCSU, UTF-9, or UTF-32. It's not a byte string; [https://www.mediawiki.org/wiki/Unicode_normalization_considerations MediaWiki normalizes] and thus turns anything you write here potentially into other byte strings.
::I am assuming the source code is in UTF-8. If you really insist, write the string so it ends with "\xc3\xa6" which will produce the correct bytes even in brain-dead compilers that think the "locale" is more important than the actual literal encoding of the source file.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 18:17, 16 April 2018 (UTC)
: You say "they are using software that was not written with portable api's"; according to your definition of "it works on more than one computer", those APIs merely have to work on Windows 10 in the French in France locale to be portable. Which they do, because otherwise the French would be up in arms. Again, judging an operating system solely by languages developed at Bell Labs for Unix seems a bit ... parochial. And antiquated.
::This is a Microsoft-written library that is explicitly advertised as supporting an international standard api. And what they have will fail even if you want to "port" between Windows set to the French and the Russian locale, in that you will be unable to open the same set of files in those two locales.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 18:17, 16 April 2018 (UTC)
: I think I've finally figured out what you're going on about; Unix C/C++ uses char to support Unicode, whereas Windows expects wchar_t if you want to handle Unicode strings (including filenames)[https://msdn.microsoft.com/en-us/library/windows/desktop/dd317748(v=vs.85).aspx][https://msdn.microsoft.com/en-us/library/windows/desktop/dd374131(v=vs.85).aspx]. It might be frustrating that they have chosen this design feature, but it's hardly relevant here.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 20:53, 13 April 2018 (UTC)
::I expect an api defined as being industry-standard to be able to open all files. I don't care how the system stores filenames internally as long as the translation from 8-bit byte strings is obvious. And there is a blindingly obvious method to convert the industry-standard api to these internal filenames. The converse problem of translating 16-bit strings to 8-bit on Unix is much worse as there is not a good consensus (which is why, as you noticed, Python and Qt botch it often). So the fact is Microsoft has the really trivial easy job to fix this and they have not done so. Or are you really going to say that because internally it uses 16-bit units, that we should NEVER use 8-bit encodings? Really???[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 18:17, 16 April 2018 (UTC)
::Here is a typical example of the thousands and thousands of patches that have been applied to "portable" source code to get it to work on Windows: https://cgit.freedesktop.org/cairo/commit/?id=84fc0ce91d1a57d20500f710abc0e17de82c67df This crap should NOT be necessary![[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 18:17, 16 April 2018 (UTC)
 
I don't see how Spitzak's arguments touch [https://www.unicode.org/versions/Unicode10.0.0/ch03.pdf#page=2 conformance as defined in the standard]. <small>[[Wikipedia:WikiLove|Love]]</small>&nbsp;—[[:commons:User:LiliCharlie|LiliCharlie]]&nbsp;<small>([[User talk:LiliCharlie|talk]])</small> 21:50, 13 April 2018 (UTC)
 
:It has nothing to do with conformance. There was a simple sentence about the FACT that you cannot open Unicode-named files using the api that Microsoft uses that most or all C and C++ libraries use. Somebody up above tried to contradict it and it went down from there.[[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 18:17, 16 April 2018 (UTC)
 
:: You wrote "This makes it impossible to write portable software using the standard functions that works with Unicode filenames, therefore Windows does not support Unicode." In fact, it is impossible to use one API to access Unicode-named files on Windows, but you can use portable software in languages like Java and C# on Windows that works with Unicode filenames just fine. A system can support Unicode without supporting C/C++ in any way, or in any sane way.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 02:04, 17 April 2018 (UTC)