Talk:Unicode/Archive 6: Difference between revisions

Roeschter
No edit summary
Fix Linter errors.
 
(13 intermediate revisions by 3 users not shown)
Line 42:
:Yeah, somewhere, just not on Wikipedia. See [[WP:ELNO]]. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 10:06, 11 March 2010 (UTC)
==Formatting References==
I've taken to formatting some of the bare URLs here, using the templates from [[WP:CT]]. <i><b>[[User:Omirocksthisworld|<span style="color:#32B430;">Omirocksthisworld</span>]]</b></i> ([[User talk:Omirocksthisworld|<span style="color:#1A74E2;">Drop a line</span>]]) 21:25, 16 March 2010 (UTC)
 
== Unicode block names capitalization (Rename and Move) ==
Line 153:
: If you find reliable sources for criticism or even discussions for/against Unicode, feel free to add the material. However, criticism sections are not mandatory. There is none in the [[Oxygen]] article for example. --[[User:Mlewan|Mlewan]] ([[User talk:Mlewan|talk]]) 18:11, 26 September 2013 (UTC)
:: You are obviously joking. There must be sources, as files in Unicode format take up twice as much space as ANSI ones, and you cannot use simple table lookup algorithms anymore. This information is just waiting for someone who speaks English to make it public. [[Special:Contributions/178.49.18.203|178.49.18.203]] ([[User talk:178.49.18.203|talk]]) 11:38, 27 September 2013 (UTC)
:::I'm sorry, but you are mistaken. You are confusing scalar values with encodings. In Unicode, these are completely different entities. The UTF-8 byte sequence {{#invoke:Unicode convert|getUTF8|10A05}} and the UTF-16 sequence {{#invoke:Unicode convert|getUTF16|10A05}} are both encodings of the same scalar value, U+10A05. For a character like Z (U+005A), UTF-8 uses a single byte, {{#invoke:Unicode convert|getUTF8|005A}}, taking up exactly as much disk space as its ANSI encoding. The fact that it has a four-digit scalar value is irrelevant to how much room it takes on disk. Stateful encodings like BOCU and SCSU can bring this efficiency in data storage to every script, and multi-script documents can actually end up with smaller file sizes than in legacy encodings. [[User:Vanisaac|Van]][[User talk:Vanisaac|Isaac]]<sub><small>[[WP:WikiProject Writing systems|WS]] [[WP:WikiProject Heraldry and vexillology|Vex]]</small></sub><sup style="margin-left:-7.0ex">[[Special:Contributions/Vanisaac|contribs]]</sup> 13:34, 27 September 2013 (UTC)
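The scalar-value versus encoding distinction above can be checked with a minimal Python sketch. It is illustrative only and assumes Python 3.8 or later, because `bytes.hex` with a separator argument needs that version. It prints the code point of each character next to its UTF-8 and big-endian UTF-16 bytes. `describe` is a hypothetical helper name.

```python
def describe(ch: str) -> dict:
    """Hypothetical helper: show one scalar value and two of its encodings."""
    return {
        "scalar": f"U+{ord(ch):04X}",                        # the code point itself
        "utf8": ch.encode("utf-8").hex(" ").upper(),          # UTF-8 bytes
        "utf16be": ch.encode("utf-16-be").hex(" ").upper(),   # UTF-16 bytes, no BOM
    }

for ch in ["Z", "\U00010A05"]:
    print(describe(ch))
```

For Z this gives a single UTF-8 byte `5A`. For U+10A05 both encodings take four bytes: `F0 90 A8 85` in UTF-8 and the surrogate pair `D8 02 DE 05` in UTF-16.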
 
:::: Stateful encodings are not generally useful. On the other hand, there is the requirement to represent, let's say, the letter А as 1040 instead of some sane value like 192, and to implement complex algorithms to make lookups in tables sized for 2M characters possible, along with the requirement to use complex algorithms for the needs of obscure scripts. It is clearly a démarche to undermine software development in 2nd/3rd world countries, as 1st world ones can simply sidestep the Unicode hassle with trivial solutions. For the first world, 1 character is always 1 byte, as it always was. [[Special:Contributions/178.49.18.203|178.49.18.203]] ([[User talk:178.49.18.203|talk]]) 11:55, 28 September 2013 (UTC)
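The two numbers in this comment can be checked with a short Python sketch. 1040 is the scalar value of Cyrillic А (U+0410). 192 (0xC0) is its byte in the legacy Windows-1251 code page, which Python names `cp1251`; in UTF-8 the same letter takes two bytes. The sketch also shows that a lookup keyed by code point can use a sparse mapping that stores only the entries it needs, so it does not need a table spanning the whole code space. The `transliteration` table is a hypothetical example, not taken from any library.

```python
ch = "\u0410"  # CYRILLIC CAPITAL LETTER A

code_point = ord(ch)                 # Unicode scalar value
cp1251_bytes = ch.encode("cp1251")   # legacy Windows-1251: one byte
utf8_bytes = ch.encode("utf-8")      # UTF-8: a two-byte sequence

print(code_point, cp1251_bytes[0], utf8_bytes.hex(" "))

# Hypothetical sparse lookup table keyed by code point: only the characters
# actually used are stored, regardless of how large the code space is.
transliteration = {0x0410: "A", 0x0411: "B", 0x0412: "V"}
print(transliteration[code_point])
```

This prints `1040 192 d0 90` and then `A`.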
Line 310:
এরা আপনার হৃদয়কে সারাজীবন আলোড়ীত করবে। তাই এদের সঙ্গ কখরো ত্যাগ করবেন না। খারাপ বন্ধু তা যতোই কাছের হোক না কেন, ত্যাগ করুন। নাহলে খারাপ চিন্তা আপনাকে আক্রান্ত করবে। মনে রাখবেন, ভাল চিন্তার চেয়ে খারাপ চিন্তাই মানুষকে বেশি আকর্ষন করে। (English translation: ''They will stir your heart all your life, so never give up their company. Give up a bad friend, no matter how close; otherwise bad thoughts will take hold of you. Remember, bad thoughts attract people more than good thoughts do.'') <small><span class="autosigned">—Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Monitobd|Monitobd]] ([[User talk:Monitobd|talk]] • [[Special:Contributions/Monitobd|contribs]]) 12:34, 13 January 2010 (UTC)</span></small><!-- Template:Unsigned --> <!--Autosigned by SineBot-->
 
:This isn't [[Devanagari]] ([[Hindi]]). I used script recognition software to find out what language this is, and apparently it's "[[Bishnupriya Manipuri]]". Can anyone read it? I searched everywhere, and there's not a single online translator. Should I just ignore it? [[User:Indigochild777|'''<span style="font-family:Vivaldi; font-size:large;"><span style="color:#000000;">Ind</span><span style="color:#770000;">igo</span><span style="color:#BB0000;">child</span></span>''']] 01:42, 12 April 2010 (UTC)
It is Bengali (Bangla, India). <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/112.133.214.254|112.133.214.254]] ([[User talk:112.133.214.254|talk]]) 06:02, 2 January 2013 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
 
Line 347:
As of September 2016, however, most browsers (namely, [[Microsoft Edge]], [[Internet Explorer]], [[Google Chrome]] and [[Mozilla Firefox]]) cannot properly display the following Unicode writing systems by default, because no suitable font is available out of the box:
 
{{largediv|
*[[Balinese alphabet]] (ᬅᬓ᭄ᬱᬭᬩᬮᬶ)
*[[Batak alphabet]] (ᯘᯮᯮᯒᯖ᯲ ᯅᯖᯂ᯲, also used for the Karo, Simalungun, Pakpak and Angkola-Mandailing languages)
Line 390:
As of September 2016, however, most browsers (namely, [[Microsoft Edge]], [[Internet Explorer]], [[Google Chrome]] and [[Mozilla Firefox]]) cannot properly display the following Unicode writing systems by default, because no suitable font is available out of the box:
 
{{largediv|
*[[Balinese alphabet]] (ᬅᬓ᭄ᬱᬭᬩᬮᬶ)
*[[Batak alphabet]] (ᯘᯮᯮᯒᯖ᯲ ᯅᯖᯂ᯲, also used for the Karo, Simalungun, Pakpak and Angkola-Mandailing languages)
Line 477:
 
:: You wrote "This makes it impossible to write portable software using the standard functions that works with Unicode filenames, therefore Windows does not support Unicode." In fact, it is impossible to use one API to access Unicode-named files on Windows, but you can use portable software in languages like Java and C# on Windows that works with Unicode filenames just fine. A system can support Unicode without supporting C/C++ in any way, or in any sane way.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 02:04, 17 April 2018 (UTC)
 
== "code-point" vs. "character" ==
 
How is the term "character" defined in Unicode and how does it differ from "codepoint"? That information seems to be missing from the article. --[[Special:Contributions/62.224.160.232|62.224.160.232]] ([[User talk:62.224.160.232|talk]]) 14:17, 17 August 2016 (UTC)
:Unicode Standard sections [http://www.unicode.org/versions/Unicode9.0.0/ch02.pdf#G25564 2.4 Code Points and Characters] and [http://www.unicode.org/versions/Unicode9.0.0/ch03.pdf#G2212 3.4 Characters and Encoding] define the terms code point and abstract character. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 18:20, 17 August 2016 (UTC)
::The [[code point]] article also covers this information. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 18:26, 17 August 2016 (UTC)
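Here is a short illustration of the difference, in Python, where `len` on a `str` counts code points. A user-perceived character such as "é" can be one code point, U+00E9. It can also be two: U+0065 followed by U+0301 COMBINING ACUTE ACCENT. Canonical normalization with the standard `unicodedata.normalize` function converts between the two forms.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

print(len(precomposed), len(decomposed))   # code points in each string: 1 2
print(precomposed == decomposed)           # compared code point by code point: False
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print([unicodedata.name(c) for c in decomposed])
```

Both strings display as the same character, but they contain different numbers of code points until they are normalized.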
 
== Use template:code? ==
Should we use the template like <nowiki>{{code|U+012F}}</nowiki> for {{code|U+012F}} to express Unicode text? To me it looks sound. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 01:25, 6 May 2010 (UTC)
:Yes, I think that is a good idea. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 14:05, 17 July 2010 (UTC)
::{{done}} Somewhat differently. See {{tl|unichar}} -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 22:02, 19 November 2010 (UTC) <br>
:::Violation of [[MOS:HEX]]. [[Special:Contributions/108.71.123.44|108.71.123.44]] ([[User talk:108.71.123.44|talk]]) 18:32, 8 October 2016 (UTC)
 
== Unicode 10.0 ==
 
This version was released today; can someone add information about it to the article? Source: Emojipedia [[Special:Contributions/86.22.8.235|86.22.8.235]] ([[User talk:86.22.8.235|talk]]) 12:03, 20 June 2017 (UTC)
:I haven't seen anything on the Unicode site (http://www.unicode.org/) but will keep an eye out for an official announcement that 10.0 has been released. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 18:01, 20 June 2017 (UTC)
::Version 10.0 now shows up as the latest version at http://www.unicode.org/standard/standard.html [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 18:44, 20 June 2017 (UTC)
:::And the [http://unicode.org/Public/UNIDATA/ data files] have been updated, so I think we can start updating Wikipedia now. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 19:23, 20 June 2017 (UTC)
 
== "Presentation forms" ==
 
Can someone explain to me what a "presentation form" is? I can't find an answer anywhere. [[User:Pariah24|Pariah24]] ([[User talk:Pariah24|talk]]) 11:19, 10 September 2017 (UTC)
:Never mind; I found [http://unicode.org/faq/ligature_digraph.html this] [[User:Pariah24|Pariah24]] ([[User talk:Pariah24|talk]]) 11:23, 10 September 2017 (UTC)
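This short Python sketch shows what presentation forms look like in practice. They are compatibility characters that encode a particular glyph shape of ordinary characters. Examples are the ligature U+FB01 and an Arabic positional form of BEH (U+FE91). Compatibility normalization (NFKC) maps them back to the plain characters they present. The sample characters are chosen only for illustration.

```python
import unicodedata

# Presentation forms: a ligature and an Arabic positional (initial) form.
samples = ["\ufb01", "\ufe91"]

for ch in samples:
    folded = unicodedata.normalize("NFKC", ch)   # compatibility decomposition + composition
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
          + " ".join(f"U+{ord(c):04X}" for c in folded))
```

The first line printed is `U+FB01 LATIN SMALL LIGATURE FI -> U+0066 U+0069`, the letters "f" and "i". Canonical normalization (NFC) leaves the ligature unchanged, because its decomposition is a compatibility one.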
 
== Is there a unicode symbol for "still mode"? ==
 
I mean this symbol: https://www.iso.org/obp/ui#iec:grs:60417:5554 [[User:Seelentau|Seelentau]] ([[User talk:Seelentau|talk]]) 18:16, 12 January 2018 (UTC)
:It seems not. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 19:03, 12 January 2018 (UTC)
 
== Censorship of recent thread on talk page ==
 
Contrary to [https://en.wikipedia.org/w/index.php?title=Talk:Unicode&diff=844563671&oldid=844563249 this edit's] edit summary, the discussion ''did'' have criticisms of, and suggestions for changes to, the content of this article, and discussed more implementations than Windows. Further, the article already discusses implementation-specific issues. At first I thought it had simply been deleted by {{u|Roeschter}}, but at least he/she/they placed it in the archives. Is this censorship of the talk page justified (perhaps on the unstated grounds that it violated NOTFORUM)? Judging by its edit summary, I don't believe so. [[User:DIYeditor|—DIYeditor]] ([[User talk:DIYeditor|talk]]) 18:52, 5 June 2018 (UTC)
 
== The zero-width space is a space ==
 
[[Special:Contributions/75.90.36.201|75.90.36.201]] says that the "[[Zero width space|zero-width space]]" is not a space. What is a space, though? It is a character that contains no points with an RGB color other than FFFFFF (or whatever the background color is). The zero-width space contains no such points and is therefore a space. (75.90.36.201 does admit that it is a character.) Of course, if there were a term for characters containing no points with a color other than FE3EE7, that term would also apply to the zero-width space. [[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 21:45, 6 June 2018 (UTC)
 
: <code>U+200B</code> ZERO WIDTH SPACE has the [[Unicode character property]] <code>WSpace=no</code> (not a [[whitespace character]]). <small>[[Wikipedia:WikiLove|Love]]</small>&nbsp;—[[User:LiliCharlie|LiliCharlie]]&nbsp;<small>([[User talk:LiliCharlie|talk]])</small> 22:17, 6 June 2018 (UTC)
::The cited <u>[[Unicode character property]]</u> article supports ''my'' point, including {{code|U+200B}} among the "whitespace characters without Unicode character property 'WSpace=Y'". [[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 18:36, 7 June 2018 (UTC)
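Whether U+200B "is a space" depends on which property you consult. Here is a minimal Python sketch. `unicodedata.category` gives the general category, and `unicodedata.bidirectional` gives the bidi class. Python's `str.isspace()` is defined from those two properties (general category Zs, or bidi class WS, B or S), not from the `White_Space` property itself, so it is only an approximation of `WSpace`. NO-BREAK SPACE and SPACE are included for comparison.

```python
import unicodedata

zwsp = "\u200b"   # ZERO WIDTH SPACE
nbsp = "\u00a0"   # NO-BREAK SPACE, for comparison

for ch in (zwsp, nbsp, " "):
    print(f"U+{ord(ch):04X}",
          unicodedata.name(ch),
          unicodedata.category(ch),       # general category (Cf = format character)
          unicodedata.bidirectional(ch),  # bidirectional class
          ch.isspace())                   # Python's whitespace test
```

U+200B reports general category `Cf` and `isspace()` returns `False`, while both comparison characters are `Zs` and count as whitespace.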
 
== Suggestion for changing the lede ==
 
I have a couple of problems with the last paragraph (as of Mar 3, 2016) of the lede (lead). First, it continues to talk about UCS-2. UCS-2 IS OBSOLETE and it says so. So, why is it used as an example?
It is poor pedagogy to explain an obsolete system and then compare an active system to it. Currently, the paragraph reads:
"Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16 and the now-obsolete UCS-2. UTF-8 uses one byte for any ASCII character, all of which have the same code values in both UTF-8 and ASCII encoding, and up to four bytes for other characters. UCS-2 uses a 16-bit code unit (two 8-bit bytes) for each character but cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2, using one 16-bit unit for the characters that were representable in UCS-2 and two 16-bit units (4 × 8 bit) to handle each of the additional characters."
The text "Unicode can be implemented" is a hyperlink to the article "Comparison of Unicode encodings".
The hyperlink should be removed and a reference used, probably "[see Comparison of Unicode encodings]". This first sentence is terrible. It is not true that Unicode can be implemented by different encodings, in the sense that an encoding is NOT an implementation. Also: I don't think Unicode 8 is fully implemented by ANY program, anywhere. Unicode's codepoints ARE (not "can be") commonly encoded using UTF-8 and UTF-16. I suggest the following: "Unicode's codepoints are commonly encoded using UTF-8 and UTF-16. Other encodings, such as the now obsolete UCS-2 or the Anglo-centric ASCII, may also be encountered (ASCII defines 128 characters, 95 of them printable; UCS-2 allows up to 65,536 code points). Both UTF-8 and UTF-16 use a variable number of bytes for the codepoint they represent: UTF-8 uses between 1 and 4 bytes and UTF-16 uses either 2 or 4 bytes. Since 2007, when it surpassed ASCII, UTF-8 has been the dominant encoding of the World Wide Web, with an estimated 86% of all web pages using it as of January 2016." [[User:Abitslow|Abitslow]] ([[User talk:Abitslow|talk]]) 22:47, 3 March 2016 (UTC)
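The byte counts in the proposed wording can be checked with a small Python sketch. UTF-8 uses 1 to 4 bytes per code point and UTF-16 uses 2 or 4. A code point above U+FFFF needs a surrogate pair in UTF-16, so it cannot be represented at all in UCS-2's single 16-bit unit. Python has no UCS-2 codec, so `fits_ucs2` is a hypothetical helper written only for this illustration.

```python
def encoded_sizes(ch: str):
    """Return (UTF-8 bytes, UTF-16 bytes) for a single code point."""
    # The -be codec variant avoids the 2-byte BOM that plain "utf-16" prepends.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

def fits_ucs2(ch: str) -> bool:
    """Hypothetical helper: UCS-2 has one 16-bit unit, so only U+0000..U+FFFF."""
    return ord(ch) <= 0xFFFF

for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch), fits_ucs2(ch))
```

The four samples need (1, 2), (2, 2), (3, 2) and (4, 4) bytes respectively. Only the last, an emoji outside the Basic Multilingual Plane, falls outside UCS-2.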
 
I have never seen any Unicode other than UTF-8 (servers) and UTF-32 (JavaScript, and Python "unicode" objects). Shouldn't those two be listed as the two most popular forms? Basically you use UTF-8 unless you want to index individual characters; then you use UTF-32 in those special cases. Isn't that pretty much the whole story right now? And then UTF-16 is of historical interest for Windows NT.
 
: Java is firmly 16-bit for characters, and every version of Windows since XP has been Windows NT, even if they don't call it that. C# and .NET use UTF-16, as well. What's most frequent is hard to tell, and depends on what you're measuring.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 08:15, 28 September 2017 (UTC)
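The choice of code unit matters when indexing individual characters. This minimal Python sketch counts the same string three ways: UTF-8 bytes, UTF-16 code units, and code points (Python's `len` on a `str`). The UTF-16 count is what UTF-16-based string APIs such as Java's `String.length()` report. `counts` is a hypothetical helper name.

```python
def counts(s: str) -> dict:
    """Hypothetical helper: length of s in three different units."""
    return {
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # 2 bytes per UTF-16 code unit
        "code_points": len(s),                           # Python str indexes by code point
    }

text = "a\u00e9\U0001F600"   # 'a', 'é', and an emoji outside the BMP
print(counts(text))
```

For this string the counts are 7 bytes, 4 UTF-16 code units and 3 code points. The emoji alone accounts for a surrogate pair of two code units.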
 
:16-bit code units are used plenty on Windows: the entire system API uses them, filenames in its newer filesystems use them, and many text files are written this way (though that is rapidly becoming rarer). Note there is a lot of confusion about whether Windows supports UTF-16 or UCS-2. Some software is "unaware" of UTF-16, but this does not mean it won't "work" with it. This is exactly the same reason that code designed for ASCII "works" with UTF-8. If all the unknown sequences are copied unchanged from input to output, then it "works" by any practical definition. Unfortunately, a lot of people think that unless the program contains code to actively parse multi-code-unit characters, or even goes so far as to apply some special meaning to a subset of those characters, it is somehow "broken" for that encoding and "does not support it". But that is a totally useless definition, as it has nothing to do with whether the program will actually fail. Therefore I think it is fine to clearly say "Windows uses UTF-16". [[User:Spitzak|Spitzak]] ([[User talk:Spitzak|talk]]) 19:30, 28 September 2017 (UTC)
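The point that ASCII-oriented code "works" with UTF-8 rests on one property of UTF-8: every byte of a multi-byte sequence is 0x80 or above. A byte in the ASCII range therefore always means that ASCII character. Here is a minimal Python sketch, in which a byte-level path split stands in for hypothetical ASCII-only code that never decodes its input.

```python
def split_path_ascii(raw: bytes) -> list:
    """Hypothetical ASCII-era routine: splits on b'/' without decoding."""
    return raw.split(b"/")

path = "caf\u00e9/\u65e5\u672c/notes.txt".encode("utf-8")
parts = split_path_ascii(path)

# Every lead and continuation byte of a multi-byte UTF-8 sequence is >= 0x80,
# so b'/' (0x2F) can never occur inside one; each piece is still valid UTF-8.
print([p.decode("utf-8") for p in parts])
```

The pieces decode cleanly to `café`, `日本` and `notes.txt`, even though the splitting code knows nothing about multi-byte characters.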
 
::It strikes me that UCS-2 is not an encoding for the entire Unicode code space, but only for a subset. (Likewise for ASCII.) As encodings of subsets, both of them are special in that they match their Unicode subset not only in order, but in the numerical value of the character code. While the subset of Unicode covered by UCS-2 matches that of Unicode 1.1 in magnitude, the incompatible change in Hangul encoding in Unicode 2.0 means that UCS-2, if understood as matching the post-2.0 layout up to U+FFFF, is not a complete encoding of any version of Unicode. It seems to me that this distinction should be the basis for a reformulation that prioritizes encodings that cover all of Unicode. [[User:Ablaut490|Ablaut490]] ([[User talk:Ablaut490|talk]]) 00:25, 24 December 2018 (UTC)