Talk:Unicode/Archive 6: Difference between revisions

SporkBot (talk | contribs)
m Replace or disable a template per TFD outcome; no change in content
: If you find reliable sources for criticism or even discussions for/against Unicode, feel free to add the material. However, criticism sections are not mandatory. There is none in the [[Oxygen]] article for example. --[[User:Mlewan|Mlewan]] ([[User talk:Mlewan|talk]]) 18:11, 26 September 2013 (UTC)
:: You are obviously joking. There must be sources, as files in Unicode format take up twice as much space as ANSI ones, and you cannot use simple table lookup algorithms anymore. This information is just waiting for someone speaking English to make it public. [[Special:Contributions/178.49.18.203|178.49.18.203]] ([[User talk:178.49.18.203|talk]]) 11:38, 27 September 2013 (UTC)
:::I'm sorry, but you are mistaken. You are confusing scalar values with encodings. In Unicode, these are completely different entities. The UTF-8 byte sequence {{#invoke:Unicode convert|getUTF8|10A05}} and the UTF-16 sequence {{#invoke:Unicode convert|getUTF16|10A05}} are both encodings of the same scalar value, U+10A05. When you get down to things like Z (U+005A), the UTF-8 ends up as a single byte: {{#invoke:Unicode convert|getUTF8|005A}}, taking up exactly as much disk space as its ANSI encoding. The fact that it has a four-digit scalar value is irrelevant to how much room it takes on disk. Stateful encodings like BOCU and SCSU can bring this efficiency in data storage to every script, and multi-script documents can actually end up with smaller file sizes than in legacy encodings. [[User:Vanisaac|Van]][[User talk:Vanisaac|Isaac]]<sub><small>[[WP:WikiProject Writing systems|WS]] [[WP:WikiProject Heraldry and vexillology|Vex]]</small></sub><sup style="margin-left:-7.0ex">[[Special:Contributions/Vanisaac|contribs]]</sup> 13:34, 27 September 2013 (UTC)
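:::The distinction between a scalar value and its encodings can be checked with a minimal Python sketch. The helper name <code>describe</code> is hypothetical, and Windows-1252 (<code>cp1252</code>) is only assumed here as a stand-in for "ANSI"; neither comes from the comment above.

```python
def describe(ch: str) -> dict:
    """Return the scalar value of one character and its encoded bytes."""
    return {
        "scalar": f"U+{ord(ch):04X}",                       # the code point itself
        "utf8": ch.encode("utf-8").hex(" ").upper(),       # UTF-8 byte sequence
        "utf16": ch.encode("utf-16-be").hex(" ").upper(),  # UTF-16 (big-endian) bytes
    }

# One scalar value, two different byte sequences.
print(describe("\U00010A05"))
# {'scalar': 'U+10A05', 'utf8': 'F0 90 A8 85', 'utf16': 'D8 02 DE 05'}

# "Z" is a single byte in UTF-8, the same byte as in Windows-1252
# (assumed here to be the "ANSI" code page in question).
print(describe("Z")["utf8"])                                # 5A
print("Z".encode("utf-8") == "Z".encode("cp1252"))          # True
```

:::The four hex digits in U+005A describe the scalar value only; the size on disk depends on which encoding is chosen.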
 
:::: Stateful encodings are not generally useful. On the other hand, there is the requirement to represent, let's say, the letter А as 1040 instead of some sane value like 192, and to implement complex algorithms to make lookups over tables of 2M characters possible, along with the requirement to use complex algorithms for the needs of obscure scripts. It is clearly a démarche to undermine software development in 2nd/3rd world countries, as 1st world ones can simply work around that Unicode hassle with trivial solutions. For the first world, 1 character is always 1 byte, like it always was. [[Special:Contributions/178.49.18.203|178.49.18.203]] ([[User talk:178.49.18.203|talk]]) 11:55, 28 September 2013 (UTC)
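::::The two numbers quoted for the letter А can be reproduced with a short Python sketch, assuming the legacy code page meant is Windows-1251 (<code>cp1251</code>); the comment does not name it. The function name <code>cyrillic_a</code> is hypothetical.

```python
def cyrillic_a() -> dict:
    """Decode byte 192 as Windows-1251 and report how the result is encoded."""
    ch = bytes([192]).decode("cp1251")   # assumption: "192" refers to Windows-1251
    return {
        "scalar_decimal": ord(ch),                 # Unicode scalar value
        "cp1251_bytes": len(ch.encode("cp1251")),  # size in the legacy code page
        "utf8_bytes": len(ch.encode("utf-8")),     # size in UTF-8
        "utf8_hex": ch.encode("utf-8").hex(" ").upper(),
    }

print(cyrillic_a())
# {'scalar_decimal': 1040, 'cp1251_bytes': 1, 'utf8_bytes': 2, 'utf8_hex': 'D0 90'}
```

::::Under that assumption, 192 is the byte value in one code page and 1040 is the scalar value; the UTF-8 form of the same letter takes two bytes.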