Talk:Unicode/Archive 7: Difference between revisions


Revision as of 03:02, 20 March 2023 by Philoserf (→"Quivira (typeface)" listed at Redirects for discussion: archived using OneClickArchiver)
Latest revision as of 12:16, 9 July 2025 by Lowercase sigmabot III (m Archiving 1 discussion(s) from Talk:Unicode) (bot)
(12 intermediate revisions by 3 users not shown)
[[File:Information.svg\|30px]] An editor has identified a potential problem with the redirect [[:Quivira (typeface)]] and has thus listed it [[Wikipedia:Redirects for discussion\|for discussion]]. This discussion will occur at [[Wikipedia:Redirects for discussion/Log/2022 March 15#Quivira (typeface)]] until a consensus is reached, and readers of this page are welcome to contribute to the discussion. <!-- from Template:RFDNote --> [[User:1234qwer1234qwer4\|1234 kb of .rar files]] ([[User talk:1234qwer1234qwer4\|is this dangerous?]]) 19:02, 15 March 2022 (UTC)

== Requested move 16 September 2021 ==
<div class="boilerplate" style="background-color: #efe; margin: 0; padding: 0 10px 0 10px; border: 1px dotted #aaa;"><!-- Template:RM top -->
:''The following is a closed discussion of a [[Wikipedia:Requested moves\|requested move]]. <span style="color:red">'''Please do not modify it.'''</span> Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a [[Wikipedia:move review\|move review]] after discussing it on the closer's talk page. No further edits should be made to this discussion. ''

The result of the move request was: '''NOT MOVED''': The consensus is that the current name properly describes the contents of this article and is not ambiguous. <small>([[Wikipedia:Requested moves/Closing instructions#Non-admin closure\|non-admin closure]])</small> [[User:Spekkios\|Spekkios]] ([[User talk:Spekkios\|talk]]) 00:52, 25 September 2021 (UTC)
----
[[:Unicode]] → {{no redirect\|Unicode Standard}} – The term "Unicode" is ambiguous, and may be used to refer to the Unicode Standard, the Unicode Consortium, Unicode characters, Unicode-encoded text, or any number of things related to the implementation of the Unicode Standard or the processing of Unicode text. 
The Unicode Consortium actively discourages the use of the term "Unicode" as an isolated noun ("Always use “Unicode” as an adjective followed by an appropriate noun. Do not use “Unicode” alone as a noun" [https://www.unicode.org/policies/logo_policy.html Unicode Consortium Name and Trademark Usage Policy]), and states that "The Unicode® Standard" should be used in preference to simply "Unicode" (of course we do not use ® on Wikipedia per [[MOS:TMRULES]]). The subject of this article is specifically the Unicode Standard (the opening sentence should be "'''The Unicode Standard''' is an information technology standard for ..."), and not the general concept of "Unicode", so the article should be moved to '''[[Unicode Standard]]''', with [[Unicode]] left as a redirect to avoid having to rename thousands of wikilinks. [[User:BabelStone\|BabelStone]] ([[User talk:BabelStone\|talk]]) 16:30, 16 September 2021 (UTC) Agree. [[User:Drmccreedy\|DRMcCreedy]] ([[User talk:Drmccreedy\|talk]]) 21:03, 16 September 2021 (UTC) [[WP:NOTAVOTE]]. [[User:Calidum\|<span style="color:#01796F; font-family:serif">'''-- ''Calidum'''''</span>]] 15:41, 17 September 2021 (UTC) '''Oppose move.''' See [[WP:OFFICIALNAMES]]. We do not use a name simply because it is official, and the common name here is Unicode. '''[[User:Old Naval Rooftops\|<span style="color:#002244">O.N.R.</span>]]''' <sup>[[User talk:Old Naval Rooftops\|<span style="color:#002244">(talk)</span>]]</sup> 03:52, 17 September 2021 (UTC) This does not address the ambiguity issue. "Unicode" is commonly used to refer to the Unicode Consortium. Just one random example from a [https://www.bbc.co.uk/news/technology-57848226 BBC article]: "Rachel Murphy and Amy Wiegand sent sample artwork to Unicode as part of their plea for a drone emoji", "Rachel Murphy thinks Unicode is wrong to not include a drone emoji", "Unicode rejected their proposal", etc. 
[[User:BabelStone\|BabelStone]] ([[User talk:BabelStone\|talk]]) 13:33, 17 September 2021 (UTC) The article refers to the consortium as "the Unicode Consortium" on first reference. [[User:Calidum\|<span style="color:#01796F; font-family:serif">'''-- ''Calidum'''''</span>]] 15:40, 17 September 2021 (UTC) *As Calidum says. And even in its isolated form here, there is no misunderstanding in what is intended: "sent to the Unicode Consortium". How could this be misread? This obviousness is present throughout the article. No ambivalence. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 11:39, 18 September 2021 (UTC) Generally '''oppose''' as per O.N.R. above. As far as I have ever seen, in general usage, "Unicode" as a bare noun is only used to refer to the standard. Even at UTC meetings with actual officers and members present, use of plain "Unicode" referred only to the standard, never the consortium, Unicode encoded text, characters, or anything else. However, I fully agree the article lede should begin with "The Unicode Standard" as the official name. [[User:Vanisaac\|Van]][[User talk:Vanisaac\|Isaac]], MPLL<sup> [[Special:Contributions/Vanisaac\|cont]]</sup><sub style="margin-left:-3.5ex"><small>[[WP:WPWR\|WpWS]]</small></sub> 04:47, 17 September 2021 (UTC) The article title should match the bolded term in the lede, so if you accept that the lede should start with "The Unicode Standard" in bold then you really have to accept that the article title should also be "[The] Unicode Standard". [[User:BabelStone\|BabelStone]] ([[User talk:BabelStone\|talk]]) 13:33, 17 September 2021 (UTC) More the other way around: in general, the article title should reappear in bold in the [[WP:FIRSTSENTENCE\|first sentence]]; an alternative name can be added in bold (as is the case today [https://en.wikipedia.org/w/index.php?title=Unicode&diff=1044856772&oldid=1044753689&diffmode=source]). 
-[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 12:44, 18 September 2021 (UTC)

'''Oppose'''. We use common names, not official ones. [[User:Calidum\|<span style="color:#01796F; font-family:serif">'''-- ''Calidum'''''</span>]] 15:40, 17 September 2021 (UTC)

'''Support''' due to [[WP:PRECISE]], not due to official names. That rationale is badly flawed, yes, but the precision one is very relevant. [[User:Red Slash\|<span style="color:#FF4131;">Red</span>]] [[User talk:Red Slash\|<b><span style="color:#460121;">Slash</span></b>]] 19:08, 17 September 2021 (UTC)

'''Oppose'''. First of all, possible ambiguity is limited to "[the] Unicode Standard" and "Unicode Consortium"; other terms mentioned in the proposal (''Unicode characters, Unicode-encoded text, or any number of things ...'') do not appear to be ambiguous. What is ambiguous in "Unicode characters"? On the contrary, it is fully self-explanatory! Where one needs to specify whether the Standard or the Consortium is meant, one should do so in the text. Also, a name like "Unicode CLDR" is never shortened to "Unicode", nor is any Unicode Technical Report name [https://www.unicode.org/reports/index.html#annexes], so these do not apply.
:Second, Unicode itself uses plain "Unicode" for the Standard throughout and consistently: see [https://www.unicode.org/main.html main TOC], [https://www.unicode.org/glossary/#U Glossary]. Except for self-referring situations, this leaves no misunderstanding (when self-reference could be confusing, one writes something like "The Unicode Standard is maintained by the Unicode Consortium"). No problem here.
:On Wikipedia: As others have noted, [[WP:OFFICIALNAMES]] applies. Also, per [[WP:DISAMBIGUATION]]: we can easily establish that "Unicode Standard" is the ''primary topic'' for "Unicode". From there, we can create the article [[Unicode (disambiguation)]] (with two entries then) and add the hatnote {{tl\|about}} to this article. Also, per [[WP:COMMONNAME]], the current title is preferred and acceptable. 
-[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 12:08, 18 September 2021 (UTC) * '''Oppose''' I disagree with {{tq\|The subject of this article is specifically the Unicode Standard}} - it is about a broad concept of Unicode characters, Unicode-encoded text, Unicode input systems ... basically anything other than the organization called [[Unicode Consortium]]. I'm not opposed to a new [[History of the Unicode Standard]] article which focuses specifically on information about the development of versions of the Unicode Standard. [[User:力]] (power~enwiki, [[User talk:力\|<span style="color:#FA0;font-family:courier">π</span>]], [[Special:Contributions/力\|<span style="font-family:courier">ν</span>]]) 17:30, 21 September 2021 (UTC) '''Oppose''' per DePiep and 力 (power~enwiki). Also, if we are to use it, I believe the Wikipedia [[MOS:CAPS\|guidelines for capitalization]] would indicate that "standard" should be in lowercase (regardless of whether the consortium uses lowercase or not). Wikipedia avoids unnecessary use of uppercase. —⁠ ⁠[[User:BarrelProof\|BarrelProof]] ([[User talk:BarrelProof\|talk]]) 00:21, 24 September 2021 (UTC) <div style="padding-left: 1.6em; font-style: italic; border-top: 1px solid #a2a9b1; margin: 0.5em 0; padding-top: 0.5em">The discussion above is closed. <b style="color: #FF0000;">Please do not modify it.</b> Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.</div><!-- from [[Template:Archive bottom]] --> </div><div style="clear:both;"></div> == Vulnerabilities == A security advisory has been recently released from two researchers, one from the University of Cambridge and the other from the same and from the University of Edinburgh, in which they assert that carefully crafted computer source code can be used to introduce vulnerabilities in apparently harmless programs. 
Some security groups (like the one for Rust language) are already taking measures and issuing their own security advisories. I think that is something that affects Unicode as source code is one of the main applications of the standard. What do ye think would be a good way to introduce that to the article? <ref>https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/</ref> <ref>https://www.trojansource.codes/trojan-source.pdf</ref> <ref>https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html</ref> [[User:Bruno Unna\|Bruno Unna]] ([[User talk:Bruno Unna\|talk]]) 12:02, 1 November 2021 (UTC) ::Looks like existing {{slink\|Unicode\|Issues}} is the place to go. Indeed, a true Unicode case ({{unichar\|202E\|RIGHT-TO-LEFT OVERRIDE}}). Also consider mentioning at [[Bidirectional text]]? -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 12:29, 1 November 2021 (UTC) {{reflist-talk}} سلام [[User:M.h.gholamii\|M.h.gholamii]] ([[User talk:M.h.gholamii\|talk]]) 19:24, 14 July 2022 (UTC) ko [[User:M.h.gholamii\|M.h.gholamii]] ([[User talk:M.h.gholamii\|talk]]) 19:24, 14 July 2022 (UTC) == question == I was designing text shapes for electrical symbols and electronic elements. I design them on the Unicode-encoded FontCreator program, but after exporting it and copying and pasting the symbol I designed into the phone programs, it does not work and appears in the form of a question mark, what is the solution? (Note this topic is important for articles development, I want to design different symbols for non-electrical shapes and not only in the field of electricity and I don't want them to be thumbnails but text). 
[[User:Mohmad Abdul sahib\|<span style="font-size:0.875em;color: #339900;text-shadow:silver 0.2em 0.2em 0.1em;">'''Mohmad Abdul sahib'''</span>]] <span style="font-size:0.875em;color: Purple;text-shadow:silver 0.1em 0.2em 0.2em;">'''[[User:Mohmad Abdul sahib\|talk☎]]'''</span> [[User talk:Mohmad Abdul sahib\|talk]] 18:15, 18 April 2022 (UTC)
:Likely, FontCreator has the appropriate ''font'', containing the electrical symbols, but the receiving program does not. It looks like the font should cover the [[Unicode block]] [[Miscellaneous Technical]]. That requires downloading a suitable font, but I cannot help any further. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 19:52, 14 July 2022 (UTC)

== New Taskforce WikiProject Unicode? ==
A proposal has been opened at [[Wikipedia_talk:WikiProject_Computing#Taskforce_WikiProject_Unicode_–_proposal\|WP:COMP § Taskforce WP Unicode – proposal]]. Please take a look. [[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 09:35, 2 October 2022 (UTC)

== Version 15 & Wikidata ==
I am adding new blocks & data to Wikidata now. 
Assuming no DAB needed here, the pages are:
{{bulleted list
\|[[Arabic Extended-C]] - [[Template:Unicode chart Arabic Extended-C]] → WD:{{Q\|Q113956924}}
\|[[Devanagari Extended-A]] - [[Template:Unicode chart Devanagari Extended-A]] → WD:{{Q\|Q113956904}}
\|[[Kawi (Unicode block)]] - [[Template:Unicode chart Kawi]] → WD:{{Q\|Q113956944}}
\|[[Kaktovik Numerals (Unicode block)]] - [[Template:Unicode chart Kaktovik Numerals]] → WD:{{Q\|Q113956957}}
\|[[Cyrillic Extended-D]] - [[Template:Unicode chart Cyrillic Extended-D]] → WD:{{Q\|Q113956962}}
\|[[Nag Mundari]] - [[Template:Unicode chart Nag Mundari]] → WD:{{Q\|Q113956955}}
\|[[CJK Unified Ideographs Extension H]] - [[Template:Unicode chart CJK Unified Ideographs Extension H]] → WD:{{Q\|Q113956966}}
}}
[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 16:10, 13 September 2022 (UTC)
:QID added -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 16:33, 13 September 2022 (UTC)
:more listing -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 18:02, 13 September 2022 (UTC)
::Not much time for me to complete this list. [[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 18:12, 13 September 2022 (UTC)

Note that, as far as I can see, only two content articles require the "(Unicode block)" DAB-specifier, because of name overlap. The other "X (Unicode block)" pages should be redirects to their (unambiguously named) content block article. See also {{tl\|Unicode blocks/overview}}. DePiep.

{{Recent changes in Unicode}}
By now, most 15.0 changes seem to be processed & updated. See Recent Changes for the current edit history. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 11:31, 21 September 2022 (UTC)
:As a list of version-15.0-changes needed or done, this list is incomplete. [[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 05:36, 24 October 2022 (UTC)

== Code Points ==
The lead claims that there are currently 149 186 characters in the Standard. That's confusing! 
Is that actual characters or does it include unprintable code points? I know what a code point is, my point is that the lead shouldn't confuse code points with characters. (I also argue that a "control character" isn't 'really' a character, not a grapheme, but that's a fight for somewhere else.) Writing about Unicode without an early clear explanation of what a code point is, is -I think- awful pedagogy. In fact, I don't think code point - a fundamental aspect of Unicode - is even defined in the article!!!! Wow, just wow. I also would like someone to verify that Unicode has characters for color. I believe that's wrong/false/misleading. I am aware that certain emoji can be modified by a code point to change some of its color. As far as I know, this is only true with a very small set of code points, and a very very small set of colors (I don't actually know if the colors are well-defined, I'd expect so, but...). These aren't colors, but are color modifiers for those other code points. [[Special:Contributions/174.130.71.156\|174.130.71.156]] ([[User talk:174.130.71.156\|talk]]) 16:00, 13 December 2022 (UTC) :There are no color defining codes in Unicode but there are names of characters that specify a color if displayed on a color device. Searching the word color in the article shows some possibly confusing text about color but nothing outright wrong. :This article leaves a lot to be desired, if you wish to make changes, you should. It's a wiki after all. [[User:SchmuckyTheCat\|SchmuckyTheCat]] ([[User talk:SchmuckyTheCat\|talk]]) 05:43, 15 December 2022 (UTC) There are two Variation Selectors (U+FE0E and U+FE0F) which specify whether an Emoji should be ideally displayed in color or black and white, but other than that, there are no color specifications in Unicode. 
The term "character" and "code point" are specified in the Unicode Standard, and if you feel that the coverage here is inadequate in conveying the meaning of those terms, I absolutely encourage you to contribute content to better reflect their technical specification. For the record, any code point defined beyond "Not A Character" or "Reserved" is a "character". This means control characters and whitespace are all considered characters in Unicode, just like a letter in an alphabet, a Kanji with On and Kun readings, or a mathematical symbol. [[User:Vanisaac\|Van]][[User talk:Vanisaac\|Isaac]], GHTV<sup> [[Special:Contributions/Vanisaac\|cont]]</sup><sub style="margin-left:-3.5ex"><small>[[WP:WPWR\|WpWS]]</small></sub> 06:18, 15 December 2022 (UTC) == Lead is simply wrong. == The offending sentence is:"The Unicode standard defines three and several other encodings exist, all in practice [[Variable-width encoding\|variable-length encodings]]." (Sure, you could strain to interpret that to mean "all but UTF-32", but let's keep it clear. It clearly implies all encodings are variable length. Wikipedia's own article on UTF-32 says it is fixed length. (Because it only needs to use 21 of the 32 bits for Unicode code points, it is very inefficient (and rarely used, afaik). But rarely used is not the same as "doesn't exist", and "all are variable" clearly implies it doesn't exist. I'd have to look again, are there really 3 variable Unicode encodings? I can only think of UTF-8 and UTF-16. (and some others that afaik are not "defined" in the Unicode standard (like GB18030), or that are obsolete (like UTF-7).) 
Replace "all" with "all common encodings" or something similar, and mention UTF-32. [[Special:Contributions/174.130.71.156\|174.130.71.156]] ([[User talk:174.130.71.156\|talk]]) 11:43, 15 December 2022 (UTC)
:I think the intended meaning of this was that even if ''code points'' are fixed-size, modern Unicode is effectively variable-width, as what the user thinks is a "character" sometimes needs multiple code points. [[User:Spitzak\|Spitzak]] ([[User talk:Spitzak\|talk]]) 16:40, 15 December 2022 (UTC)
::Yes, Unicode includes both [[combining character]]s and [[precomposed character]]s, e.g., <{{U+\|0061}} "a" latin small letter a> <{{U+\|0308}} "¨" combining diaeresis> is equivalent to <{{U+\|00E4}} "ä" latin small letter a with diaeresis>. Further, some glyphs exist at multiple code points for historical reasons. There is a discussion of canonical forms in the Unicode standard. --[[User:Chatul\|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul\|talk]]) 21:57, 15 December 2022 (UTC)
::It seems odd to me to describe code points as "fixed size". They're just abstract numbers. It's when you ''encode'' (or store) the code points that you get variable lengths, at least for UTF-8, UTF-EBCDIC, and UTF-16 as described in the article. I think combining characters are a red herring for this discussion. [[User:Drmccreedy\|DRMcCreedy]] ([[User talk:Drmccreedy\|talk]]) 23:10, 15 December 2022 (UTC)
:::The Unicode standard does restrict the number of code points, so describing them as fixed-length 21-bit or 32-bit data is reasonable. [[user:Spitzak\|Spitzak]] is referring to characters, which indeed are variable length, a separate issue from the length of an encoded code point that does deserve mention. --[[User:Chatul\|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul\|talk]]) 17:14, 16 December 2022 (UTC)

== Inline mentioning ==
I object to the [https://en.wikipedia.org/w/index.php?title=Unicode&diff=prev&oldid=1151049361 reversal] by {{U\|Peter M. 
Brown}}, citing [[WP:ITALICTITLE]] inappropriately. I'd say that the name, a noun, should not be in italics. ITALICTITLE refers to the name of a ''work'', i.e. the work itself (play, periodical, book). However, the Unicode standard is a ''standard'', not a book etc., not even its publication. The Standard is an abstraction: the set of rules. It is a proper noun, full stop. The key point is that the article title names the subject: the standard, not the book. [[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 17:04, 21 April 2023 (UTC)
:{{ping\|Peter M. Brown}} -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 10:43, 23 April 2023 (UTC)

== Why no section about missing graphemes? ==
I don't know if it would be manageable, but Unicode clearly does not have all commonly used symbols. A simple example is the very commonly used 'slash marks' used to count. Most people reading this will be familiar with the sequence /, //, ///, ////, and <s>////</s>, with the crossing stroke (strike-through) diagonal (top left to bottom right) rather than horizontal. (This is typical in the USA; I understand the European convention is slightly different.) I request that the editors consider adding a list of missing (but documented) symbols. [[Special:Contributions/40.142.183.146\|40.142.183.146]] ([[User talk:40.142.183.146\|talk]]) 11:49, 9 June 2023 (UTC)
:Unicode's non-inclusion of tally marks is covered in {{slink\|Tally marks\|Unicode}}. I don't think it's a good idea to include it in this article as well. That would open the door to listing every proposal that has not yet been accepted. [[User:Indefatigable\|Indefatigable]] ([[User talk:Indefatigable\|talk]]) 15:42, 9 June 2023 (UTC)
:I also oppose this idea. The set of unencoded symbols is open-ended and may exceed the number of encoded symbols. There would also be no way to determine ''which'' unencoded symbols merit mention. 
[[User:Drmccreedy\|DRMcCreedy]] ([[User talk:Drmccreedy\|talk]]) 16:01, 9 June 2023 (UTC)

== Proposed new writing systems to be encoded into Unicode 16 ==
Unicode 16 is set to be released in September 2024. I think the following (con)scripts definitely need to be encoded:
* Chữ Việt Trí, an alphabet invented by Tôn Thất Chương in 2012 for the Vietnamese language. It's still nicer than the Latin-based Quoc Ngu and needs wide recognition, as Shavian and Hangul received.
* Add support for Quikscript.
* Add the missing runes from Baconsthorpe and Sedgeford, and the Armanen runes.
* Possibly add something more.
[[Special:Contributions/94.180.80.9\|94.180.80.9]] ([[User talk:94.180.80.9\|talk]]) 07:31, 9 July 2023 (UTC)
:Take a look at Unicode's FAQ for [http://www.unicode.org/faq/char_proposal.html Submitting Successful Character and Script Proposals]. Wikipedia isn't affiliated with the Unicode Consortium, so requests here won't be seen or acted upon by the people who can actually add characters/scripts to the Unicode Standard. [[User:Drmccreedy\|DRMcCreedy]] ([[User talk:Drmccreedy\|talk]]) 14:39, 9 July 2023 (UTC)
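
== Illustration: encoding forms and combining sequences ==
The "Lead is simply wrong" thread above turns on two separate points: the stored size of a code point depends on the encoding form (variable in UTF-8 and UTF-16, fixed in UTF-32), and a user-perceived character may need more than one code point. The Python sketch below is only an illustration of those two points, not text from the article or the Standard. The helper name <code>encoded_lengths</code> and the choice of sample characters are assumptions made for this example; it uses only the standard library's <code>str.encode</code> and <code>unicodedata.normalize</code>.

```python
import unicodedata


def encoded_lengths(ch: str) -> dict:
    """Return the number of bytes needed to store one code point
    in each of the three Unicode encoding forms (hypothetical helper)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-le" variants are used so that no byte order mark is counted.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


# Sample code points chosen for illustration: U+0061, U+00E4, U+20AC, U+1F600.
samples = ["a", "\u00e4", "\u20ac", "\U0001F600"]
for ch in samples:
    # UTF-8 and UTF-16 sizes vary with the code point; UTF-32 is always 4 bytes.
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))

# The combining-character example from the thread: U+0061 followed by U+0308
# is canonically equivalent to the precomposed U+00E4.
decomposed = "a\u0308"    # two code points
precomposed = "\u00e4"    # one code point
print(len(decomposed), len(precomposed))
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

Under these assumptions, the UTF-32 column stays at 4 bytes for every sample while the other two columns change, and the two spellings of "ä" compare equal only after normalization.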