Talk:Unicode/Archive 7: Difference between revisions

Revision as of 00:10, 24 February 2021 by Lowercase sigmabot III (m, Archiving 1 discussion(s) from Talk:Unicode) (bot); latest revision as of 12:16, 9 July 2025 by Lowercase sigmabot III (m, Archiving 1 discussion(s) from Talk:Unicode) (bot)
(25 intermediate revisions by 4 users not shown)
[[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 19:06, 24 February 2019 (UTC)

== What does [[MOS:ALLCAPS]] require in [[Unicode#Architecture_and_terminology|Unicode § Architecture and terminology]]? ==
In full, the bullet point in [[MOS:ALLCAPS]] relevant to Unicode reads:
:The names of [[Unicode]] code points are conventionally given in small caps (tip: enter the name in all caps into the template {{tlx|sc2}}). Example: {{xt|the character <code>⁓</code> (U+2053, {{sc2|SWUNG DASH}})}}. This is only done when presenting tables of Unicode data, and when discussing code point names as such. Otherwise prefer unstyled, plain-English character names (whether they coincide with code point names or not): {{xt|the hyphen and the en dash}}, not {{!xt|the {{sc2|HYPHEN-MINUS}} and the {{sc2|EN DASH}}}}.
The Unicode article currently contains the text:
:For code points in the [[Basic Multilingual Plane]] (BMP), four digits are used (e.g., U+0058 for the character LATIN CAPITAL LETTER X)
This is contrary to MOS, as the discussion contains the all-caps text LATIN CAPITAL LETTER but does <u>not</u> present a table and is <u>not</u> about code point names as such but rather about a standard way of designating code points, one that involves hexadecimal digits. I replaced this with text that does conform to MOS. [[User:LiliCharlie|LiliCharlie]] has reverted it, restoring the nonconforming code. Though the associated edit summary correctly quotes MOS:ALLCAPS as saying "The names of Unicode code points are conventionally given in small caps", the convention in question is provided by the Unicode Standard and the MOS spells out a different convention to be followed in Wikipedia articles. With some clearly-specified exceptions, we are forbidden to use code-point names in the manner prescribed by the Unicode Standard. Rather, we are instructed to use plain-English character names ''whether they coincide with code point names or not''.
Editors are welcome to improve on the phrase I used, "the character 'X' in English and related languages", perhaps referring to the Latin ancestry of the character, but such emendations should still conform to the MOS. [[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 21:47, 24 February 2019 (UTC)
: That's clearly wrong, or at best confusing, since the hyphen-minus and the hyphen are two totally different things. In any case, we should not bring in plain English character names because we're not talking about plain English characters. If necessary, I'm fine with removing the names altogether; they're not needed for the example.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 01:34, 25 February 2019 (UTC)
::Well, we can use plain English ''referring expressions'', can't we, even if we don't call them "names"? And the reader would surely appreciate seeing glyphs to get some idea what we're talking about; these could be put in parentheses. How about the following?
:::For code points in the [[Basic Multilingual Plane]] (BMP), four digits are used, e.g. U+00F7 for the division sign (÷); for code points outside the BMP, five or six digits are used as required, e.g. U+13254 for the [[Egyptian hieroglyph]] designating a winding wall ( [[File:Hiero O4.png|text-bottom|15px]] ). [[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 19:49, 26 February 2019 (UTC)

== Version 12.1: new Japanese era name (2019-05-01) ==
Version 12.1 adds {{unichar|32FF|SQUARE ERA NAME REIWA|nlink=Reiwa|html=}} "to enable software to be rapidly updated to support the new Japanese era name in calendrical systems and date formatting. The new Japanese era name was officially announced on April 1, 2019, and is effective as of May 1, 2019."
[https://www.unicode.org/versions/Unicode12.1.0/] -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 22:08, 9 July 2019 (UTC)

== New: Unicode alias names and abbreviations ==
FYI: I have created article '''[[Unicode alias names and abbreviations]]'''. IMO it is very complete, both wrt formal aliases (the 5 reasons), and the informal ones (used by Unicode e.g. in charts, but not formalised/listed). Personally: I have been working on this a long time to get it right (in age not time spent ;-) ). Mainly to get a complete list of abbr's-in-unicode. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 00:57, 19 October 2019 (UTC)

== What is a Unicode font? ==
''Google search has already picked this up so lest skim readers be misled I have stricken out proposals that I have withdrawn.'' --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 11:24, 15 October 2019 (UTC)

{{ping|Peter M. Brown}}, {{ping|LiberatorG}}, {{ping|Drmccreedy}} I wrote <s>{{quote|text=The term 'Unicode font' is used more specifically to categorise those fonts that have implementations for every character in the repertoire (or at least a large (65,535) subset of it).}}</s> I am very open to suggestions about how better to say this. First of all, I take it that my ''Unicode is not in principle concerned with fonts ''per se'', seeing them as implementation choices. Any given character may have many [[allograph]]s, from the more common bold, italic and base letterforms to complex decorative styles. '' is not disputed. A [[font]] (according to that article) "was a particular size, weight and style of a [[typeface]] in hot-metal typesetting" and in modern terminology can be taken as synonymous with a typeface. So, it seems to me, a "Unicode font" is literally a contradiction in terms: Unicode is a database of numeric values associated with letterforms no matter how drawn, it is not a font or a typeface.
The font expresses in vectors how an artist (typographer) has chosen to draw a letterform associated with that number. But the industry has adopted the term "Unicode font" to mean a font that has at least most of the characters in the basic plane. Is there a better way to express that succinctly than my proposal? --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 23:21, 11 October 2019 (UTC)
:No, the "industry" (whatever that is) has not adopted the term "Unicode font" to mean a font that has at least most of the characters in the basic plane. A Unicode font is simply a computer font that has a [[Cmap (font)|cmap]] table that maps glyph IDs to Unicode code points. The term "pan-Unicode" is sometimes used to refer to a Unicode font that covers a high proportion of Unicode characters, but given that TTF/OTF fonts have a maximum limit of 65,535 glyphs, and Unicode 12.1 defines 137,766 graphic characters, it is nigh on impossible for a font to cover all Unicode characters. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 23:32, 11 October 2019 (UTC)
::I had hoped that your reference to Cmap would point the way to an improved definition, but unfortunately it just takes us around in a circular argument. ''"defines the mapping of [[Character encoding|character codes]] to the [[glyph#Typography|glyph]] index values used in the font."''<ref>{{Cite web|url=https://docs.microsoft.com/en-us/typography/opentype/spec/cmap|title=cmap – Character To Glyph Index Mapping Table – Typography|website=docs.microsoft.com}}</ref> I accept that my definition is uncited (though I have great difficulty accepting a font with fewer than 2000 glyphs as a "Unicode font" - just because it calls itself a Unicode font surely doesn't make it one).
Is there an RS definition anywhere?--[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 23:41, 11 October 2019 (UTC)
{{reflist|talk}}
:::It is not something as basic as a font that supports multi-byte representation (rather than single byte), is it? (8-bit bytes)--[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 00:04, 12 October 2019 (UTC)
::: A Unicode font could mean a lot of things; why wouldn't a font that supports glyphs indexed by Unicode point be called a Unicode font? Certainly if we were talking about a script supported by fonts that put their glyphs over the ASCII range, a font that used instead Unicode code points could productively be called a Unicode font. I certainly wouldn't talk about number of glyphs; a GB 2312 font with over 7,000 characters is less of a Unicode font than a Latin-Greek-Cyrillic font that richly supports all features only Unicode offers.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 10:38, 12 October 2019 (UTC)
::::I think you've fallen into the same trap as I did, that it is not a "proper" Unicode font unless it has a substantial repertoire. That is a qualitative judgement rather than a functionality judgement. ''[I have accepted that my original text was incorrect, btw. But something like it needs to be said, the question now is what?]'' According to the (uncited!) statement that opens the [[Unicode fonts]] article, {{quote|A '''Unicode font''' is a [[computer font]] that maps [[glyph]]s to [[Unicode character]]s (i.e. the glyphs in the font can be accessed using [[code point]]s defined in the [[Unicode Standard]]).}}
::::So if that is literally correct, a font that has no more than the most basic ASCII character-set would qualify as a Unicode font if "the glyphs in the font can be accessed using [[code point]]s defined in the [[Unicode Standard]]". Is this true? Why? Why not?
--[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 10:51, 12 October 2019 (UTC)
::::: Before Unicode we used fonts with a maximum of <256 glyphs. When we mixed scripts we had one font for Latin, another one for Cyrillic, yet another one for Greek, one for Hebrew, etc. (The encodings used are now known as "[[Character encoding#Character sets, character maps and code pages|legacy encodings]].") So different characters were mapped to the same code point, and formatting was essential for a text to make sense. (However my word processor sometimes "lost formatting," which resulted in gibberish, or ''[[mojibake]]''.) Unicode was revolutionary as formatting was no longer needed for texts to make sense (see article [[Plain text]]), and we actually called any font containing a table to map glyphs to Unicode code points a Unicode font. This even applied to Latin-only fonts, because it was not clear which script a font handled, and even if a font was known to be a "Latin font," the range above (<128-character) ASCII was still legacy-encoded, so non-basic Latin letters like the {{angbr|é}} in {{angbr|[[résumé]]}} were mapped to different code points by different software developers. <small>[[Wikipedia:WikiLove|Love]]</small> —[[:commons:User:LiliCharlie|LiliCharlie]] <small>([[User talk:LiliCharlie|talk]])</small> 14:34, 12 October 2019 (UTC)
::::::Yes, I knew that, see [[ISO Latin-1]], [[ISO Latin 2]] etc. I'm afraid this all reminds me of the [[HD Ready]] scam. So let me be provocative and propose a new draft second sentence. For convenience, I'll open a new subsection. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 15:05, 12 October 2019 (UTC)
::::: Not necessarily substantial, but something that's uniquely Unicode.
::::: You're imagining that Unicode font means something clear and unambiguous. It doesn't.
Certainly, a font for a script might be called a Unicode font if there were a previous tradition of non-Unicode-compatible fonts, even if it only covered a 7-bit charset. But Unicode font would generally only be a marketing term.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 05:53, 13 October 2019 (UTC)
::::::''But Unicode font would generally only be a marketing term.'' Precisely and that is what I believe that the article should say explicitly. ''[[Caveat emptor]]'' presupposes an informed emptor. As well as the article telling readers what Unicode is, it should also tell them what it is not. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 13:19, 13 October 2019 (UTC)
::::::: I don't see any need to change the current section at all. Saying what something is not is rarely very productive; Unicode is not a fox, it's not a box, it is not rain, it is not a train, etc.--[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 13:28, 13 October 2019 (UTC)

===Suggested draft second sentence===
{{quote|text=<s>For a font to be described legitimately as a "Unicode font", it is only required that the glyphs in the font can be accessed using [[code point]]s defined in the [[Unicode Standard]]. There is no minimum number of characters that must be included in the font; some fonts have quite a small repertoire.</s>}} Better? --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 15:05, 12 October 2019 (UTC)
:That's my understanding but I'd like to see a citation to back it up. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 15:40, 12 October 2019 (UTC)
:::Yes, absolutely, I agree, indeed I would have to say it is a prerequisite before it can be added but I've not yet found one. I've based it closely on the opening sentence of [[Unicode font]], but it is not cited there either.
:::: I don't think we should tell users what ''"a font to be described legitimately as a 'Unicode font'"'' is.
The term "Unicode font" has been used in various senses, and it is not up to Wikipedia users to instruct other users which usage is the legitimate one, classifying any other usage as illegitimate. We should report what experts agree on, not define new standards of legitimacy. <small>[[Wikipedia:WikiLove|Love]]</small> —[[:commons:User:LiliCharlie|LiliCharlie]] <small>([[User talk:LiliCharlie|talk]])</small> 22:39, 12 October 2019 (UTC)
:::::That's what I was thinking but couldn't articulate. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 23:32, 12 October 2019 (UTC)
:::::::Yes, I accept that. I guess "validly" has the same issue. What I am trying to put encyclopedically is that any font, no matter how small its repertoire, qualifies as a Unicode font if it meets the technical specification. But the need for citation increases if anything. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 13:12, 13 October 2019 (UTC)
::Version by version, an increasing number of characters are assigned code points in the Unicode Standard. On the proposed definition, a non-Unicode font, even an obsolete one supporting nothing in the BMP, can become a Unicode font without any reworking of the font. Is that an acceptable consequence? [[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 16:30, 12 October 2019 (UTC)
:::I share your sentiment but I can't see any reasonable basis to exclude them unless we find a citation that says yea or nay. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 22:05, 12 October 2019 (UTC)
* Maybe we should move [[Unicode font]] to [[Unicode compliant font]]. Solves about everything.
-[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 12:30, 19 October 2019 (UTC)

===Possible citations===
;Bigelow and Holmes
Is this acceptable as a suitable citation?: ''To call an incomplete font containing Unicode subsets a ‘Unicode’ font could be misleading, since some users could mistakenly assume that any font called ‘Unicode’ will contain a full set of 28,000 characters.''<ref>{{cite journal |url = http://cajun.cs.nott.ac.uk/wiley/journals/epobetan/pdf/volume6/issue3/bigelow.pdf | title = The design of a Unicode font | journal = ELECTRONIC PUBLISHING | volume = VOL. 6(3), 289–305 | date = September 1993 | page = 298 |last1 = Bigelow | first1=Charles | last2 = Holmes | first2 = Kris}}</ref> The problem is that the paper is about designing [[Lucida Sans Unicode]] and describes the authors' attempt to create a font that has a consistent style irrespective of alphabet. Other articles I have read are quite disparaging about that idea, saying that there are major cultural differences between writing systems. These writers promote the idea that it is better to have a font that is optimised for the language in which a document is written, that it doesn't matter if it is "incomplete". I conclude also that while a 'pan-Unicode' [deprecated term!] font might have been a credible objective with 28000 characters in 1993, surely it is no longer so? So this citation might fail NPOV even if it passes [[WP:VNT]].
; Unicode Consortium FAQ
Interestingly, the Consortium prefers the phrase "Unicode conformant font".
''A Unicode-conformant font can be defined as a font which contains a mapping from Unicode characters and that maps characters to glyphs in a way that is consistent with character semantics defined in the Unicode Standard.''<ref>{{cite web | url= https://www.unicode.org/faq/font_keyboard.html | title = Fonts and keyboards | publisher = Unicode Consortium | date = 28 June 2017 | accessdate= 13 October 2019}}</ref> Note that the FAQ says nothing about comprehensiveness. (I propose to append this citation to the opening sentence of [[Unicode fonts]]). Further comments (and citations!) welcome. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 16:34, 13 October 2019 (UTC)
{{reflist talk}}

===Second proposal for second sentence===
Another attempt: {{quote|text=A font is "Unicode compliant" if the glyphs in the font can be accessed using [[code point]]s defined in the [[Unicode Standard]].<ref>{{cite web | url= https://www.unicode.org/faq/font_keyboard.html | title = Fonts and keyboards | publisher = Unicode Consortium | date = 28 June 2017 | accessdate= 13 October 2019}}</ref> The standard does not specify a minimum number of characters that must be included in the font; some fonts have quite a small repertoire.}}
{{reflist talk}}
Note that the FAQ uses the term "compliant", which has the intent of my earlier "legitimate" without the overtones. Better? --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 11:24, 15 October 2019 (UTC)

'''Support'''. Especially since this greatly evades any "Unicode font" suggestion. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 23:21, 18 October 2019 (UTC)

'''Comment'''. The wikilinks are somewhat strange. We don't need a link to ''[[Unicode Standard]]'', as this is the article that deals with that topic. Our first link to ''[[code point]]'' currently seems to be in the ''Architecture and terminology'' section, though the term is used 12 times above that section.
: The smallest Unicode compliant font of my private collection is ''[[Noto fonts|Noto]] Sans [[Tagbanwa script|Tagbanwa]]'' v. 2.000 which has 29 glyphs (including 5 control and zero-width characters, 2 spaces, 2 punctuation marks, and the dotted circle <code>U+25CC</code>) that can be accessed via 28 Unicode characters. That appears to be all that is needed to write the language of the homonymous ethnicity. Is it really that remarkable that some scripts require fewer glyphs and characters than a Unicode compliant font for English? In other words: Do we really need the second sentence? <small>[[Wikipedia:WikiLove|Love]]</small> —[[:commons:User:LiliCharlie|LiliCharlie]] <small>([[User talk:LiliCharlie|talk]])</small> 01:58, 19 October 2019 (UTC)
:::True, I did those [links] for clarity in this talk page, they would come out before going live. Good observation that it had been used earlier but not wlinked, I hadn't noticed that. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 20:48, 19 October 2019 (UTC)
::I can agree to removing the second sentence here. There is no requirement in this, so no need to suggest it. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 12:28, 19 October 2019 (UTC)
:::If you look back at the earlier discussion, you will see where some editors fell into the trap (led, I admit, by me but I see from the journal article I cited that it is a common error) of believing that a font cannot be a Unicode font unless it has many thousands of glyphs. I really do believe that we should say this but if the consensus is that it breaks the ''X is not a fox or a box'' rule, then I will have to concede. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 20:48, 19 October 2019 (UTC)
* '''Oppose'''. A font can map glyphs to Unicode code points, but map the wrong glyphs (e.g. map a "B" glyph to U+0041), in which case the font is not compliant or conformant to the Unicode Standard.
[[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 13:20, 19 October 2019 (UTC)
:: I'm sorry, I don't really understand what you are saying here, could you elaborate please? I believe that I have paraphrased the FAQ at Unicode.org: your challenge seems (to me!) to be saying that you could have a font that is compliant but not legitimate ''(sic!)''. Surely we don't have to get bogged down in the possibility that someone might design, let alone expect to sell, a font that complies with the letter of the standard but contravenes its spirit? Our purpose is to explain, not to write a criminal code. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 20:48, 19 October 2019 (UTC)

==="Unicode compliant font"===
Enough. Per OP "what is a <s>Unicode font</s>" etcetera: that does not exist. OTOH, a ''Unicode compliant font'' is well defined. So that is what enwiki should say. The article (with page content) is [[Unicode compliant font]]. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 21:00, 19 October 2019 (UTC)
:Per [[WP:Common name]], "Unicode font" is the term most widely seen. {{as of|20 October 2019}}, Wikipedia doesn't even have an article called "Unicode compliant font". ''This'' article should refer readers to [[Unicode font]] (or alias) for more detailed information, but it needs at least two sentences to give a reason why they should do that. Which is what all of the above is about. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 13:39, 20 October 2019 (UTC)

== Definitions ==
{{Discussion top|reason="The discussion is over. You have anything to say, please add new section"}}
: ''This section primarily discusses {{diff2|926340136|this edit}} made by {{user|Peter M. Brown}}''
: ''The versions of the section being discussed (for comparison)'': {{anchor|da_definitions_versions}}
:* ''{{diff2|925696815#Architecture_and_terminology|version0}} — prior to the following edit''
:* ''{{diff2|926286702#Architecture_and_terminology|version1}} — by {{re|Alexander_Davronov|label1=<span><span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span></span>}}''
:* ''{{diff2|926340136#Architecture_and_terminology|version2}} — by {{re|Peter M. Brown}}''
: ''Please use {{t|re}} to notify participants''
: ''Please use <code><nowiki>{{re|Alexander_Davronov|label1=<span><span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span></span>}}</nowiki></code> to notify {{re|Alexander_Davronov|label1=<span><span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span></span>}}''
{{re|Peter M. Brown}} I'm going to revert the changes you've made recently and just wanted to know your objections.
<br>{{re|Peter M. Brown}} {{tq|i=1| ... to conform more closely to the Unicode Standard }} [[#da_definitions_versions|Previous version (version1)]] repeated terms almost word by word. Now it doesn't.
<br>{{re|Peter M. Brown}} {{tq|i=1| ... Deleted the inclusion of a long quote in a reference}} It is much more convenient to have it directly inside the article. No reason to delete it.
<br>{{re|Peter M. Brown}} {{tq|i=1| ... Only one reference is needed per paragraph.}} Wiki doesn't impose restrictions on number of sources. If they are reliable removing them is generally bad idea.
<br>{{re|Peter M. Brown}} {{tq|i=1| ... "hex" is nonstandard }} There are some recommendations on it: [[MOS:RADIX]]. I'm going to replace it with 16 or just add <code>0x</code> before the number.
<span>[[User:Alexander_Davronov|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov|✉]] [[Special:Contributions/Alexander_Davronov|⚑]]</span> 21:42, 16 November 2019 (UTC)
:{{re|Alexander_Davronov|label1=<span style="font: 900 0.8em "Lato""><span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span></span>}} I particularly agree with your third point. If they are non-redundant and reliable sources, having more than one source per claim is a good thing. [[User:BernardoSulzbach|BernardoSulzbach]] ([[User talk:BernardoSulzbach|talk]]) 16:18, 17 November 2019 (UTC)
::Could someone explain how this {{tlg|re}} template is supposed to work? <span>[[User:Alexander_Davronov|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]]</span> provided a list of objections to an edit of mine in a post yesterday, and I just now happened on it, being mildly curious about the new section in [[Talk:Unicode]]. I’m more than willing to respond and will try to do so by 00:00, 19 November (UTC), but there was no notice in my "alerts", on my user talk page, or in my emails. Am I expected to look at everything that appears on my Watchlist? [[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 22:30, 17 November 2019 (UTC)
:::{{ping|Peter M. Brown}} It might have to do with your notification settings in [[Special:Preferences]]. [[User:BernardoSulzbach|BernardoSulzbach]] ([[User talk:BernardoSulzbach|talk]]) 01:23, 18 November 2019 (UTC)
::::'''Many thanks!''' I didn't know about that page. I've now checked "Notify me when someone links to my user page" and "email". That should work. [[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 03:21, 18 November 2019 (UTC)
::Taking up the objections in turn:
::''The previous version repeated terms almost word by word. Now it doesn't.''
:::Almost, yes, but the differences are important.
According to the previous version,
::::Unicode defines a ''codespace'' – set of numbers/integers used to encode characters in the range of 0 to 10FFFF<sub>hex</sub>.
:::The reference is to Unicode's "[https://unicode.org/glossary/#C Glossary]", according to which a codespace is "A range of numerical values available for encoding characters." The text in my revised version is word-for-word identical. The "Glossary" does go on to note that <u>for the Unicode Standard</u> the range is 0 to 10FFFF<sub>16</sub>. The previous version of the Wikipedia subsection omits this qualification, incorrectly implying that in <u>general</u> a codespace, regardless of the encoding standard, has this as a range. In the next sentence of my revised version, I do specify the range for Unicode.
::''It is much more convenient to have it [the long quote incorporated in a footnote] directly inside the article. No reason to delete it.''
:::The quote in question was not directly inside the article but rather in a footnote. According to [[WP:FOOTNOTES]], footnotes have two purposes: documenting an article's sources and providing tangential information. For the first purpose, the citation is sufficient without the quotation. For the second, the information in the quote is mostly not "tangential" as it repeats information in the main text.
:::The only information in the footnote that is not in the main text is a definition of an "encoded character". If this phrase needs to be defined, which I doubt, putting the definition in the [[Unicode#Architecture_and_terminology|Architecture and terminology]] section is too late, as the phrase has already been used three times, including a use in the lead section. I agree with the author of the lead that the average reader does not need a definition. Anyhow, the article often refers to code points, text, and scripts, rather than characters, as being encoded.
::''Wiki doesn't impose restrictions on number of sources.
If they are reliable, removing them is a bad idea.''
:::Agreed. The paragraph has two sources, both from the Unicode Standard, the "Glossary" and the section "[http://www.unicode.org/versions/Unicode12.0.0/ch02.pdf#G25564 Code Points and Characters]". This remains true.
::Response to "''hex'' is nonstandard."
:::This is uncontroversial. I have replaced "10FFFF<sub>hex</sub>" with "[[hexadecimal]] 10FFFF", which will be clear to many readers. Other readers can follow the link to [[hexadecimal]]. According to [[MOS:RADIX]], the use of subscripts for numerals not in base 10 is limited to articles that are not computer oriented. If an editor uses prefixes such as {{code|0x}} then, per the same [[MOS]] subsection, the editor must "Explain these prefixes in the article's introduction or on first use." In any case, the previous version of the Wikipedia article requires modification.
::[[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 19:42, 18 November 2019 (UTC)
:::: {{re|Peter M. Brown}} Sorry for a belated response. I've added [[#da_definitions_versions|three versions]] of the section being discussed so we may compare them easily.
:::: {{tq|i=1|[...] incorrectly implying that in general a codespace, regardless of the encoding standard [...]}} It explicitly refers to the standard and glossary of Unicode so no, it doesn't justify changes. My [[#da_definitions_versions|version(1)]] of the definition of '''codebase''' was shorter. Do you agree to replace words ''«set of of numbers/integers»'' by ''«range of numbers»'' in my version and leave it in place? As well as definition of ''code points''?
:::: {{tq|i=1|[...] citation is sufficient without the quotation. [...] as it repeats information in the main text. [...]}} This part is elaborated more precisely by [[WP:CS#Additional_annotation|WP:CS]] and [[WP:CLOP]], not [[WP:FOOTNOTES]].
The quotation seems to me advantageous since it covers all three definitions and may be placed at three different places simultaneously or at the end and, as I said, it's quick to access. If you insist that the quotation is excessive I would concede.
:::: {{tq|i=1|[...] putting the definition ... is too late, as the phrase has already been used three times [...]}} {{anchor|defintions_a_d_23_11_19-3}} This is unreasonable & subjective. It's never late. It should be given for the sake of clarity. I insist to return it back as reliable source is given.
:::: {{tq|i=1|[...] This remains true. [...]}} Let's return it back by the end of conversation.
:::: {{tq|i=1|[...] previous version of the article requires modification }} I suggest to put it this way: ''«0<sub>[[Hexadecimal|16]]</sub> to 10FFFF<sub>[[Hexadecimal|16]]</sub>»''. Any thoughts? <span style="" >[[User:Alexander_Davronov|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov|✉]] [[Special:Contributions/Alexander_Davronov|⚑]]</span> 21:59, 23 November 2019 (UTC)
:::::{{re|Alexander_Davronov|label1=<span><span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span></span>}}
:::::I assume that you're referring to a code'''space''', though you wrote "code'''base'''".
:::::I have no preference whatever between "set of of numbers/integers" and "range of numbers". What is at issue is substantive fidelity to the Unicode [https://unicode.org/glossary/#C Glossary], which has two entries:
:::::::(a) ''A range of numerical values available for encoding characters''
:::::::(b) ''For the Unicode Standard, a range of integers from 0 to 10FFFF<sub>16</sub>''
::::::Your version 1 has
:::::::(c) ''set of of numbers/integers used to encode characters in the range of 0<sub>hex</sub> to 10FFFF<sub>hex</sub>''
::::::Why does the Glossary have two definitions?
Because (a) is a ''general'' definition covering Unicode, [[ASCII]], [[EBCDIC]], {{nowrap\|[[GB 18030]]}}, etc. Each of these has a different range of numbers and hence a different codespace. In ASCII, for example, it's 0 to 127. And in Unicode? The answer is provided in (b): for ''this particular encoding'', the range is 0 to 10FFFF<sub>16</sub>, i.e., in decimal, 0 to 1114111. (c) provides a general statement, at odds with (a): with no specification of the encoding, the range is said to be 0 to 10FFFF<sub>16</sub>. Of course (c) is shorter than (a)+(b), because it does less: (a)+(b) provides ''both'' a general definition of a codespace and a specification of the codespace for Unicode.
:::::''[[WP:FOOTNOTES]] vs. [[WP:CS]]'': You're right, [[WP:CS]] is an official guideline and [[WP:FOOTNOTES]], which flatly contradicts it, is not. I get impatient when I follow a footnote and find the text repeated; "I've already <u>read</u> that," I think. But rules are rules, even when I don't like them.
:::::''...the phrase has already been used ... in the lead''. I think this is a legitimate complaint. Unless following a section link, everybody reads the lead first, so, if a locution is obscure enough to require a definition at all, it shouldn't be undefined in the lead. Put it back in [[Unicode#Architecture_and_terminology\|Architecture and terminology]] if you must, but also define this putatively obscure locution in the lead.
:::::'' Let's return it back by the end of conversation.'' Hunh? Return what back where by when?
:::::''0<sub>16</sub> to 10FFFF<sub>16</sub>'' The text in the Glossary has "0 to 10FFFF<sub>16</sub>", as subscripting 0 is not called for; zero is zero. My text "0 to [[hexadecimal]] 10FFFF" provides a link for a person unfamiliar with "hexadecimal" and with the subscript notation, but perhaps no such person would be reading this article. Go with the subscript if you feel strongly about it.
:::::[[User:Peter M.
Brown\|Peter Brown]] ([[User talk:Peter M. Brown\|talk]]) 01:20, 24 November 2019 (UTC)
:::::: {{re\|Peter M. Brown}}
:::::: {{tq\|i=1\|[...] I assume that you're referring to a code'''space''' [...]}} Yes, I'm referring to a '''codespace''', of course. It was a mistake.
:::::: {{tq\|i=1\|[...] Why does the Glossary have two definitions? Because (a) is a general definition covering [...]}} I think I got your lengthy explanation. I just discovered that a more precise definition of ''codespace'' exists<ref name="Unicode_Standard_12.0" /> so I suggest using the following version (I will remove quotations): {{quote \|title= {{anchor\|definitions_draft_1}} Draft 1 \|text= Unicode defines a ''unicode codespace''<ref group="note">In the article it is referred to simply as ''codespace''.</ref> – a range of integers from 0 to 10FFFF<sub>[[Hexadecimal\|16]]</sub>.<ref name="Glossary" /><ref name=":0" /><ref name="Unicode_Standard_12.0" /> Any value in the codespace is called a [[code point]]. Not all code points are assigned to encoded characters.<ref name="Glossary" /> }}
:::::: {{tq\|i=1\|[...] Hunh? Return what back where by when? [...]}} I'm going to restore the citation you {{diff2\|926340136\|have removed}} once we come to a consensus over the definitions' shape.
:::::: {{tq\|i=1\|[...] Go with the subscript if you feel strongly about it. [...]}} We also may utilize [[WP:REFGROUP\|<code><nowiki><ref group="...">...</nowiki></code>]] but I think subscripts with ''linked type of'' numbers are the best choice, so let's go with it. <span style="" >[[User:Alexander_Davronov\|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov\|✉]] [[Special:Contributions/Alexander_Davronov\|⚑]]</span> 12:49, 24 November 2019 (UTC)
:::{{re\|Alexander_Davronov\|label1=<span><span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span></span>}}
:::I think that we are agreed.
To summarize: the Unicode Standard<ref name="Unicode_Standard_12.0" />, as you have noted, characterizes the <u>Unicode</u> codespace as
::::A range of integers from 0 to 10FFFF<sub>16</sub>.
:::The sentence I objected to read
::::Unicode defines a codespace – set of numbers/integers used to encode characters in the range of 0 to 10FFFF<sub>hex</sub>.
:::These are not the same. The first is ''explicitly'' a characterization of a <u>Unicode</u> codespace. The second is a general characterization of a codespace and it errs because other encodings, [[ASCII]] for example, have different codespaces. Your proposal to replace the wording with "Unicode defines a ''unicode codespace''..." will correct matters. Unicode nowhere defines a codespace in such a way as to exclude other ranges for other encodings, but that's just what the sentence I reverted claims that Unicode does.
:::Since you have not responded to my point that a definition of "encoded character" should not appear in the section [[Unicode#Architecture_and_terminology\|Architecture and terminology]] unless it appears in the lead, I take it that you agree and will alter the lead so that it either does not use the locution "encoded Unicode character" or else uses the locution along with a definition.
:::[[User:Peter M. Brown\|Peter Brown]] ([[User talk:Peter M. Brown\|talk]]) 21:30, 24 November 2019 (UTC)
::::I just ran across [[MOS:NOTES]]. Don't {{brackets\|MOS:}} sections have priority over {{brackets\|WP:}} ones, hence [[MOS:NOTES]] supersedes [[WP:CS]]? Since the long quote I deleted falls into none of the four categories allowed under [[MOS:NOTES]], it seems that my deletion was in order. [[User:Peter M. Brown\|Peter Brown]] ([[User talk:Peter M. Brown\|talk]]) 16:19, 27 November 2019 (UTC)
:::: {{re\|Peter M. Brown}}
:::: So do you have any objections regarding my [[#definitions_draft_1\|draft proposed here]]? If so, let me know. We need agreement to proceed.
:::: {{tq\|i=1\|[...] 
Since you have not responded to my point that [...]}} I've answered it [[#defintions_a_d_23_11_19-3\|here]]. :::: {{tq\|i=1\|[...] I take it that you agree [...]}} Do not take anything as agreement until I explicitly express it. Addition of definition of ''encoded character'' wouldn't decrease article's quality I'm sure. :::: {{tq\|i=1\|[...] Don't {{brackets\|MOS:}} sections have priority over [...]}} It depends on whether it's a policy or guideline. Both (NOTES & CS) are guidelines and I consider them equal. [[MOS:NOTES]] doesn't override [[WP:CS]] cause they govern different parts of the article: appearance and structure of footnotes ([[MOS:NOTES]]) and its content ([[WP:CS]]) respectively. <span style="" >[[User:Alexander_Davronov\|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov\|✉]] [[Special:Contributions/Alexander_Davronov\|⚑]]</span> 21:06, 27 November 2019 (UTC) :::{{Wrong venue\|[[#Definitions 2]]\|2=<span style="" >[[User:Alexander_Davronov\|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov\|✉]] [[Special:Contributions/Alexander_Davronov\|⚑]]</span> 16:27, 7 December 2019 (UTC)}} === Definitions 2 === {{Moved discussion from\|[[#Definitions\|#Definitions]]\|2=<span style="" >[[User:Alexander_Davronov\|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov\|✉]] [[Special:Contributions/Alexander_Davronov\|⚑]]</span> 16:27, 7 December 2019 (UTC)}} : ''The changes being discussed:'' : ''{{diff2\| 925696815#Architecture_and_terminology\|Revision 0}} — Prior revision :* ''{{diff2\| 929206134#Architecture_and_terminology\|Revision 1}} — by {{user\|Alexander Davronov}}}'' :* ''{{diff2\| 929307700#Architecture_and_terminology\|Revision 2}} — by {{user\|Peter M. Brown}}'' {{re\|Peter M. Brown}} I suggest to return back to old definition of codespace with additional sourcing. 
After this discussion I started to think that it's more concise and accurate. Any objections? <span style="" >[[User:Alexander_Davronov\|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov\|✉]] [[Special:Contributions/Alexander_Davronov\|⚑]]</span> 22:51, 5 December 2019 (UTC) {{quote \|title={{anchor\|definitions_draft_2}}New proposal based on {{diff2\|925696815#Architecture_and_terminology\|version0}} \|text=Unicode defines a codespace of 1,114,112 [[code point]]s in the range 0 to 10FFFF<sub>[[Hexadecimal\|16]]</sub>.<ref name="Glossary" /><ref name=":0" /><ref name="Unicode_Standard_12.0" /> }} :As it stands, the first sentence of the section contains, ''literally'', [https://unicode.org/glossary/#C the definition of "codespace" in the Glossary]. One cannot be any truer to the sources than that. Now that the reader has got that far, it is necessary to be more specific as to what Unicode's codespace is. The second sentence does this, paraphrasing the second sentence in the glossary; I am uncomfortable with actually ''using'' that second sentence, which starts "For the Unicode Standard ...", since the phrase "Unicode Standard" appears as a proper name (with a capital 'S', no less), and the reader has not been introduced to this usage. That done, proceeding step by step, the third sentence explains the phrase "code point". :Do I understand you as proposing to ''start out'' with a sentence using "code point" without defining it? If so, I disagree. It is linked but, per [[MOS:LINKSTYLE]], "as far as possible do not force a reader to use [a] link to understand the sentence." {{diff2\|925696815#Architecture_and_terminology\|Version0}} is even worse, introducing both "code point" and "codespace" without definitions. Admittedly, this is also done in the lead; I maintain that this also needs correction, but one thing at a time. :[[User:Peter M. Brown\|Peter Brown]] ([[User talk:Peter M. 
Brown\|talk]]) 00:50, 6 December 2019 (UTC)
:: {{re\|Peter M. Brown}} {{anchor\|definitions_22_41_6_December_2019}} OK, let's leave the definition of codespace unchanged.
:: I was going to ask you to amend your sentence added {{oldid2\|929390586\|by this edit}}: ''«Not all of these 1,114,112 code points are available for encoding visible characters»''; the number of code points isn't mentioned before, so the word ''these'' is unexpected here. <span style="" >[[User:Alexander_Davronov\|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov\|✉]] [[Special:Contributions/Alexander_Davronov\|⚑]]</span> 22:41, 6 December 2019 (UTC)
:::Suppose we change the second through fourth sentences to
::::For Unicode, the relevant codespace consists of 1,114,112 numbers, all the integers from 0 to 10FFFF<sub>[[Hexadecimal\|16]]</sub>. Each of these is called a ''[[code point]]''. Not all of them are available for encoding visible characters; some, for example, are assigned to control codes like the [[carriage return]].
:::OK? You handle the <ref>s, please—I'm likely to mess up again.
:::[[User:Peter M. Brown\|Peter Brown]] ([[User talk:Peter M. Brown\|talk]]) 23:48, 6 December 2019 (UTC)
:::: {{re\|Peter M. Brown}} It's much better for the ''second'' through ''fourth'' parts.
:::: {{tq\|i=1\|The second sentence does this, paraphrasing the second sentence in the glossary [...]}} Well, I have to revoke my previous [[#definitions_22_41_6_December_2019\|agreement over here]]: the current definition of codespace is clunky once again. It's unnecessary to cite a general definition ("characterization") as it's obvious what codespace means regardless of the type of encoding. I've opened an [[#RfC:_Which_version_you_like_the_most?\|RfC]] to see whose point prevails over the definitions of both code space and code points.
<span style="" >[[User:Alexander_Davronov\|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov\|✉]] [[Special:Contributions/Alexander_Davronov\|⚑]]</span> 16:27, 7 December 2019 (UTC) {{reflist-talk\|group=note}} {{reflist-talk\|refs= <ref name="Unicode_Standard_12.0">{{cite web \|title=The Unicode Standard, Version 12.0 \|url=http://www.unicode.org/versions/Unicode12.1.0/ch03.pdf#G2212\|page=19\|quote=Unicode codespace: A range of integers from 0 to 10FFFF16.<br>• This particular range is defined for the codespace in the Unicode Standard. <br>Other character encoding standards may use other codespaces.}}</ref> <ref name="Glossary">{{cite web\|title = Glossary of Unicode Terms\|url=https://unicode.org/glossary/\|accessdate=2010-03-16}}</ref> <ref name=":0">{{Cite book\|url=http://www.unicode.org/versions/Unicode12.1.0/ch02.pdf#G13708\|title=The Unicode® Standard Version 12.0 – Core Specification\|last=\|first=\|publisher=\|year=2019\|isbn=\|___location=\|pages=29\|chapter=2.4 Code Points and Characters\|quote=The range of integers used to code the abstract characters is called the codespace. A particular integer in this set is called a code point. When an abstract character is mapped or assigned to a particular code point in the codespace, it is then referred to as an encoded character.}}</ref> }} {{Reflist}} === RfC: Which version you like the most? === <div class="boilerplate archived" style="background-color: #EDEAFF; padding: 0px 10px 0px 10px; border: 1px solid #8779DD;">{{Quote box \| title = \| title_bg = #C3C3C3 \| title_fnt = #000 \| quote = There is a clear consensus for {{diff2\|925696815#Architecture_and_terminology\|Version 0}}.<p>[[User:Cunard\|Cunard]] ([[User talk:Cunard\|talk]]) 10:29, 26 January 2020 (UTC) \| width = 30%\|halign=left}} :''The following discussion is closed. 
<span style="color:red">'''Please do not modify it.'''</span> Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.''<!-- from Template:Archive top-->
----
This RfC primarily concerns terms and definitions of the [[Unicode]] standard. Which version of the definitions of ''«codespace»'' and ''«code point»'' do you like the most?
:* ''{{diff2\| 925696815#Architecture_and_terminology\|Version 0}}''
:* ''{{diff2\| 929206134#Architecture_and_terminology\|Version 1}}''
:* ''{{diff2\| 929451893#Architecture_and_terminology\|Version 2}}''
Please note that counting starts from zero and revisions are listed in chronological order. The discussion may be found here: [[#Definitions 2]]. Each of the three versions is going to have at least 3 sources. <span style="" >[[User:Alexander_Davronov\|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov\|✉]] [[Special:Contributions/Alexander_Davronov\|⚑]]</span> 16:27, 7 December 2019 (UTC)
: Version 0 please. [[User:Spitzak\|Spitzak]] ([[User talk:Spitzak\|talk]]) 16:42, 7 December 2019 (UTC)
: Version 0. Concise, easy to understand, and not loaded with redundant references. [[User:BabelStone\|BabelStone]] ([[User talk:BabelStone\|talk]]) 23:06, 7 December 2019 (UTC)
:: {{re\|BabelStone}} Why do you think they are redundant? <span style="" >[[User:Alexander_Davronov\|<span style='color:#a8a8a8'>DAVRONOV</span><span style="color:#000">A.A.</span>]] [[User talk:Alexander_Davronov\|✉]] [[Special:Contributions/Alexander_Davronov\|⚑]]</span> 17:11, 9 December 2019 (UTC)
: Version 0. [[User:Chatul\|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul\|talk]]) 21:08, 16 January 2020 (UTC)
{{Discussion bottom}}
----
: ''The discussion above is closed. <b style="color: #FF0000;">Please do not modify it.</b> Subsequent comments should be made on the appropriate discussion page. 
No further edits should be made to this discussion.''<!-- from [[Template:Archive bottom]] --></div><div style="clear:both;"></div> == Suggested article improvements == I feel like this article should be made more concise, and that maybe much of the material (''e. g.'', the lengthy discussions of encoding schemes scattered throughout the article, particularly older schemes such as UCS-2 and UCS-4) might better be factored out into the corresponding, dedicated articles and replaced here by “See…” references to those articles. Your thoughts? —[[User:PowerPCG5\|PowerPCG5]] ([[User talk:PowerPCG5\|talk]]) 02:48, 22 February 2020 (UTC) :If anything, it's the transforms that should be moved to a separate article. UCS-2 (essentially, the [[#Basic Multilingual Plane]]) and UCS-4 are part and parcel of Unicode. [[User:Chatul\|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul\|talk]]) 02:11, 23 February 2020 (UTC) == Category merge proposed == I have proposed to merge all version-specific subcategories like {{cl\|Scripts encoded in Unicode 13.0}} into {{cl\|Scripts encoded in Unicode}}. Discussion is [[Wikipedia:Categories_for_discussion/Log/2020_March_20#Scripts_encoded_in_Unicode_1.0\|here]]. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 22:13, 20 March 2020 (UTC) == Special characters in article == A recent edit by [[User:Beland]] removed registered trademark symbols (®) and replaced various named character attributes, e.g., &thinsp;, with the actual characters or with ASCII quotation marks. Were there Wikipedia policies requiring that? Intuitively, it would seem that it would be easier to edit with named attributes and that the registered symbol was legally required. [[User:Chatul\|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul\|talk]]) 12:35, 7 May 2020 (UTC) :The ® is definitely not legally required, and is proscribed against by [[Wikipedia:Manual of Style/Trademarks]]. Conversion of "…" to "..." is required by [[MOS:ELLIPSIS]]. 
ASCII quote marks are required by [[MOS:STRAIGHT]]. [[MOS:MARKUP]] says that, in general, markup should be kept as simple as possible; since it's usually unnecessary, I've generally been dropping &thinsp; or (as in this case) converting it to a regular ASCII space. It's generally agreed ([[MOS:NBSP]]) that when thin spaces are used, a named reference of some kind is preferred over the character itself, since it's difficult to tell apart from other whitespace characters. Adding space around special characters is one of the cases where thin spaces are explicitly allowed; if you prefer them over regular ASCII spaces in this case (or no space), feel free to restore them. I recommend using {{tl\|thinsp}}, since this is ignored by the automated scan I was using. -- [[User:Beland\|Beland]] ([[User talk:Beland\|talk]]) 15:04, 7 May 2020 (UTC)
:The ® symbols were removed from the titles of referenced documents. I thought the titles of references were to be kept unchanged as much as possible.
:I also agree that replacing character references with invisible characters is a bad idea. [[User:Spitzak\|Spitzak]] ([[User talk:Spitzak\|talk]]) 21:13, 7 May 2020 (UTC)
::About the ®: [[MOS:TM]], already referred to, says that we should use the ''independent sources'' style. On top of this, Unicode themselves prefer to omit the ® symbol: see [https://www.unicode.org/versions/Unicode13.0.0/ Version references]. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 21:33, 7 May 2020 (UTC)
::About the ellipsis, the cited [[MOS:ELLIPSIS]] calls for the use of a non-breaking space before an ellipsis. [[User:Chatul\|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul\|talk]]) 21:45, 7 May 2020 (UTC)
:::My interpretation of [[MOS:TM]] is that reliable, independent sources can be used to determine styling (like Ipad vs. iPad) but that ® and ™ are to be avoided except as needed for disambiguation. (Though sources independent of the trademark holder rarely use the trademark symbols.) 
[[MOS:CONFORMTITLE]] says that titles of works should be altered to conform to Wikipedia house style. -- [[User:Beland\|Beland]] ([[User talk:Beland\|talk]]) 12:12, 12 May 2020 (UTC) == Number of valid characters == Article reports 143,859 valid characters ; a short python script that runs chr(value) with all possible values of length 1-4 bytes will report 2,294,016 valid characters (and 4,309,516,288 (!) invalid characters) in the byterange. How come there is a factor ~20 between the two data? <!-- Template:Unsigned IP --><small class="autosigned">— Preceding [[Wikipedia:Signatures\|unsigned]] comment added by [[Special:Contributions/85.0.37.33\|85.0.37.33]] ([[User talk:85.0.37.33#top\|talk]]) 18:44, 25 May 2020 (UTC)</small> <!--Autosigned by SineBot--> :No, that is not what the article reports. The article does have the text "there is a repertoire of 143,859 characters,"; note that the text neither uses the term valid nor refers to strings of 1-4 octets; it refers to characters that have been assigned code points in the range 0000–10FFFF by the [[Unicode Consortium]]. The only text that uses the term ''valid'' is distinguishing surrogate pairs from other code points. [[User:Chatul\|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul\|talk]]) 20:23, 25 May 2020 (UTC) == "유니코드" listed at [[Wikipedia:Redirects for discussion\|Redirects for discussion]] == [[File:Information.svg\|30px]] A discussion is taking place to address the redirect [[:유니코드]]. The discussion will occur at [[Wikipedia:Redirects for discussion/Log/2021 January 1#유니코드]] until a consensus is reached, and readers of this page are welcome to contribute to the discussion. 
<!-- from Template:RFDNote --> [[User:Dominicmgm\|Dominicmgm]] ([[User talk:Dominicmgm\|talk]]) 23:35, 1 January 2021 (UTC) {{Clear}} == History == Given what Unicode is, an important part of its history is its adoption by word processors (Word, OpenOffice, but notably not WordPerfect) and operating systems (Windows, Linux, ...) and fonts (TTF). As a practical matter for the end user, it didn't become available in 1988, but when they could use it for their documents (I think for most people this meant 1997). <!-- Template:Unsigned IP --><small class="autosigned">— Preceding [[Wikipedia:Signatures\|unsigned]] comment added by [[Special:Contributions/77.61.180.106\|77.61.180.106]] ([[User talk:77.61.180.106#top\|talk]]) 00:29, 13 October 2021 (UTC)</small> <!--Autosigned by SineBot--> {{Clear}} == Infobox Unicode block: add a 'related' list? == See discussion at {{slink\|Template_talk:Infobox_Unicode_block\|Related_blocks}}. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 09:43, 27 February 2022 (UTC) {{Clear}} == "Quivira (typeface)" listed at [[Wikipedia:Redirects for discussion\|Redirects for discussion]] == [[File:Information.svg\|30px]] An editor has identified a potential problem with the redirect [[:Quivira (typeface)]] and has thus listed it [[Wikipedia:Redirects for discussion\|for discussion]]. This discussion will occur at [[Wikipedia:Redirects for discussion/Log/2022 March 15#Quivira (typeface)]] until a consensus is reached, and readers of this page are welcome to contribute to the discussion. <!-- from Template:RFDNote --> [[User:1234qwer1234qwer4\|1234 kb of .rar files]] ([[User talk:1234qwer1234qwer4\|is this dangerous?]]) 19:02, 15 March 2022 (UTC) == Requested move 16 September 2021 == <div class="boilerplate" style="background-color: #efe; margin: 0; padding: 0 10px 0 10px; border: 1px dotted #aaa;"><!-- Template:RM top --> :''The following is a closed discussion of a [[Wikipedia:Requested moves\|requested move]]. 
<span style="color:red">'''Please do not modify it.'''</span> Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a [[Wikipedia:move review\|move review]] after discussing it on the closer's talk page. No further edits should be made to this discussion. '' The result of the move request was: '''NOT MOVED''': The consensus is that the current name properly describes the contents of this article and is not ambiguous. <small>([[Wikipedia:Requested moves/Closing instructions#Non-admin closure\|non-admin closure]])</small> [[User:Spekkios\|Spekkios]] ([[User talk:Spekkios\|talk]]) 00:52, 25 September 2021 (UTC) ---- [[:Unicode]] → {{no redirect\|Unicode Standard}} – The term "Unicode" is ambiguous, and may be used to refer to the Unicode Standard, the Unicode Consortium, Unicode characters, Unicode-encoded text, or any number of things related to the implementation of the Unicode Standard or the processing of Unicode text. The Unicode Consortium actively discourages the use of the term "Unicode" as an isolated noun ("Always use “Unicode” as an adjective followed by an appropriate noun. Do not use “Unicode” alone as a noun" [https://www.unicode.org/policies/logo_policy.html Unicode Consortium Name and Trademark Usage Policy]), and states that "The Unicode® Standard" should be used in preference to simply "Unicode" (of course we do not use ® on Wikipedia per [[MOS:TMRULES]]). The subject of this article is specifically the Unicode Standard (the opening sentence should be "'''The Unicode Standard''' is an information technology standard for ..."), and not the general concept of "Unicode", so the article should be moved to '''[[Unicode Standard]]''', with [[Unicode]] left as a redirect to avoid having to rename thousands of wikilinks. [[User:BabelStone\|BabelStone]] ([[User talk:BabelStone\|talk]]) 16:30, 16 September 2021 (UTC) Agree. 
[[User:Drmccreedy\|DRMcCreedy]] ([[User talk:Drmccreedy\|talk]]) 21:03, 16 September 2021 (UTC) [[WP:NOTAVOTE]]. [[User:Calidum\|<span style="color:#01796F; font-family:serif">'''-- ''Calidum'''''</span>]] 15:41, 17 September 2021 (UTC) '''Oppose move.''' See [[WP:OFFICIALNAMES]]. We do not use a name simply because it is official, and the common name here is Unicode. '''[[User:Old Naval Rooftops\|<span style="color:#002244">O.N.R.</span>]]''' <sup>[[User talk:Old Naval Rooftops\|<span style="color:#002244">(talk)</span>]]</sup> 03:52, 17 September 2021 (UTC) This does not address the ambiguity issue. "Unicode" is commonly used to refer to the Unicode Consortium. Just one random example from a [https://www.bbc.co.uk/news/technology-57848226 BBC article]: "Rachel Murphy and Amy Wiegand sent sample artwork to Unicode as part of their plea for a drone emoji", "Rachel Murphy thinks Unicode is wrong to not include a drone emoji", "Unicode rejected their proposal", etc. [[User:BabelStone\|BabelStone]] ([[User talk:BabelStone\|talk]]) 13:33, 17 September 2021 (UTC) The article refers to the consortium as "the Unicode Consortium" on first reference. [[User:Calidum\|<span style="color:#01796F; font-family:serif">'''-- ''Calidum'''''</span>]] 15:40, 17 September 2021 (UTC) *As Calidum says. And even in its isolated form here, there is no misunderstanding in what is intended: "sent to the Unicode Consortium". How could this be misread? This obviousness is present throughout the article. No ambivalence. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 11:39, 18 September 2021 (UTC) Generally '''oppose''' as per O.N.R. above. As far as I have ever seen, in general usage, "Unicode" as a bare noun is only used to refer to the standard. Even at UTC meetings with actual officers and members present, use of plain "Unicode" referred only to the standard, never the consortium, Unicode encoded text, characters, or anything else. 
However, I fully agree the article lede should begin with "The Unicode Standard" as the official name. [[User:Vanisaac\|Van]][[User talk:Vanisaac\|Isaac]], MPLL<sup> [[Special:Contributions/Vanisaac\|cont]]</sup><sub style="margin-left:-3.5ex"><small>[[WP:WPWR\|WpWS]]</small></sub> 04:47, 17 September 2021 (UTC)
The article title should match the bolded term in the lede, so if you accept that the lede should start with "The Unicode Standard" in bold then you really have to accept that the article title should also be "[The] Unicode Standard". [[User:BabelStone\|BabelStone]] ([[User talk:BabelStone\|talk]]) 13:33, 17 September 2021 (UTC)
More the other way around: in general, the article title should reappear in bold in the [[WP:FIRSTSENTENCE\|first sentence]]; an alternative name can be added in bold (as is the case today [https://en.wikipedia.org/w/index.php?title=Unicode&diff=1044856772&oldid=1044753689&diffmode=source]). -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 12:44, 18 September 2021 (UTC)
'''Oppose'''. We use common names, not official ones. [[User:Calidum\|<span style="color:#01796F; font-family:serif">'''-- ''Calidum'''''</span>]] 15:40, 17 September 2021 (UTC)
'''Support''' due to [[WP:PRECISE]], not due to official names. That rationale is badly flawed, yes, but the precision one is very relevant. [[User:Red Slash\|<span style="color:#FF4131;">Red</span>]] [[User talk:Red Slash\|<b><span style="color:#460121;">Slash</span></b>]] 19:08, 17 September 2021 (UTC)
'''Oppose'''. First of all, possible ambiguity is limited to "[the] Unicode Standard" and "Unicode Consortium"; other terms mentioned in the proposal (''Unicode characters, Unicode-encoded text, or any number of things ...'') do not appear as ambiguous terms. What is ambiguous in "Unicode characters"? Instead, it is fully self-explanatory! When a distinction between ...Standard or ...Consortium characters is needed, one should make it in the text. 
Also, a name like "Unicode CLDR" is not shortened to "Unicode" ever, nor is any Unicode Technical Report name [https://www.unicode.org/reports/index.html#annexes], so these do not apply. :Second, Unicode themselves uses plain "Unicode" for the Standard throughout and consistently: see [https://www.unicode.org/main.html main TOC], [https://www.unicode.org/glossary/#U Glossary]. Except for self-referring situations, this leaves no misunderstanding (when self-referring could be confusing, one writes like "The Unicode Standard is maintained by Unicode Consortium"). No problem here. :On wikipedia: As others have noted, [[WP:OFFICIALNAMES]] applies. Also, per [[WP:DISAMBIGUATION]]: we can easily establish that "Unicode Standard" is the ''primary topic'' for "Unicode". From there, we can create article [[Unicode (disambiguation)]] (with two entrances then) and add hatnote {{tl\|about}} to this article. Also, per [[WP:COMMONNAME]], current title is preferred and acceptable. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 12:08, 18 September 2021 (UTC) * '''Oppose''' I disagree with {{tq\|The subject of this article is specifically the Unicode Standard}} - it is about a broad concept of Unicode characters, Unicode-encoded text, Unicode input systems ... basically anything other than the organization called [[Unicode Consortium]]. I'm not opposed to a new [[History of the Unicode Standard]] article which focuses specifically on information about the development of versions of the Unicode Standard. [[User:力]] (power~enwiki, [[User talk:力\|<span style="color:#FA0;font-family:courier">π</span>]], [[Special:Contributions/力\|<span style="font-family:courier">ν</span>]]) 17:30, 21 September 2021 (UTC) '''Oppose''' per DePiep and 力 (power~enwiki). Also, if we are to use it, I believe the Wikipedia [[MOS:CAPS\|guidelines for capitalization]] would indicate that "standard" should be in lowercase (regardless of whether the consortium uses lowercase or not). 
Wikipedia avoids unnecessary use of uppercase. —⁠ ⁠[[User:BarrelProof\|BarrelProof]] ([[User talk:BarrelProof\|talk]]) 00:21, 24 September 2021 (UTC) <div style="padding-left: 1.6em; font-style: italic; border-top: 1px solid #a2a9b1; margin: 0.5em 0; padding-top: 0.5em">The discussion above is closed. <b style="color: #FF0000;">Please do not modify it.</b> Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.</div><!-- from [[Template:Archive bottom]] --> </div><div style="clear:both;"></div> == Vulnerabilities == A security advisory has been recently released from two researchers, one from the University of Cambridge and the other from the same and from the University of Edinburgh, in which they assert that carefully crafted computer source code can be used to introduce vulnerabilities in apparently harmless programs. Some security groups (like the one for Rust language) are already taking measures and issuing their own security advisories. I think that is something that affects Unicode as source code is one of the main applications of the standard. What do ye think would be a good way to introduce that to the article? <ref>https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/</ref> <ref>https://www.trojansource.codes/trojan-source.pdf</ref> <ref>https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html</ref> [[User:Bruno Unna\|Bruno Unna]] ([[User talk:Bruno Unna\|talk]]) 12:02, 1 November 2021 (UTC) ::Looks like existing {{slink\|Unicode\|Issues}} is the place to go. Indeed, a true Unicode case ({{unichar\|202E\|RIGHT-TO-LEFT OVERRIDE}}). Also consider mentioning at [[Bidirectional text]]? 
-[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 12:29, 1 November 2021 (UTC)
{{reflist-talk}}
Hello [[User:M.h.gholamii\|M.h.gholamii]] ([[User talk:M.h.gholamii\|talk]]) 19:24, 14 July 2022 (UTC)
ko [[User:M.h.gholamii\|M.h.gholamii]] ([[User talk:M.h.gholamii\|talk]]) 19:24, 14 July 2022 (UTC)

== question ==
I was designing text shapes for electrical symbols and electronic elements. I design them in the Unicode-encoded FontCreator program, but after exporting the font and copying and pasting a symbol I designed into phone programs, it does not work and appears as a question mark. What is the solution? (Note that this topic is important for article development: I want to design different symbols for non-electrical shapes, not only in the field of electricity, and I don't want them to be thumbnails but text.) [[User:Mohmad Abdul sahib\|<span style="font-size:0.875em;color: #339900;text-shadow:silver 0.2em 0.2em 0.1em;">'''Mohmad Abdul sahib'''</span>]] <span style="font-size:0.875em;color: Purple;text-shadow:silver 0.1em 0.2em 0.2em;">'''[[User:Mohmad Abdul sahib\|talk☎]]'''</span> [[User talk:Mohmad Abdul sahib\|talk]] 18:15, 18 April 2022 (UTC)
:Likely, FontCreator has the appropriate ''font'', containing the electric symbols, but the receiving program does not. Looks like the font should cover the [[Unicode block]] [[Miscellaneous Technical]]. This requires downloading a certain font, but I cannot help any further. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 19:52, 14 July 2022 (UTC)

== New Taskforce WikiProject Unicode? ==
A proposal has been opened at [[Wikipedia_talk:WikiProject_Computing#Taskforce_WikiProject_Unicode_–_proposal\|WP:COMP § Taskforce WP Unicode – proposal]]. Please take a look. [[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 09:35, 2 October 2022 (UTC)

== Version 15 & Wikidata ==
I am adding new blocks & data to Wikidata now.
Assuming no DAB needed here, the pages are:
{{bulleted list
\|[[Arabic Extended-C]] — [[Template:Unicode chart Arabic Extended-C]] → WD:{{Q\|Q113956924}}
\|[[Devanagari Extended-A]] — [[Template:Unicode chart Devanagari Extended-A]] → WD:{{Q\|Q113956904}}
\|[[Kawi (Unicode block)]] — [[Template:Unicode chart Kawi]] → WD:{{Q\|Q113956944}}
\|[[Kaktovik Numerals (Unicode block)]] — [[Template:Unicode chart Kaktovik Numerals]] → WD:{{Q\|Q113956957}}
\|[[Cyrillic Extended-D]] — [[Template:Unicode chart Cyrillic Extended-D]] → WD:{{Q\|Q113956962}}
\|[[Nag Mundari]] — [[Template:Unicode chart Nag Mundari]] → WD:{{Q\|Q113956955}}
\|[[CJK Unified Ideographs Extension H]] — [[Template:Unicode chart CJK Unified Ideographs Extension H]] → WD:{{Q\|Q113956966}}
}}
[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 16:10, 13 September 2022 (UTC)
:QID added -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 16:33, 13 September 2022 (UTC)
:more listing -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 18:02, 13 September 2022 (UTC)
::Not much time to complete this list, for me. [[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 18:12, 13 September 2022 (UTC)
Note that, as far as I can see, only two content articles require the "(Unicode block)" DAB-specifier, because of name overlap. The other "X (Unicode block)" pages should be redirects to their (unambiguously named) content Block article. See also {{tl\|Unicode blocks/overview}}. DePiep.
{{Recent changes in Unicode}}
By now, most 15.0 changes seem to be processed & updated. See Recent changes for the current edit history. -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 11:31, 21 September 2022 (UTC)
:As a list of version-15.0-changes needed or done, this list is incomplete. [[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 05:36, 24 October 2022 (UTC)

== Code Points ==
The lead claims that there are currently 149,186 characters in the Standard. That's confusing!
Is that actual characters or does it include unprintable code points? I know what a code point is, my point is that the lead shouldn't confuse code points with characters. (I also argue that a "control character" isn't 'really' a character, not a grapheme, but that's a fight for somewhere else.) Writing about Unicode without an early clear explanation of what a code point is, is -I think- awful pedagogy. In fact, I don't think code point - a fundamental aspect of Unicode - is even defined in the article!!!! Wow, just wow.
I also would like someone to verify that Unicode has characters for color. I believe that's wrong/false/misleading. I am aware that certain emoji can be modified by a code point to change some of its color. As far as I know, this is only true with a very small set of code points, and a very very small set of colors (I don't actually know if the colors are well-defined, I'd expect so, but...). These aren't colors, but are color modifiers for those other code points. [[Special:Contributions/174.130.71.156\|174.130.71.156]] ([[User talk:174.130.71.156\|talk]]) 16:00, 13 December 2022 (UTC)
:There are no color defining codes in Unicode but there are names of characters that specify a color if displayed on a color device. Searching the word color in the article shows some possibly confusing text about color but nothing outright wrong.
:This article leaves a lot to be desired, if you wish to make changes, you should. It's a wiki after all. [[User:SchmuckyTheCat\|SchmuckyTheCat]] ([[User talk:SchmuckyTheCat\|talk]]) 05:43, 15 December 2022 (UTC)
There are two Variation Selectors (U+FE0E and U+FE0F) which specify whether an Emoji should be ideally displayed in color or black and white, but other than that, there are no color specifications in Unicode.
The terms "character" and "code point" are specified in the Unicode Standard, and if you feel that the coverage here is inadequate in conveying the meaning of those terms, I absolutely encourage you to contribute content to better reflect their technical specification. For the record, any code point defined beyond "Not A Character" or "Reserved" is a "character". This means control characters and whitespace are all considered characters in Unicode, just like a letter in an alphabet, a Kanji with On and Kun readings, or a mathematical symbol. [[User:Vanisaac\|Van]][[User talk:Vanisaac\|Isaac]], GHTV<sup> [[Special:Contributions/Vanisaac\|cont]]</sup><sub style="margin-left:-3.5ex"><small>[[WP:WPWR\|WpWS]]</small></sub> 06:18, 15 December 2022 (UTC)

== Lead is simply wrong. ==
The offending sentence is: "The Unicode standard defines three and several other encodings exist, all in practice [[Variable-width encoding\|variable-length encodings]]." (Sure, you could strain to interpret that to mean "all but UTF-32", but let's keep it clear.) It clearly implies all encodings are variable length. Wikipedia's own article on UTF-32 says it is fixed length. (Because it only needs to use 21 of the 32 bits for Unicode code points, it is very inefficient, and rarely used, afaik.) But rarely used is not the same as "doesn't exist", and "all are variable" clearly implies it doesn't exist. I'd have to look again, are there really 3 variable Unicode encodings? I can only think of UTF-8 and UTF-16 (and some others that afaik are not "defined" in the Unicode standard (like GB18030), or that are obsolete (like UTF-7)).
Replace "all" with "all common encodings" or something similar, and mention UTF-32. [[Special:Contributions/174.130.71.156\|174.130.71.156]] ([[User talk:174.130.71.156\|talk]]) 11:43, 15 December 2022 (UTC)
:I think the intended meaning of this was that even if ''code points'' are fixed-size, modern Unicode is effectively variable-width, as what the user thinks is a "character" sometimes needs multiple code points. [[User:Spitzak\|Spitzak]] ([[User talk:Spitzak\|talk]]) 16:40, 15 December 2022 (UTC)
::Yes, Unicode includes both [[combining character]]s and [[precomposed character]]s, e.g., <{{U+\|0061}} "a" latin small letter a> <{{U+\|0308}} "¨" combining diaeresis> is equivalent to <{{U+\|00E4}} "ä" latin small letter a with diaeresis>. Further, some glyphs exist at multiple code points for historical reasons. There is a discussion of canonical forms in the Unicode standard. --[[User:Chatul\|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul\|talk]]) 21:57, 15 December 2022 (UTC)
::It seems odd to me to describe code points as "fixed size". They're just an abstract number. It's when you ''encode'' (or store) the code points that you get variable lengths, at least for UTF-8, UTF-EBCDIC, and UTF-16 as described in the article. I think combining characters are a red herring for this discussion. [[User:Drmccreedy\|DRMcCreedy]] ([[User talk:Drmccreedy\|talk]]) 23:10, 15 December 2022 (UTC)
:::The Unicode standard does restrict the number of code points, so describing them as fixed-length 21-bit or 32-bit data is reasonable. [[user:Spitzak\|Spitzak]] is referring to characters, which indeed are variable length, a separate issue from the length of an encoded code point that does deserve mention. --[[User:Chatul\|Shmuel (Seymour J.) Metz Username:Chatul]] ([[User talk:Chatul\|talk]]) 17:14, 16 December 2022 (UTC)

== Inline mentioning ==
I object to the [https://en.wikipedia.org/w/index.php?title=Unicode&diff=prev&oldid=1151049361 reversal] by {{U\|Peter M.
Brown}}, citing [[WP:ITALICTITLE]] inappropriately. I'd say that the name, a noun, should not be in italics. ITALICTITLE refers to the name of a ''work'', i.e. the work itself (play, periodical, book). However, the Unicode standard is a ''standard'', not a book etc., not even its publication. The Standard is an abstraction: the set of rules. It is a proper noun, full stop. The key is that the article title denotes the subject: the standard, not the book. [[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 17:04, 21 April 2023 (UTC)
:{{ping\|Peter M. Brown}} -[[User:DePiep\|DePiep]] ([[User talk:DePiep\|talk]]) 10:43, 23 April 2023 (UTC)

== Why no section about missing graphemes? ==
I don't know if it would be manageable, but Unicode clearly does not have all commonly used symbols. A simple example is the very commonly used 'slash marks' used to count. Most reading this will be familiar with the sequence /, //, ///, ////, and <s>////</s> with the crossmark (strike-through) diagonal (top left to bottom right) rather than horizontal. (This is typical in the USA; I understand the European convention is slightly different.) I ask the editors to consider the addition of a list of missing (but documented) symbols. [[Special:Contributions/40.142.183.146\|40.142.183.146]] ([[User talk:40.142.183.146\|talk]]) 11:49, 9 June 2023 (UTC)
:Unicode's non-inclusion of tally marks is covered in {{slink\|Tally marks\|Unicode}}. I don't think it's a good idea to include it also in this article. That would open the door to listing every proposal that has not yet been accepted. [[User:Indefatigable\|Indefatigable]] ([[User talk:Indefatigable\|talk]]) 15:42, 9 June 2023 (UTC)
:I also oppose this idea. The set of unencoded symbols is open-ended and may exceed the number of encoded symbols. There would also be no way to determine ''which'' unencoded symbols merit mention.
[[User:Drmccreedy\|DRMcCreedy]] ([[User talk:Drmccreedy\|talk]]) 16:01, 9 June 2023 (UTC)

== Proposed new writing systems to be encoded into Unicode 16 ==
Unicode 16 is set to be released in September 2024. I think the following (con)scripts definitely need to be encoded:
* Chữ Việt Trí - an alphabet invented by Tôn Thất Chương in 2012 for the Vietnamese language. It's still nicer than Latin-based Quoc Ngu and needs wide recognition, as Shavian and Hangul did.
* Add support for Quikscript.
* Add extra missing runes from Baconsthorpe and Sedgeford and Armanen runes
* Possibly add something more.
[[Special:Contributions/94.180.80.9\|94.180.80.9]] ([[User talk:94.180.80.9\|talk]]) 07:31, 9 July 2023 (UTC)
:Take a look at Unicode's FAQ for [http://www.unicode.org/faq/char_proposal.html Submitting Successful Character and Script Proposals]. Wikipedia isn't affiliated with The Unicode Consortium so requests here won't be seen or acted upon by the people who can actually add characters/scripts to the Unicode Standard. [[User:Drmccreedy\|DRMcCreedy]] ([[User talk:Drmccreedy\|talk]]) 14:39, 9 July 2023 (UTC)
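
A note on the "Lead is simply wrong." thread above, since the distinction argued there is easy to check. The following is a minimal Python sketch, not taken from the article or from the Standard's text; the helper name <code>code_units</code> and the sample characters are illustrative assumptions. It counts how many code units a single code point occupies in UTF-8, UTF-16 and UTF-32, and separately checks that the combining sequence U+0061 U+0308 normalizes (NFC) to the precomposed U+00E4.

```python
import unicodedata


def code_units(ch: str) -> dict:
    """Count the code units one code point needs in each encoding form.

    Hypothetical helper for illustration only. The "-le" codec variants
    are used so that no byte order mark is added to the output.
    """
    return {
        "utf-8": len(ch.encode("utf-8")),            # 8-bit code units
        "utf-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit code units
        "utf-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit code units
    }


# Sample code points chosen for illustration: U+0061, U+00E4, U+20AC, U+1F600.
samples = ["a", "\u00e4", "\u20ac", "\U0001F600"]
for ch in samples:
    print(f"U+{ord(ch):04X}", code_units(ch))

# Separate issue from the thread: one user-perceived character can be
# several code points. "a" + COMBINING DIAERESIS is canonically equivalent
# to the precomposed LATIN SMALL LETTER A WITH DIAERESIS.
decomposed = "a\u0308"
precomposed = "\u00e4"
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
print(len(decomposed), len(precomposed), nfc_equal)
```

For these samples the UTF-32 count stays at 1, while the UTF-8 count ranges from 1 to 4 and the UTF-16 count from 1 to 2, which is the fixed-width versus variable-width point raised by the IP editor. The last lines illustrate the separate point made by Spitzak and Chatul: a single user-perceived character may span more than one code point in any encoding.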