Module talk:Unicode chart: Difference between revisions

 
(15 intermediate revisions by 8 users not shown)
Line 85:
:* <b>The three characters that Unicode displays as "XXX" do indeed have abbreviations in NameAliases.txt but they all have a type of "figment" as in "figment of one's imagination". I feel strongly that we shouldn't assign abbreviations to the charts that contradict the ones used in the actual, cited Unicode charts.</b> [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 21:47, 12 September 2019 (UTC)
* I gave the control characters a light blue background and an explanatory footnote similar to those for RESERVED and NONCHARACTER. Also dashed boxes around the abbreviations, which are loaded from [[Module:Unicode data/aliases|here]]. Some have multiple abbreviations. The current behavior is to choose the last one, because at a brief glance that seemed most correct in most cases. I'd rather we move the "official" or preferred abbreviation to the top and consistently select the first one instead. I've yet to research what, if anything, might be broken by changing abbreviation order.
:* <b><ttsamp>Module:Unicode data/aliases</ttsamp> is generated from Unicode's NameAliases.txt file. It looks like it is in the same order, so any tweaking we do to the order would be problematic when the file is updated. If we changed the script that creates aliases we would just be moving the logic from the chart script to the generation script. Other users of <ttsamp>alias</ttsamp> may not have the same requirement, so I think the decision about what to use in the charts belongs in the chart script. I have another abbreviation issue but I'll do that in a new section for clarity.</b> [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 21:47, 12 September 2019 (UTC)
―[[special:contributions/cobaltcigs|cobaltcigs]] 09:17, 12 September 2019 (UTC)
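
For illustration only, here is a minimal Python sketch of the "first versus last abbreviation" choice discussed above. It assumes the semicolon-separated layout of NameAliases.txt (code point;alias;type); the helper names <code>parse_aliases</code> and <code>pick_abbreviation</code> are hypothetical and are not part of any existing module. The sample rows are the U+000A aliases quoted elsewhere on this page.

```python
from collections import defaultdict

# Sample rows in the NameAliases.txt layout: code point;alias;type
SAMPLE = """\
# comment lines start with a hash sign
000A;LINE FEED;control
000A;NEW LINE;control
000A;END OF LINE;control
000A;LF;abbreviation
000A;NL;abbreviation
000A;EOL;abbreviation
"""


def parse_aliases(text):
    """Return {code point: [(alias, type), ...]}, preserving file order."""
    aliases = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = line.split(";")
        aliases[int(code, 16)].append((alias, kind))
    return dict(aliases)


def pick_abbreviation(entries, prefer="first"):
    """Pick the first or last alias of type 'abbreviation', or None."""
    abbrs = [alias for alias, kind in entries if kind == "abbreviation"]
    if not abbrs:
        return None
    return abbrs[0] if prefer == "first" else abbrs[-1]


ALIASES = parse_aliases(SAMPLE)
print(pick_abbreviation(ALIASES[0x000A]))          # first abbreviation
print(pick_abbreviation(ALIASES[0x000A], "last"))  # last abbreviation
```

Keeping the choice in a small function like this leaves the generated alias data in file order, which is the concern raised above about regenerating the data from a new NameAliases.txt.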
 
Line 104:
 
==Formatting abbreviations==
Besides worrying about which abbreviations are used in the charts, there's an issue of formatting. Today, long ones are often split into two or more lines to control the width of the chart. An extreme example is NULL NOTE HEAD in [[Template:Unicode chart Musical Symbols]] but this practice happens in other places like [[Template:Unicode_chart_Mongolian]] and [[Template:Unicode chart Variation Selectors Supplement]]. I haven't checked to see if the abbreviations are always in a dashed box but maybe we could have a parm like <ttsamp><nowiki>...|abbr|1D15|{{resize|75%|NULL<br />NOTE<br />&amp;nbsp;HEAD&amp;nbsp;}}</nowiki></ttsamp> to preserve the ability to format these in the current fashion. In any case, formatting is something to consider. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 21:47, 12 September 2019 (UTC)
:Eww. See [[User:BabelStone/sandbox#Musical Symbols]] for an attempt to replicate that (without any <code><nowiki><br />&amp;nbsp;</nowiki></code> crap, which is great!). Note that 1D173–1D17A are identified as "format" characters in [[Module:Unicode data|this file]], but "NULL NOTE HEAD" is not. Hence the difference in css/color. The pink can of course be changed later. ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:45, 13 September 2019 (UTC)
::Wow, I've never realised that U+1D159 is not a format character. Are there any other characters displayed as a dashed box around text that are not format or control characters? <s>I don't think so</s> (variation selectors are gc=Mn). The worrying thing is there seems to be no way of extracting the information from the UCD, so it relies on visually checking the Unicode code charts, but what if it changes suddenly to a graphic character in a new version of Unicode? My gut feeling is that gc=So is wrong if the character has no visible glyph and is not whitespace. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 22:52, 13 September 2019 (UTC)
Line 111:
 
==Version==
There have been many past discussions about how to determine which Unicode version to show in the footnote of the chart. Because they were manually updated, it wasn't practical to have a master switch for the version. If the charts are created using <ttsamp>Module:Unicode data</ttsamp> it might be possible to do away with the mindless updating I do once a year for all the charts. A new <ttsamp>Module:Unicode data/version</ttsamp> item could be added that is manually updated after all of the other <ttsamp>Module:Unicode data</ttsamp> files are updated. Basically, it's just a string field to say "We've updated all the other data to version x". If the version footnote was pulled from that string, it would alleviate a lot of manual effort. It would mean adding <ttsamp>Module:Unicode data/version</ttsamp> to the list of "regenerate the charts if tables x, y, and z change". [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 21:47, 12 September 2019 (UTC)
:FYI: After a few updates, all of the [[Module:Unicode data]] subpages are now up-to-date (Unicode version 12.1). [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 04:45, 14 September 2019 (UTC)
::I do like the idea of centralizing the version string. Even as a single-purpose one-liner module {{code|lang=lua|return "12.1"}} would be fine. ―[[special:contributions/cobaltcigs|cobaltcigs]] 10:17, 14 September 2019 (UTC)
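
As a rough illustration of the centralized version string (written in Python here rather than Lua), the sketch below keeps one hand-maintained constant and derives the footnote from it. The name <code>version_footnote</code> and the footnote wording are assumptions for the example, not the module's actual output.

```python
# Single source of truth, bumped by hand once every data file is updated.
UNICODE_VERSION = "12.1"


def version_footnote(version=UNICODE_VERSION):
    """Build the chart footnote text from the shared version string."""
    # The wording below is an assumed example, not the real footnote text.
    return f"As of Unicode version {version}"


print(version_footnote())
```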
Line 173:
:: ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:50, 16 September 2019 (UTC)
::: You can rely on <code>lookup_category</code> never returning <code>nil</code> (at least when supplied a valid code point); <code>memo_lookup</code> guarantees that. The return value is either a "real" category when the code point is found in <code>singles</code> or <code>ranges</code> or Cn (Unassigned). — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 22:40, 16 September 2019 (UTC)
::: Oops. Actually, what I said is true of [[Module:Unicode data/sandbox]], but at the moment [[Module:Unicode data]] is buggy. — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 23:35, 30 September 2019 (UTC)
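
A minimal Python sketch of the lookup behaviour described above: check <code>singles</code>, then <code>ranges</code>, and fall back to Cn so the result is never empty. This is not the module's Lua code. The tiny data tables are toy values for illustration, and <code>functools.lru_cache</code> stands in for <code>memo_lookup</code>.

```python
from bisect import bisect_right
from functools import lru_cache

# Toy data for illustration; the real tables are far larger.
SINGLES = {0x00AD: "Cf"}  # SOFT HYPHEN
RANGES = [(0x0041, 0x005A, "Lu"), (0x0061, 0x007A, "Ll")]  # sorted by start
RANGE_STARTS = [start for start, _, _ in RANGES]


@lru_cache(maxsize=None)  # memoization, standing in for memo_lookup
def lookup_category(code_point):
    """Return a General_Category value; never None for an integer input."""
    if code_point in SINGLES:
        return SINGLES[code_point]
    # Find the last range whose start is <= code_point, then check its end.
    i = bisect_right(RANGE_STARTS, code_point) - 1
    if i >= 0:
        start, end, category = RANGES[i]
        if start <= code_point <= end:
            return category
    return "Cn"  # Unassigned: the guaranteed fallback


print(lookup_category(0x0041), lookup_category(0x0378))
```

With the fallback in the last line, callers do not need their own nil check, which is the guarantee being discussed.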
 
===Selectability: CSS vs. plain text===
Line 199 ⟶ 200:
::: Related: Can I also get your opinion on whether to put atypical abbreviations in the boxes for [[#General Punctuation, row U+206x]] above? ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:15, 17 September 2019 (UTC)
:::: Yes. I'd display all of the aliases in the order they appear in NameAliases.txt (which is preserved in [[Module:Unicode data/aliases]]). But I also think the ''type'' of alias is useful to know. My preference would look like this:
:::: <table class="wikitable" style="width:100%;"><tr><th style="padding: 0px; width: 8%; font-family: 'serif'; font-weight: normal; font-size: 250%;"><span style="border: 2px dashed black; padding: 3px;">LF</span></th><td><div class="title" style="font-weight: bold; display: inline-block;">U+000A &lt;control&gt;</div><div class="category" style="display: inline; white-space: pre; "> (control)</div><div class="aliases plainlist" style="line-height: 120%;"><div style="white-space: pre; display: inline-block; vertical-align: top;">Control: LINE FEED<br />Control: NEW LINE<br />Control: END OF LINE<br />Abbreviation: LF<br />Abbreviation: NL<br />Abbreviation: EOL</div><div style="font-family: monospace;">[other stuff below...]</div></div></td></tr></table>
:::::{{done}}, see [[#info-000A]] above. Using <code><nowiki><ul></nowiki></code> because [https://stackoverflow.com/a/1726103 <code><nowiki><br /></nowiki></code> is for poetry and mailing addresses]. And I've just noticed the word "alias" won't actually appear to the reader. ―[[special:contributions/cobaltcigs|cobaltcigs]] 17:09, 18 September 2019 (UTC)
:::: As far as which abbreviation to use in the Wikipedia chart, I think it should match the official, cited Unicode chart. I'm guessing that a lot of them match the first/only abbreviation type of named alias but obviously not always. As you mentioned, U+206x is a good example of chart abbreviations that don't match named aliases. I'm thinking a table of chart abbreviations would be required. You could probably default the chart abbreviation if no exception is found but would it be worth the processing to not find a match first or is it faster to just add them all to a table?<br />My concern with using different chart abbreviations than Unicode is that there is no right answer. If someone were to change the Wikipedia chart abbreviation for U+000A from <ttsamp>LF</ttsamp> to <ttsamp>NL</ttsamp> would that be wrong/revertable? What about <ttsamp>LINE</ttsamp>? Or <ttsamp>LFEED</ttsamp>? If we don't have a definitive way to determine the chart abbreviation we open ourselves up to edit wars. Being able to cite the actual Unicode chart gives us one, definitive chart abbreviation.<br />Great work so far, BTW. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 22:10, 17 September 2019 (UTC)
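
On the "table of exceptions versus table of everything" question, a hedged Python sketch: a dictionary of overrides is consulted first, and the first abbreviation-type alias is the default. <code>CHART_OVERRIDES</code> and <code>chart_abbreviation</code> are hypothetical names, the override table is deliberately left empty, and the "NL" override in the usage lines is only a demonstration value, not a claim about the Unicode chart.

```python
# Hypothetical exceptions table: code point -> abbreviation shown in the
# cited Unicode chart. Left empty here; real entries would be copied by hand.
CHART_OVERRIDES = {}


def chart_abbreviation(code_point, abbreviations, overrides=CHART_OVERRIDES):
    """Return the override if present, else the first known abbreviation."""
    if code_point in overrides:
        return overrides[code_point]
    candidates = abbreviations.get(code_point, [])
    return candidates[0] if candidates else None


# Abbreviation-type aliases for U+000A, in NameAliases.txt order.
abbreviations = {0x000A: ["LF", "NL", "EOL"]}
print(chart_abbreviation(0x000A, abbreviations))                  # default
print(chart_abbreviation(0x000A, abbreviations, {0x000A: "NL"}))  # demo override
```

A failed dictionary lookup is a single hash probe, so checking the exceptions first and then defaulting should cost little compared with listing every code point.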
 
::::: Okay, clearly I misinterpreted "I think all alias types should be shown using the Type: ALIAS" to mean "replace more specific alias-type labels with the word ALIAS". Makes a lot more sense with a picture drawn, glad I asked.
Line 217 ⟶ 218:
See [[Module:Unicode chart/display]] and make any corrections/amendments as needed. Maybe I missed a few reading all those PDFs. Except for the CJK blocks where even "skimming" would be too generous a term. <s><code>display_NNNN</code></s> params will be whacked soon. ―[[special:contributions/cobaltcigs|cobaltcigs]] 04:38, 19 September 2019 (UTC)
:{{removed}} ―[[special:contributions/cobaltcigs|cobaltcigs]] 06:42, 19 September 2019 (UTC)
::I've reviewed the list and made some changes. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 17:28, 28 September 2019 (UTC)
 
==Going horizontal==
I've made the utf/html info slide to the right rather than downward when an alias list is present. Seems like a more efficient use of space. Seems to look okay next to the infamous BRAKCET correction, which I've confirmed is the longest string in the alias file. ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:05, 18 September 2019 (UTC)
:I don't like the other information forced to the right when there's an alias. It's unexpected and I don't think the savings in vertical space makes up for it. Sorry, it just looks misaligned to me.
:Unrelated to the down vs. side option, I have two comments on the displayed information when you click on a code point:
:First, can we move the hex HTML escape sequence before the decimal one (&#x... / &#...)? I've never understood why someone would go through the trouble of calculating the decimal value of a code point in order to create an HTML escape sequence but maybe that's just me. In any case, having the hex value first would align more nicely with the UTF-16 information directly above it. Hopefully the hex usage is more common anyway so it would make sense to put it first.
:Second, instead of the wording "Introduced in Unicode version x", I'd like to use more precise wording that the source uses.[https://www.unicode.org/Public/UNIDATA/DerivedAge.txt] This wording change seems trivial but it gets around the messy issue of various pre-1.1 characters. If Age is 1.1 (the earliest shown in the file), it would say "Assigned as of Unicode 1.1". Otherwise it would say "Newly assigned in Unicode x". Thanks. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 17:28, 28 September 2019 (UTC)
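
The two requests above can be sketched in Python as follows. The function names are hypothetical; the two label strings are the wordings proposed in the comment, and the age value is assumed to be the string found in DerivedAge.txt.

```python
def html_escapes(code_point):
    """Return the HTML numeric escapes, hexadecimal first, then decimal."""
    return f"&#x{code_point:X};", f"&#{code_point};"


def age_label(age):
    """Word the Age property as proposed: 1.1 is the earliest listed value."""
    if age == "1.1":
        return "Assigned as of Unicode 1.1"
    return f"Newly assigned in Unicode {age}"


print(html_escapes(0x000A))
print(age_label("1.1"))
print(age_label("12.1"))
```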
 
==Named subsets added==
Line 241 ⟶ 247:
}}
―[[special:contributions/cobaltcigs|cobaltcigs]] 13:45, 20 September 2019 (UTC)
 
:I'd lean towards a jagged line like a ripped piece of paper but the thick black line is certainly noticeable enough for the user to realize something's going on. I would, however, like the notes to say "heavy" or "thick" black line because every row has a "black horizontal line". [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 17:28, 28 September 2019 (UTC)
 
== Orientation of glyphs for vertical scripts ==
 
For scripts such as [[Template:Unicode chart Mongolian|Mongolian]] and [[Template:Unicode chart Phags-pa|Phags-pa]] which are written in vertical columns, the glyphs in the font have horizontal orientation so that complete runs of horizontal text can be rotated into vertical orientation by a higher level protocol (commonly [[CSS]]). Currently, in our code charts we rotate the glyphs into vertical orientation. This used to match the Unicode code charts, which used to show vertically-oriented glyphs for Mongolian and Phags-pa, but a few years back the editor of the Unicode code charts deliberately changed the Mongolian and Phags-pa code charts to show horizontally-oriented glyphs to reflect how the glyphs are represented at the font level. My question is, should we continue to rotate glyphs in the dynamic Mongolian and Phags-pa charts or should we leave them in horizontal orientation to match the current Unicode code charts? My preference is to rotate into vertical orientation as this matches user expectation (it is how Mongolian and Phags-pa glyphs are presented in books on these scripts). [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 08:12, 28 September 2019 (UTC)
:I don't have a strong preference, although I do think Unicode showing them horizontally seems strange. Vertical seems better. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 17:28, 28 September 2019 (UTC)
 
== Unicode 13.0 ==
 
Unicode 13.0 will be released in March. Can we complete outstanding work on the Unicode chart module by then? Or shall we continue to use the old Unicode chart templates for the Unicode 13 update? [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 10:16, 10 January 2020 (UTC)
 
== The cell displayed for U+E003B (TAG SEMICOLON) contains a colon in a dashed box instead of a semicolon ==
 
The chart shown on page [[Tags_(Unicode_block)]] shows the various tag characters as their normal version in a dashed box, but the character shown in the box for U+E003B (TAG SEMICOLON) is a colon instead of a semicolon. I'm not quite sure where/how to update the template. [[Special:Contributions/81.107.76.114|81.107.76.114]] ([[User talk:81.107.76.114|talk]]) 00:29, 12 August 2020 (UTC)
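
For reference, the tag characters U+E0020 to U+E007E mirror ASCII at a fixed offset of 0xE0000, so the expected display character can be computed instead of typed by hand. The Python sketch below shows that mapping; <code>tag_display_char</code> is a hypothetical helper, not the module's code.

```python
TAG_OFFSET = 0xE0000  # tag characters mirror ASCII at this offset


def tag_display_char(code_point):
    """Return the ASCII character a tag character corresponds to."""
    if not 0xE0020 <= code_point <= 0xE007E:
        raise ValueError(f"U+{code_point:04X} is not a printable tag character")
    return chr(code_point - TAG_OFFSET)


print(tag_display_char(0xE003B))  # TAG SEMICOLON
print(tag_display_char(0xE003A))  # TAG COLON
```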
 
== Missing end tag for table ==
 
{{Ping|Cobaltcigs|Erutuon}} {{Tl|Unicode chart}} has little usage guidance, and I came to [[Module talk:Unicode chart]] (this very page), which has 6 missing end tags for {{tag|table}}, all associated with {{Tl|Unicode chart}}. So I went to [[Special:WhatLinksHere/Template:Unicode chart|Pages that link to "Template:Unicode chart"]]. There are 6 pages that transclude {{Tl|Unicode chart}}, and they all have missing end tags for {{tag|table}}.
 
So, my request is to either abandon this project or write some usage notes that explain how to use it without leaving a missing-end-tag lint error for {{tag|table}}. —[[User:Anomalocaris|Anomalocaris]] ([[User talk:Anomalocaris|talk]]) 07:32, 8 October 2023 (UTC)
: [[User:Vanisaac|Vanisaac]] mistakenly [[Special:Diff/1168432448|got rid of]] the end of the table (<code>|}</code>) while inserting this module into [[Template:Unicode chart]]. [[User:SWinxy|SWinxy]] [[Special:Diff/1169564617|added it back]], but inside the noinclude tag. I just moved it so that it was transcluded. I'm not sure the module should be in the template at this point because it's still marked as "pre-alpha" and hasn't been worked on since 2019, but I'm not going to try to evaluate that. — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 20:48, 8 October 2023 (UTC)
::Ah thank you. I must've thought that Module:Unicode chart somehow emitted a |} upon transclusion of this template, but not when the module was invoked, hence why I put the |} in the noinclude. [[User:SWinxy|SWinxy]] ([[User talk:SWinxy|talk]]) 21:38, 8 October 2023 (UTC)
::[[User:Erutuon|Erutuon]]: Thank you for taking care of this! —[[User:Anomalocaris|Anomalocaris]] ([[User talk:Anomalocaris|talk]]) 22:59, 8 October 2023 (UTC)
 
== Trying again from scratch ==
 
When I stumbled across this (April 2024), [[Template:Unicode chart]] wasn't working and no one seemed to be actively working on it. I sent a message to [[User:Cobaltcigs]] (the last person who edited [[Module:Unicode chart]]), and when I didn't hear back, I went ahead and started trying to build my own version in the sandbox. The pages I'm using are:
* '''Lua:''' [[Module:Unicode chart/sandbox]]
* '''CSS:''' [[Template:Unicode chart/sandbox/styles.css]]
* '''Template:''' [[Template:Unicode chart/sandbox]]
 
After a couple of days, I've created something that works in the majority of test cases, although some edge cases for unusual characters still need to be ironed out. You can see my version at:
* '''Testcases:''' [[Template:Unicode chart/sandbox/testcases]]
 
- [[User:Eievie|Eievie]] ([[User talk:Eievie|talk]]) 18:22, 22 April 2024 (UTC)