Module talk:Unicode chart: Difference between revisions

Version: list of updated data
 
{{talkheader}}
{{to do|inner=
*Add <code>is_default_ignorable</code> to [[Module:Unicode data]] (currently in [[Module:Unicode data/sandbox]]) and use it to identify "default ignorable" code points.
*Figure out how to programmatically identify "default ignorable" code points and printable vs. non-printable format chars (rather than hardcoding a bunch of ranges—which is feasible but might not age well).
*Add a way to insert column header "reminder" rows at arbitrary intervals for huge blocks. Or maybe just do it automatically every 16 rows.
*Css
**Figure out [[#Formatting abbreviations|ideal scaling factors]] (and spacing) for characters and boxed placeholder abbreviations.
**Cell height<s>/width</s> properties: Do we need them? Something about [[Template:Unicode chart Javanese|Javanese]] in particular?
***'''width''' has been done anyway.
**Choose appropriate <code>font-family</code> order of succession for empty class definitions at [[Template:Unicode chart/script styles.css]].
*Upgrade to actually use Javascript.
**Figure out what to do about the [https://i.imgur.com/QzF7oVa.png highly elongated] [[%EF%B7%BD|U+FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM]].
*Implement some way to [[WP:CLICKHERE|click to show]] more character info [[#Notes about notes|in the footer area]] without rustling everyone's jimmies.
}}
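For the column-header reminder item above, here is a minimal Lua sketch of the "every 16 rows" variant. <code>build_rows</code>, <code>render_header</code> and <code>render_row</code> are hypothetical names standing in for whatever the module uses to emit table rows. Only the placement logic is meant to be illustrative.

```lua
-- Hypothetical sketch: build_rows, render_header and render_row are
-- illustrative names, not functions of Module:Unicode chart.
-- Returns a list of row strings with the column header repeated after
-- every `interval` data rows (default 16), but never after the last row.
local function build_rows(row_count, render_header, render_row, interval)
  interval = interval or 16
  local out = { render_header() }
  for i = 1, row_count do
    out[#out + 1] = render_row(i)
    if i % interval == 0 and i < row_count then
      out[#out + 1] = render_header()
    end
  end
  return out
end

-- Usage: a 33-row block gets reminder headers after rows 16 and 32.
local rows = build_rows(33,
  function() return "HEADER" end,
  function(i) return "ROW " .. i end)
print(#rows) -- 36
```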
 
# How to handle block-specific formatting? For example [[Template:Unicode chart Javanese]] has a specific height and some of the characters in [[Template:Unicode chart Control Pictures]] use a different font size.
# How to handle character links? Like {{ping|BabelStone}}, I'm not a fan of linking specific characters (but others are). It looks like your code, optionally, will link every character if an article exists, but this could increase the number of linked characters. And many characters aren't linked to the character itself, like U+2245 in [[Template:Unicode chart Mathematical Operators]]. Some link to wikt, like U+2105 in [[Template:Unicode chart Letterlike Symbols]] and all the characters in [[Template:Unicode chart CJK Unified Ideographs Extension A]].
# {{done}} Some blocks have special parameters that need to be taken into account: [[Template:Unicode chart Alphabetic Presentation Forms]], [[Template:Unicode chart Enclosed Alphanumeric Supplement]], [[Template:Unicode chart Enclosed CJK Letters and Months]], [[Template:Unicode chart Halfwidth and Fullwidth Forms]], [[Template:Unicode chart Miscellaneous Symbols]], and [[Template:Unicode chart Supplemental Symbols and Pictographs]]. As with most of these questions, this only applies if you're replacing existing chart templates.
# How to determine the chart name? Most charts use the block name for the title but some don't. For example, "C0 Controls and Basic Latin" is the chart name for the "Basic Latin" block.
# How to determine what to link the chart name to. For example, the [[Template:Unicode chart Kangxi Radicals]] chart links to "Kangxi radical#Unicode". Most either link to the block name itself or the block name with "(Unicode block)" appended.
**<code>start</code>/<code>end</code> parameters have been scrapped in favor of a single <code>range</code> parameter which can contain multiple ranges (connected by hyphen or en dash, and separated from each other by comma, whitespace, the word "and", or in fact anything that's not a hex digit).
*14 and 15. If the unicode block display names can't be made to exactly match the [[Module:Unicode data/blocks|"official" names]] in all cases, we'll need a (hopefully short) list of aliases. Adding a blocknamelink parameter which continues to default to <code>Blockname (Unicode chart)</code> if empty would be easy and sufficient. Let's try to avoid having three sets of names wherever possible.
**{{done}} <code>link_name</code> and <s><code>display_name</code></s> parameters added for differing cases. ―[[special:contributions/cobaltcigs|cobaltcigs]] 13:13, 14 September 2019 (UTC)
*{{done}} 16. I don't see why not. See 13.
―[[special:contributions/cobaltcigs|cobaltcigs]] 18:20, 10 September 2019 (UTC)
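The single <code>range</code> parameter described above can be parsed with plain Lua patterns. This is a minimal sketch that assumes only "start–end" pairs need to be recognized. <code>parse_ranges</code> is a hypothetical name, not the module's actual function. One subtlety: "a" and "d" are themselves hex digits, so the sketch blanks out the word "and" first; otherwise, in text like <code>00FFand0100</code>, the "a" would be absorbed into the preceding number.

```lua
-- Hypothetical sketch: parse_ranges is an illustrative name, not the
-- module's actual function. Returns a list of {first, last} pairs.
local EN_DASH = "\226\128\147" -- U+2013 EN DASH as UTF-8 bytes

local function parse_ranges(spec)
  -- Normalize en dashes to hyphens, then blank out the word "and"
  -- ("n" is not a hex digit, so "and" never occurs inside a number).
  local s = spec:gsub(EN_DASH, "-"):gsub("and", " ")
  local ranges = {}
  -- gmatch skips any other characters between pairs, so commas,
  -- whitespace, etc. all act as separators.
  for first, last in s:gmatch("(%x+)%s*%-%s*(%x+)") do
    ranges[#ranges + 1] = { tonumber(first, 16), tonumber(last, 16) }
  end
  return ranges
end

for _, pair in ipairs(parse_ranges("2060–206F, 0000-007F and 10000 - 1007F")) do
  print(string.format("%04X..%04X", pair[1], pair[2]))
end
```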
*8 The "Dashed Box Convention" is explained at https://www.unicode.org/versions/Unicode12.0.0/ch24.pdf#G8175. It's an oversight not having a note explaining this convention. It was added to match Unicode's charts. I think it's useful. Depending on the font, without the dashed box U+0602 is easily confusable with U+060E, U+1F1E6 looks the same as capital A, etc. As far as I know there's no way to determine which characters get a dashed box programmatically. As of version 12.1 it's used on U+0000-0020, 007F-00A0, 00AD, 034F, 0600-0605, 061C, 06DD, 070F, 08E2, 0CF1-0CF2, 0D4E, 0F0C, 1039, 115F-1160, 17B4-17B5, 17D2, 180B-180E, 1A60, 1BAB, 1CF5-1CF6, 2000-200F, 2011, 2028-202F, 205F-2064, 2066-206F, 2D7F, 2E3A-2E3B, 3000, 303E, 3164, AAF6, FE00-FE0F, FEFF, FFA0, FFF9-FFFB, 10A3F, 11003-11004, 1107F, 110BD, 110CD, 111C2-111C3, 11A3A, 11A47, 11A84-11A89, 11A99, 11D45-11D46, 11D97, 13430-13438, 16F8F-16F92, 1BC9D, 1BCA0-1BCA3, 1D159, 1D173-1D17A, 1DA9B-1DA9F, 1DAA1-1DAAF, 1F1E6-1F1FF, E0001, E0020-E007F, and E0100-E01EF.
*10 Unicode charts use XXX (in a dotted box) for U+0080, 0081, and 0099 and I don't think Wikipedia's charts should contradict the cited source. (For some arcane history of these three characters, I recommend http://unicode.org/pipermail/unicode/2015-October/002876.html) I think the only way of determining the abbreviations to use in the charts is a hardcoded table. They don't always match an alias. For example U+E007F is displayed as "END". A lot of the code points that use the dashed box convention display abbreviations. I haven't compiled a definitive list.
*{{done}} 13 In [[Template:Unicode chart Enclosed CJK Letters and Months]] the hangul subset isn't contiguous. Nor is the emoticon subset of [[Template:Unicode chart Miscellaneous Symbols]]. I didn't add these features so I don't know what reaction you'll get from removing them.
[[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 23:09, 10 September 2019 (UTC)
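Since the dashed-box set apparently has to be hardcoded, one option is a sorted list of ranges searched by bisection. The Lua sketch below copies only an excerpt of the Unicode 12.1 ranges listed above; the full list would be transcribed the same way. <code>DASHED_BOX</code> and <code>in_ranges</code> are hypothetical names.

```lua
-- Excerpt only: a few of the dashed-box ranges listed above (Unicode 12.1).
-- Entries must stay sorted by start code point for the bisection to work.
local DASHED_BOX = {
  { 0x0000, 0x0020 }, { 0x007F, 0x00A0 }, { 0x00AD, 0x00AD },
  { 0x034F, 0x034F }, { 0x0600, 0x0605 }, { 0x061C, 0x061C },
  { 0x06DD, 0x06DD }, { 0x070F, 0x070F }, { 0x1F1E6, 0x1F1FF },
  { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F }, { 0xE0100, 0xE01EF },
}

-- Binary search over sorted, non-overlapping {first, last} ranges.
local function in_ranges(cp, ranges)
  local lo, hi = 1, #ranges
  while lo <= hi do
    local mid = math.floor((lo + hi) / 2)
    local r = ranges[mid]
    if cp < r[1] then
      hi = mid - 1
    elseif cp > r[2] then
      lo = mid + 1
    else
      return true
    end
  end
  return false
end

print(in_ranges(0x0602, DASHED_BOX)) -- true: U+0602 gets a dashed box
print(in_ranges(0x060E, DASHED_BOX)) -- false: U+060E does not
```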
 
:* <b>The three characters that Unicode displays as "XXX" do indeed have abbreviations in NameAliases.txt but they all have a type of "figment" as in "figment of one's imagination". I feel strongly that we shouldn't assign abbreviations to the charts that contradict the ones used in the actual, cited Unicode charts.</b> [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 21:47, 12 September 2019 (UTC)
* I gave the control characters a light blue background and an explanatory footnote similar to those for RESERVED and NONCHARACTER. Also dashed boxes around the abbreviations, which are loaded from [[Module:Unicode data/aliases|here]]. Some have multiple abbreviations. The current behavior is to choose the last one, because at brief glance that seemed most correct in most cases. I'd rather we move the "official" or preferred abbreviation to the top and consistently select the first one instead. I've yet to research what, if anything, might be broken by changing abbreviation order.
:* <b><samp>Module:Unicode data/aliases</samp> is generated from Unicode's NameAliases.txt file. It looks like it is in the same order, so any tweaking we do to order would be problematic when the file is updated. If we changed the script that creates aliases we would just be moving the logic from the chart script to the generation script. Other users of <samp>alias</samp> may not have the same requirement, so I think the decision about what to use in the charts belongs in the chart script. I have another abbreviation issue but I'll do that in a new section for clarity.</b> [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 21:47, 12 September 2019 (UTC)
―[[special:contributions/cobaltcigs|cobaltcigs]] 09:17, 12 September 2019 (UTC)
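To make the first-versus-last choice concrete while keeping the decision in the chart script, here is a minimal Lua sketch. It assumes the aliases for a code point are available as an ordered list of {type, name} pairs in NameAliases.txt order; whether [[Module:Unicode data/aliases]] is laid out exactly like this is an assumption, and <code>pick_abbreviation</code> is a hypothetical name. The U+000A entries follow the alias list quoted later on this page.

```lua
-- Assumed data shape: an ordered list of {type, name} pairs per code point,
-- in NameAliases.txt order. The U+000A entries match the list quoted on this page.
local aliases = {
  [0x000A] = {
    { "control", "LINE FEED" }, { "control", "NEW LINE" },
    { "control", "END OF LINE" }, { "abbreviation", "LF" },
    { "abbreviation", "NL" }, { "abbreviation", "EOL" },
  },
}

-- Hypothetical helper: returns the first (default) or last alias of
-- type "abbreviation", or nil if there is none.
local function pick_abbreviation(cp, which)
  local found = {}
  for _, entry in ipairs(aliases[cp] or {}) do
    if entry[1] == "abbreviation" then
      found[#found + 1] = entry[2]
    end
  end
  if which == "last" then
    return found[#found]
  end
  return found[1]
end

print(pick_abbreviation(0x000A, "first")) -- LF
print(pick_abbreviation(0x000A, "last"))  -- EOL
```

With this data, the first abbreviation is the LF that the existing chart uses, while the last one would be EOL.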
 
I'm prepared to go with #4 for now, then upgrade to #5–6 only after all the other issues are addressed. ―[[special:contributions/cobaltcigs|cobaltcigs]] 09:17, 12 September 2019 (UTC)
:I've never been very keen on specifying fonts on the Wikipedia side, because 1) most fonts for most Unicode scripts are not available on most users' devices without downloading them; 2) in the past editors have tended to specify fonts that they have on their own system so that it looks nice for them, without considering other users; and 3) the Wikipedia specified fonts may override users' font preferences set in their browser (or in Wikipedia settings). Personally I would rather not specify any fonts, and leave it to the user's browser to apply an appropriate font, but I know that this is a minority view, so I'm OK with your suggested solution. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 13:06, 12 September 2019 (UTC)
::My understanding was that [[Internet Explorer|certain browsers]] would show the little squares even if a suitable font was installed, unless specifically told to use that font. I have no idea whether this is (still?) accurate. I suppose I could add a parameter like <code>fonts=off</code>. Then we could ask several Windows users whether all the charts look okay with no fonts specified. ―[[special:contributions/cobaltcigs|cobaltcigs]] 19:04, 16 September 2019 (UTC)
::{{done}} <code>fonts=off</code> parameter now exists as an option. ―[[special:contributions/cobaltcigs|cobaltcigs]] 21:38, 17 September 2019 (UTC)
 
==Formatting abbreviations==
Besides worrying about which abbreviations are used in the charts, there's an issue of formatting. Today, long ones are often split into two or more lines to control the width of the chart. An extreme example is NULL NOTE HEAD in [[Template:Unicode chart Musical Symbols]] but this practice happens in other places like [[Template:Unicode_chart_Mongolian]] and [[Template:Unicode chart Variation Selectors Supplement]]. I haven't checked to see if the abbreviations are always in a dashed box but maybe we could have a parm like <samp><nowiki>...|abbr|1D15|{{resize|75%|NULL<br />NOTE<br />&amp;nbsp;HEAD&amp;nbsp;}}</nowiki></samp> to preserve the ability to format these in the current fashion. In any case, formatting is something to consider. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 21:47, 12 September 2019 (UTC)
:Eww. See [[User:BabelStone/sandbox#Musical Symbols]] for an attempt to replicate that (without any <code><nowiki><br />&amp;nbsp;</nowiki></code> crap, which is great!). Note that 1D173–1D17A are identified as "format" characters in [[Module:Unicode data|this file]], but "NULL NOTE HEAD" is not. Hence the difference in css/color. The pink can of course be changed later. ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:45, 13 September 2019 (UTC)
::Wow, I've never realised that U+1D159 is not a format character. Are there any other characters displayed as a dashed box around text that are not format or control characters? <s>I don't think so</s> (variation selectors are gc=Mn). The worrying thing is there seems to be no way of extracting the information from the UCD, so it relies on visually checking the Unicode code charts, but what if it changes suddenly to a graphic character in a new version of Unicode? My gut feeling is that gc=So is wrong if the character has no visible glyph and is not whitespace. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 22:52, 13 September 2019 (UTC)
::I couldn't immediately work out where you are specifying a smaller font size for "NULL NOTE HEAD" compared with "Begin Beam" etc. I think that all the dashed boxes need a smaller font size because (on my system at least) the dashed letters are much larger size than Basic Latin letters, and make the cells overwide. Can we simply add "font-size:75%" for td.box in [[Template:Unicode chart/styles.css]], or is there more to it? [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 23:30, 13 September 2019 (UTC)
:::This text uses {{code|lang=css|span.small-1 { font-size:80%; } span.small-2 { font-size:59%; } }} wherein the suffix digit is determined by the number of spaces converted to linebreaks in whatever text is shown (which may be read from the aliases file or from a <s><code>display_NNNN</code></s> override parameter). Then the property {{code|lang=css|white-space:pre;}} forces <code>\n</code> to show up as literal linebreaks so we don't have to resort to {{code|lang=html|<br />}}. Thus one-word abbreviations such as <code>ACK</code> use the same size as regular chars. All of this can be easily changed. For now, I've tightened the dashed box and cell margins/padding a little bit. ―[[special:contributions/cobaltcigs|cobaltcigs]] 10:08, 14 September 2019 (UTC)
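The space-to-linebreak scheme described above reduces to a few lines of Lua. This is a sketch, not the module's code: <code>box_label</code> is a hypothetical name, and capping the class suffix at 2 is an assumption based on only <code>small-1</code> and <code>small-2</code> being defined in the CSS quoted above.

```lua
-- Hypothetical sketch of the scheme described above: spaces become "\n"
-- (rendered as line breaks by white-space:pre), and the number of breaks
-- selects the CSS class. Only small-1 and small-2 exist, so cap at 2.
local function box_label(abbr)
  local text, breaks = abbr:gsub(" ", "\n")
  if breaks == 0 then
    return text -- one-word abbreviations keep the normal size
  end
  return string.format('<span class="small-%d">%s</span>',
    math.min(breaks, 2), text)
end

print(box_label("ACK"))            -- ACK
print(box_label("NULL NOTE HEAD")) -- three lines, wrapped in class small-2
```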
 
==Version==
There have been many past discussions about how to determine which Unicode version to show in the footnote of the chart. Because they were manually updated, it wasn't practical to have a master switch for the version. If the charts are created using <samp>Module:Unicode data</samp> it might be possible to do away with the mindless updating I do once a year for all the charts. A new <samp>Module:Unicode data/version</samp> item could be added that is manually updated after all of the other <samp>Module:Unicode data</samp> files are updated. Basically, it's just a string field to say "We've updated all the other data to version x". If the version footnote was pulled from that string, it would alleviate a lot of manual effort. It would mean adding <samp>Module:Unicode data/version</samp> to the list of "regenerate the charts if tables x, y, and z change". [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 21:47, 12 September 2019 (UTC)
:FYI: After a few updates, all of the [[Module:Unicode data]] subpages are now up-to-date (Unicode version 12.1). [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 04:45, 14 September 2019 (UTC)
::I do like the idea of centralizing the version string. Even as a single-purpose one-liner module {{code|lang=lua|return "12.1"}} would be fine. ―[[special:contributions/cobaltcigs|cobaltcigs]] 10:17, 14 September 2019 (UTC)
:::{{done}} ―[[special:contributions/cobaltcigs|cobaltcigs]] 09:25, 17 September 2019 (UTC)
::P.S. Can I get a complete list of subpages that have actually changed so I can update my localhost wiki (on which I test most of this stuff before posting) accordingly? ―[[special:contributions/cobaltcigs|cobaltcigs]] 10:42, 14 September 2019 (UTC)
:::I updated [[Module:Unicode data/category]], [[Module:Unicode data/control]], [[Module:Unicode data/scripts]], [[Module:Unicode data/names/002]], and [[Module:Unicode data/names/003]]. Some changes were unrelated to the release of v12.1. For example, U+2BC9, 2BFF, and 2E4F were missing for some reason. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 16:43, 14 September 2019 (UTC)
:::: On Wiktionary I updated [[wikt:Module:Unicode data/names/002|the U+2xxx names module]] and several others for 12.0 back in March, but I didn't bother with the Wikipedia modules because they weren't being used. But I'm glad to see that now they are at last. — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 09:01, 16 September 2019 (UTC)
 
Perhaps it would be helpful, now, to put an edit notice on all [[Special:Prefixindex/Module:Unicode data|Unicode data subpages]] that says "Please remember to update [[Module:Unicode data/version]] if applicable" etc. ―[[special:contributions/cobaltcigs|cobaltcigs]] 09:25, 17 September 2019 (UTC)
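The centralized version string could then be consumed roughly like this. A minimal Lua sketch, assuming the one-liner module returns a plain string. <code>package.preload</code> only stands in for the wiki page so the example runs outside Scribunto, and the footnote wording, <code>version_footnote</code> and its <code>override</code> argument (for example an explicit <code>version=</code> parameter) are illustrative.

```lua
-- Stand-in for [[Module:Unicode data/version]] so this runs outside the wiki;
-- on-wiki, require() would load the real one-line module instead.
package.preload["Module:Unicode data/version"] = function()
  return "12.1"
end

-- Hypothetical helper: an explicit override (e.g. a version= argument) wins,
-- otherwise the central string is used.
local function version_footnote(override)
  local version = override or require("Module:Unicode data/version")
  return "As of Unicode version " .. version
end

print(version_footnote())       -- As of Unicode version 12.1
print(version_footnote("13.0")) -- As of Unicode version 13.0
```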
 
== Pink cells ==
A footnote says “Pink cells indicate non-printable format characters.” That is untrue: they currently indicate [https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3ACf%3A%5D all format characters], some of which are printable. It would be more useful, I think, to highlight [https://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3ADI%3A%5D-%5B%3ACn%3A%5D default ignorable characters]. [[User talk:Gorobay|Gorobay]] ([[User talk:Gorobay|talk]]) 03:27, 14 September 2019 (UTC)
:Okay. I shall try to figure out how to distinguish between these using the available data modules. ―[[special:contributions/cobaltcigs|cobaltcigs]] 11:19, 14 September 2019 (UTC)
:: There wasn't a data module for the Default Ignorable property, so I added <code>is_default_ignorable</code> to [[Module:Unicode data/sandbox]] and created [[Module:Unicode data/derived core properties]]. — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 10:06, 16 September 2019 (UTC)
 
: Alternatively I could just take out the word "non-printable" (my own misconception at the time I wrote it) and make the existing footnote statement correct.
: But assuming "default-ignorable" is the more important concept, is my understanding correct that the characters identified by <code>is_default_ignorable</code>:
:: (a) are a [[proper superset]] of <code>format</code> characters, and
:: (b) are a [[disjoint set]] of <code>control</code> characters?
: This is important to the extent that we can't highlight the same cell in two colors. The <code>td.format</code> css would be retired if we go this route, but the <code>format</code> designation would still be conveyed in the info panel below, which is (relatively speaking) a much newer feature than the pink highlight.
: Counting the default-ignorables and determining whether to show the footnote as plural/singular/none (just like the others) will be a two-minute coding task given the available functions. What kind of verbiage would we want in the footnote for the default-ignorables? Seems like we should try to briefly explain what that means. ―[[special:contributions/cobaltcigs|cobaltcigs]] 18:43, 17 September 2019 (UTC)
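Questions (a) and (b) can be answered empirically by scanning code points with whatever predicates the data modules provide. The Lua sketch below is generic: <code>compare_sets</code> is a hypothetical helper, and the toy predicates only demonstrate the mechanics. In practice one would pass something like <code>is_default_ignorable</code> and a format or control test from [[Module:Unicode data]], whose exact signatures are not assumed here.

```lua
-- Hypothetical helper: scans first..last and reports how two sets,
-- given as predicates over code points, relate to each other.
local function compare_sets(first, last, in_a, in_b)
  local a_only, b_only, both = 0, 0, 0
  for cp = first, last do
    local a, b = in_a(cp), in_b(cp)
    if a and b then
      both = both + 1
    elseif a then
      a_only = a_only + 1
    elseif b then
      b_only = b_only + 1
    end
  end
  return {
    -- (a): every member of B is in A, and A has at least one extra member
    proper_superset = (b_only == 0 and a_only > 0),
    -- (b): no code point is in both sets
    disjoint = (both == 0),
  }
end

-- Toy predicates, for illustration only.
local function is_even(n) return n % 2 == 0 end
local function is_multiple_of_4(n) return n % 4 == 0 end
local function is_odd(n) return n % 2 == 1 end

local r1 = compare_sets(0, 15, is_even, is_multiple_of_4)
print(r1.proper_superset, r1.disjoint) -- proper superset: true, disjoint: false
local r2 = compare_sets(0, 15, is_even, is_odd)
print(r2.proper_superset, r2.disjoint) -- proper superset: false, disjoint: true
```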
 
== General Punctuation, row U+206x ==
| display_name = General Punctuation (2060–206F only)
| range = 2060–206F
<!-- parameters gone, now loaded from master list -->
| display_2061 = <i>ƒ</i><small>()</small>
| display_2062 = ✕
| display_2063 = ,
| display_2064 = +
<!-- skip 2065–2069 -->
| display_206A = ISS
| display_206B = ASS
| display_206C = IAFS
| display_206D = AAFS
| display_206E = NADS
| display_206F = NODS
| version = 12.1 (!!!)
}}
: I can't find a file in the Unicode Character Database that lists the display forms for the dotted box characters. They aren't in [https://www.unicode.org/Public/UCD/latest/ucd/NamesList.txt NamesList.txt], which is parsed into the PDF that you linked to. So they would have to be gathered manually from the PDFs, unless they can be found somewhere else. — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 04:13, 18 September 2019 (UTC)
::As far as I know, there isn't anything in the UCD. I've always determined dotted box notation manually. BTW: I think the <s>display_20xx</s> parms above are appropriate. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 04:40, 18 September 2019 (UTC)
:: To clarify, "manually" would mean by visual approximation. Copy/paste gives us private-use codepoints assigned to arbitrary glyphs which represent the whole abbreviation (in some font that probably doesn't exist outside the PDF). So much eww. ―[[special:contributions/cobaltcigs|cobaltcigs]] 13:39, 18 September 2019 (UTC)
::: If you're interested, the fonts with the dashed glyphs (SpecialsUC4/5/6.ttf) are bundled with the free [https://unicode.org/unibook/ Unibook] application that is used to generate the Unicode and ISO/IEC 10646 code charts. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 16:06, 18 September 2019 (UTC)
 
==Info panel demo==
{{unicode chart|Greek and Coptic|version=12.1|info=yes|state=collapsed}}
{{unicode chart
| name = Basic Latin
| display_name = Basic Latin AND Latin-1 Supplement
| range = 0000–00FF
| version=12.1
| info=yes
| state=expanded
}}
Click the links and hold your breath. ―[[special:contributions/cobaltcigs|cobaltcigs]] 03:13, 16 September 2019 (UTC)
 
:Thanks, I like the idea, especially showing a large version of the character. I think the U+ and character name do not need to be in a huge bold font (maybe just normal bold), and the "(assigned)" is redundant -- using a normal font and removing "assigned" should also reduce the annoying horizontal expansion and contraction of the box as you click on characters with names of different lengths. I suppose the UTF-8 is useful to some people, but I would remove the characterization of the UTF-8 hex values as I cannot see how they could be useful. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 09:05, 16 September 2019 (UTC)
 
::*"assigned" is the default phrase returned when a character in question is not "control", "format", <s>"surrogate", "private-use", "unassigned",</s> "space-separator", "line-separator", or "paragraph-separator". The struck-out categories will probably never be part of any chart, which leaves five that are potentially interesting.
::*{{done}} Making the chart stay continuously at <code>width: 100%;</code> would probably help. Setting th 8% and td 5.75% would add up to same, and might also be helpful.
::*I've got it loading named character entity references from a subpage in addition to calculating the numeric ones, which is probably the single crowdpleasingest information here. The UTF-8 is of interest to the extent that it's what our urlencoding uses (Δ is 0xCE 0x94 and [[%CE%94|%CE%94]]). UTF-16 less so, but I thought about it.
::*{{removed}} The [[mojibake]] depiction of these bytes as separate chars was slightly helpful when debugging but not meant as a serious feature.
::―[[special:contributions/cobaltcigs|cobaltcigs]] 10:13, 16 September 2019 (UTC)
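For reference, the UTF-8 bytes and numeric character references mentioned above can be computed directly from the code point. The following is a minimal sketch using only arithmetic, so it also runs on Lua 5.1 (which Scribunto uses, without a <code>utf8</code> library). The function names are hypothetical, and <code>percent_encode</code> escapes every byte, including ASCII ones that real URL encoding would leave alone.

```lua
-- Hypothetical helpers; plain arithmetic, so they also run on Lua 5.1.
-- Returns the UTF-8 encoding of a code point as a list of byte values.
local function utf8_bytes(cp)
  if cp < 0x80 then
    return { cp }
  elseif cp < 0x800 then
    return { 0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40 }
  elseif cp < 0x10000 then
    return { 0xE0 + math.floor(cp / 0x1000),
             0x80 + math.floor(cp / 0x40) % 0x40,
             0x80 + cp % 0x40 }
  else
    return { 0xF0 + math.floor(cp / 0x40000),
             0x80 + math.floor(cp / 0x1000) % 0x40,
             0x80 + math.floor(cp / 0x40) % 0x40,
             0x80 + cp % 0x40 }
  end
end

-- Percent-encodes every byte of the UTF-8 sequence.
local function percent_encode(cp)
  local parts = {}
  for i, byte in ipairs(utf8_bytes(cp)) do
    parts[i] = string.format("%%%02X", byte)
  end
  return table.concat(parts)
end

-- Decimal and hexadecimal numeric character references.
local function numeric_refs(cp)
  return string.format("&#%d;", cp), string.format("&#x%X;", cp)
end

print(percent_encode(0x0394)) -- %CE%94 (Δ)
print(numeric_refs(0x0394))   -- &#916; and &#x394; (tab-separated)
```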
: Nice method – I was surprised that it could be done without JavaScript! Maybe instead of the values from [[Module:Unicode data/control]], which include only some of the General Categories, the table could show the long name of the actual General Category. (I've added the long names of the General Categories to [[Module:Unicode data/category]].) — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 10:37, 16 September 2019 (UTC)
 
::It relies upon the css [https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors/Using_the_:target_pseudo-class_in_selectors <code>:target</code>] selector to show/hide the panel for any given codepoint. I think this would be nearly adequate if not for the vertical anchor-jumping. I suppose moving the info panel to the top (below the pdf link and above the column headers) would make it slightly less annoying, but it would look weird. Another consequence of this is that whenever multiple charts are present on the same page, opening an info panel on one chart will close info panels on all others. So using Javascript would probably be better. It would only require convincing the right people that the feature is worthwhile and not too app-like.
::For now I've reduced the size of the bold-face character name from 125% to 110%, set the root <code>table</code> element to full page-width, and set the columns to fixed percentages that add up to 100%.
::I've also [[Special:Diff/916051317|removed the 'Amiri' font]] from the <code>.script-Arab</code> css class, because it makes the U+FDFD ligature wide enough to make these percentages meaningless. I don't know if other characters are similarly affected. I'll need to install the first three fonts to test whether they have the same problem (or, indeed, others).
::I've now made it pull the "long name" (which appears to always be more interesting than the word "assigned") from Erutuon's info. Hopefully it's never <code>nil</code> and hopefully the extra info won't be overwritten by updates.
:: ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:50, 16 September 2019 (UTC)
::: You can rely on <code>lookup_category</code> never returning <code>nil</code> (at least when supplied a valid code point); <code>memo_lookup</code> guarantees that. The return value is either a "real" category when the code point is found in <code>singles</code> or <code>ranges</code> or Cn (Unassigned). — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 22:40, 16 September 2019 (UTC)
::: Oops. Actually, what I said is true of [[Module:Unicode data/sandbox]], but at the moment [[Module:Unicode data]] is buggy. — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 23:35, 30 September 2019 (UTC)
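The lookup contract Erutuon describes (check single code points, then ranges, fall back to Cn, and cache the result) can be sketched as follows. This illustrates that behaviour only and is not the code of [[Module:Unicode data]]. The data table is trimmed to a handful of entries for the example, whereas the real category data lives in [[Module:Unicode data/category]].

```lua
-- Trimmed illustrative data; the real tables are far larger.
local data = {
  singles = { [0x00AD] = "Cf", [0x0394] = "Lu" },
  ranges = {
    { 0x0000, 0x001F, "Cc" },
    { 0x0041, 0x005A, "Lu" },
    { 0x007F, 0x009F, "Cc" },
  },
}

local cache = {}

-- Never returns nil for a valid code point: unlisted code points are Cn.
local function lookup_category(cp)
  local result = cache[cp]
  if result then
    return result
  end
  result = data.singles[cp]
  if not result then
    for _, r in ipairs(data.ranges) do
      if cp >= r[1] and cp <= r[2] then
        result = r[3]
        break
      end
    end
  end
  result = result or "Cn" -- not found anywhere: Unassigned
  cache[cp] = result
  return result
end

print(lookup_category(0x000A)) -- Cc
print(lookup_category(0x0378)) -- Cn
```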
 
===Selectability: CSS vs. plain text===
Putting general category after character name is good; show/hide is good; 100% width of chart is very good. At present you cannot select and copy the entire info panel information: UTF-8 and HTML headings, as well as parentheses around general category, are not selected, and there is no space between character name and general category so the two are concatenated on copy. Can we make all parts of the info panel copyable, and separate parenthesised general category from character name by a space character rather than putting in different cells? [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 10:15, 17 September 2019 (UTC)
:What you see there is actually an intentional css effect (see instances of <code>:before { content: 'foo'; }</code> on the [[Template:Unicode chart/styles.css|styles.css]] page). This is similar to [[Template:Navbox|navboxes]] ([[Template:Steely Dan|example]]) where they use a spaced U+00B7 MIDDLE DOT (<code>' · '</code>) as a non-selectable separator for list items (<code><nowiki><li></nowiki></code>). Here I've used commas and spaces instead, and also used the same technique for <code>ul:before</code> list labels. It could all easily be reverted to plain text. I'll await further discussion about [[Separation of content and presentation|whether it should]], because it did take a bit of work to make it look right. And really, the whole idea here was actually to help users copy the <code>&amp;foo;</code> html character entity reference without accidentally including the adjacent comma. ―[[special:contributions/cobaltcigs|cobaltcigs]] 12:01, 17 September 2019 (UTC)
::Ah, that explains it. Personally, I prefer plain text so that the user can select everything. I think we can dispense with the comma between HTML forms (semi-colon followed by a comma just looks weird), and separate them with a space. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 16:11, 18 September 2019 (UTC)
 
===Actual aliases vs. corrections===
Can we have a demo of the info panel for a block with one or more characters that have a formal alias? I suggest Vertical Forms with its horrendously long name and alias for FE18. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 10:22, 17 September 2019 (UTC)
:Eww, a spelling error ("BRAKCET"). So the correctly spelled name is '''currently not loaded at all''' because it's recorded in the [[Module:Unicode data/aliases|aliases file]] as a <code>correction</code> rather than an <code>alias</code>. Aliases are currently loaded by the module (see control characters in the Latin chart above), whereas corrections will be a new concept which I'm not yet sure how best to handle. Do we want to show the misspelled title (maybe with a {{tl|sic}} tag, even) and note the correction as such on the next line? Or should we just replace it outright without comment? I suppose I'll begin reviewing the other <code>correction</code>s vs. what names they are correcting, to see how trivial or major their differences tend to be. For now, here's what the Vertical Forms block currently looks like: ―[[special:contributions/cobaltcigs|cobaltcigs]] 12:01, 17 September 2019 (UTC)
{{unicode chart|Vertical Forms|info=yes}}
 
{{collapsible section|title=Complete list of corrections (28) to consider|content=<nowiki />
{{User:Cobaltcigs/List of Unicode corrections}}
<nowiki />}}
 
:If we do want the corrections to appear below the boldface name, similarly to aliases (see [https://i.imgur.com/1O4YOPz.png screenshot from ''ye olde localhoste'']), I'm ready to update this module accordingly. Note: I did check [[Module:Unicode data/aliases|the list]] and confirm no codepoint has both a correction and an alias. Perhaps we also want some kind of footnote explaining aliases and corrections to the reader, but I'll hold off on that. ―[[special:contributions/cobaltcigs|cobaltcigs]] 13:55, 17 September 2019 (UTC)
:: I like how the correction is shown directly below the name in your screenshot; it makes it easy to compare the two. — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 17:57, 17 September 2019 (UTC)
::: {{done}} And with that in mind I've also removed the font-size enlargement of the bold-face character name. ―[[special:contributions/cobaltcigs|cobaltcigs]] 18:17, 17 September 2019 (UTC)
:: The code point name is immutable so it should always be shown as-is, as you're doing (even when it's clearly wrong). As far as the data is concerned, "correction" is just another type of alias like "alternate", "abbreviation", and "figment". I think all alias types should be shown using the "Type: ALIAS" format without need for an explanation. It looks like that isn't being done for code points like U+0093, etc. Lastly, I wouldn't count on there never being a second alias to a code point with a correction type alias. There's no such restriction, even though that's the case right now. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 18:56, 17 September 2019 (UTC)
::: For the sake of clarity I'll use U+000A as a more extreme example. Are you saying it should look like this?
::: <small>(visually at least; never mind the style attributes approximating current css class effects; actual markup will be much shorter)</small>
::: <table class="wikitable" style="width:100%;"><tr><th style="padding: 0px; width: 8%; font-family: 'serif'; font-weight: normal; font-size: 250%;"><span style="border: 2px dashed black; padding: 3px;">LF</span></th><td><div class="title" style="font-weight: bold; display: inline-block;">U+000A &lt;control&gt;</div><div class="category" style="display: inline; white-space: pre; "> (control)</div><div class="aliases plainlist" style="line-height: 120%;"><div style="white-space: pre; display: inline-block; vertical-align: top;">Aliases: </div><ul style="display: inline-block;"><li>LINE FEED</li><li>NEW LINE</li><li>END OF LINE</li><li>LF</li><li>NL</li><li>EOL</li></ul></div><div style="font-family: monospace;">[other stuff below...]</div></td></tr></table>
::: i.e. putting aliases of all types (including abbreviations) in a single list, in the order given in the [[Module:Unicode data/aliases|aliases file]], with zero regard for what type of alias they are, and without choosing any of them to replace the name <code>&lt;control&gt;</code> at the top. I can do that once certain that's what you mean. Let's also revisit the question of how to decide which of multiple abbreviations should be shown in the box. ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:09, 17 September 2019 (UTC)
 
::: Related: Can I also get your opinion on whether to put atypical abbreviations in the boxes for [[#General Punctuation, row U+206x]] above? ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:15, 17 September 2019 (UTC)
:::: Yes. I'd display all of the aliases in the order they appear in NameAliases.txt (which is preserved in [[Module:Unicode data/aliases]]). But I also think the ''type'' of alias is useful to know. My preference would look like this:
:::: <table class="wikitable" style="width:100%;"><tr><th style="padding: 0px; width: 8%; font-family: 'serif'; font-weight: normal; font-size: 250%;"><span style="border: 2px dashed black; padding: 3px;">LF</span></th><td><div class="title" style="font-weight: bold; display: inline-block;">U+000A &lt;control&gt;</div><div class="category" style="display: inline; white-space: pre; "> (control)</div><div class="aliases plainlist" style="line-height: 120%;"><div style="white-space: pre; display: inline-block; vertical-align: top;">Control: LINE FEED<br />Control: NEW LINE<br />Control: END OF LINE<br />Abbreviation: LF<br />Abbreviation: NL<br />Abbreviation: EOL</div><div style="font-family: monospace;">[other stuff below...]</div></div></td></tr></table>
:::::{{done}}, see [[#info-000A]] above. Using <code><nowiki><ul></nowiki></code> because [https://stackoverflow.com/a/1726103 <code><nowiki><br /></nowiki></code> is for poetry and mailing addresses]. And I've just noticed the word "alias" won't actually appear to the reader. ―[[special:contributions/cobaltcigs|cobaltcigs]] 17:09, 18 September 2019 (UTC)
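:::::: For illustration only, here is a rough Python sketch of the "Type: ALIAS" list described above (the module itself is Lua). It assumes the raw NameAliases.txt line format <code>code;alias;type</code> rather than the Lua table in [[Module:Unicode data/aliases]], and the names <code>parse_aliases</code> and <code>render_alias_list</code> are hypothetical. It keeps file order, labels each alias with its type, and emits a <code><nowiki><ul></nowiki></code> list.

```python
import html

# A few lines in the format of Unicode's NameAliases.txt:
# code point; alias; type   (comments start with '#')
SAMPLE = """\
# NameAliases.txt excerpt
000A;LINE FEED;control
000A;NEW LINE;control
000A;END OF LINE;control
000A;LF;abbreviation
000A;NL;abbreviation
000A;EOL;abbreviation
"""


def parse_aliases(text):
    """Map code point -> list of (alias, type), preserving file order."""
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        code, alias, kind = (field.strip() for field in line.split(";"))
        aliases.setdefault(int(code, 16), []).append((alias, kind))
    return aliases


def render_alias_list(entries):
    """Render 'Type: ALIAS' items as an HTML list, in file order."""
    items = "".join(
        f"<li>{html.escape(kind.capitalize())}: {html.escape(alias)}</li>"
        for alias, kind in entries
    )
    return f"<ul>{items}</ul>"


table = parse_aliases(SAMPLE)
print(render_alias_list(table[0x0A]))
```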
:::: As far as which abbreviation to use in the Wikipedia chart, I think it should match the official, cited Unicode chart. I'm guessing that a lot of them match the first/only abbreviation-type named alias, but obviously not always. As you mentioned, U+206x is a good example of chart abbreviations that don't match named aliases. I'm thinking a table of chart abbreviations would be required. You could probably fall back to a default chart abbreviation when no exception is found, but would that be worth the extra processing of a failed lookup, or is it faster to just put them all in a table?<br />My concern with using chart abbreviations different from Unicode's is that there is no right answer. If someone were to change the Wikipedia chart abbreviation for U+000A from <samp>LF</samp> to <samp>NL</samp>, would that be wrong/revertable? What about <samp>LINE</samp>? Or <samp>LFEED</samp>? If we don't have a definitive way to determine the chart abbreviation, we open ourselves up to edit wars. Being able to cite the actual Unicode chart gives us one definitive chart abbreviation.<br />Great work so far, BTW. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 22:10, 17 September 2019 (UTC)
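::::: A minimal Python sketch of the "exception table with fallback" idea, for illustration only. The names <code>CHART_EXCEPTIONS</code>, <code>ALIASES</code> and <code>chart_abbreviation</code> are hypothetical, and the exception value is a placeholder, not taken from any Unicode chart. A hash-table lookup costs roughly the same whether it hits or misses (in Python dicts as in Lua tables), so the fallback mainly saves duplicating entries that already match the first abbreviation-type alias.

```python
# Hypothetical overrides, used only where the Unicode chart's abbreviation
# differs from the first abbreviation-type alias.
CHART_EXCEPTIONS = {
    0x2062: "EXAMPLE",  # placeholder value, not copied from the Unicode chart
}

# (alias, type) pairs in NameAliases.txt order; a small excerpt.
ALIASES = {
    0x000A: [("LINE FEED", "control"), ("NEW LINE", "control"),
             ("END OF LINE", "control"), ("LF", "abbreviation"),
             ("NL", "abbreviation"), ("EOL", "abbreviation")],
}


def chart_abbreviation(cp):
    """Return the boxed abbreviation for a code point, or None."""
    # An explicit exception always wins.
    if cp in CHART_EXCEPTIONS:
        return CHART_EXCEPTIONS[cp]
    # Otherwise fall back to the first abbreviation-type alias, if any.
    for alias, kind in ALIASES.get(cp, []):
        if kind == "abbreviation":
            return alias
    return None
```

With this layout, U+000A gets <code>LF</code> from the alias data, and only genuine mismatches need an entry in the exception table.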
 
::::: Okay, clearly I misinterpreted "I think all alias types should be shown using the Type: ALIAS" to mean "replace more specific alias-type labels with the word ALIAS". Makes a lot more sense with a picture drawn, glad I asked.
::::: So my actual concern about U+206x is that stand-in symbols might be mistaken for the actual glyph '''even by readers otherwise familiar with "normal" control/format character abbreviations''' which consist of multiple capital letters. So some explanatory footnotes might really be needed there.
:::::: '''Agreed'''. My first draft of a note would be "A dashed box indicates characters which normally have no visible display or only modify the display of other characters. {{cite web|title = Dashed Box Convention | url = https://www.unicode.org/versions/Unicode12.0.0/ch24.pdf#G8175 | publisher=Unicode Consortium }}"<br />The citation might be overkill. Although the nuances are pretty complicated so maybe the citation is justified. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 02:04, 18 September 2019 (UTC)
::::: <s>Currently the display text can be overridden from the calling environment (ultimately, a block-specific template) for all assigned codepoints with few restrictions,<ref>Exception: whitespace characters, where the main grid disregards all abbreviations real or fake, instead forcing white-on-green rectangular display of the literal character to show relative size (and allow user to select/copy just like any other printable character). This differs from the source material but seems beneficial enough to justify. So for these codepoints, only in the lower info panel can the display text such as <span style="padding: 2px; border:1px dashed black;">NBSP</span> actually be overridden.</ref> which has been done in the U+206x example (and less constructively in the [[User:BabelStone/sandbox#Basic Latin (with various per-cell customizations)|"Vulgar" Latin]] sandbox section).</s> If we do load a master list of favored abbreviations from a sub-module (containing everything from <code>LF</code> to <code>NULL NOTE HEAD</code>), the <s><code>display_NNNN = FOO</code></s> parameters could be totally deleted.
::::::{{done}} and {{removed}}
::::: ―[[special:contributions/cobaltcigs|cobaltcigs]] 23:14, 17 September 2019 (UTC)
:::::: '''Oops''', I completely forgot about the <s><code>display_NNNN = FOO</code></s> parm. I like the idea of a master list because it centralizes the data but either approach will work. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 02:04, 18 September 2019 (UTC)
::::::: +1 for a master list. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 16:13, 18 September 2019 (UTC)
:::::::: {{done}} ―[[special:contributions/cobaltcigs|cobaltcigs]] 06:42, 19 September 2019 (UTC)
{{reflist-talk}}
 
===Master list complete===
See [[Module:Unicode chart/display]] and make any corrections/amendments as needed. Maybe I missed a few reading all those PDFs. Except for the CJK blocks where even "skimming" would be too generous a term. <s><code>display_NNNN</code></s> params will be whacked soon. ―[[special:contributions/cobaltcigs|cobaltcigs]] 04:38, 19 September 2019 (UTC)
:{{removed}} ―[[special:contributions/cobaltcigs|cobaltcigs]] 06:42, 19 September 2019 (UTC)
::I've reviewed the list and made some changes. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 17:28, 28 September 2019 (UTC)
 
==Going horizontal==
I've made the utf/html info slide to the right rather than downward when an alias list is present. Seems like a more efficient use of space. Seems to look okay next to the infamous BRAKCET correction, which I've confirmed is the longest string in the alias file. ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:05, 18 September 2019 (UTC)
:I don't like the other information forced to the right when there's an alias. It's unexpected and I don't think the savings in vertical space makes up for it. Sorry, it just looks misaligned to me.
:Unrelated to the down vs. side option, I have two comments on the displayed information when you click on a code point:
:First, can we move the hex HTML escape sequence before the decimal one (&#x... / &#...)? I've never understood why someone would go through the trouble of calculating the decimal value of a code point in order to create an HTML escape sequence, but maybe that's just me. In any case, having the hex value first would line up better with the UTF-16 information directly above it. Hopefully hex usage is more common anyway, so it would make sense to put it first.
:Second, instead of the wording "Introduced in Unicode version x", I'd like to use the more precise wording that the source uses.[https://www.unicode.org/Public/UNIDATA/DerivedAge.txt] This wording change seems trivial, but it gets around the messy issue of various pre-1.1 characters. If Age is 1.1 (the earliest shown in the file), it would say "Assigned as of Unicode 1.1". Otherwise it would say "Newly assigned in Unicode x". Thanks. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 17:28, 28 September 2019 (UTC)
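::A rough Python sketch of both requests, for illustration only (the function names are hypothetical, and the module itself is Lua). The hex escape comes first and is zero-padded to four digits to match the UTF-16 line. The age wording is keyed to the Age field of a DerivedAge.txt data line.

```python
def html_escapes(cp):
    """Hex escape first (zero-padded like U+000A), then decimal."""
    return [f"&#x{cp:04X};", f"&#{cp};"]


def parse_age_line(line):
    """Parse a DerivedAge.txt data line such as '0000..001F ; 1.1 # ...'.

    Returns (first code point, last code point, age string).
    """
    data = line.split("#", 1)[0]          # drop the trailing comment
    rng, age = (part.strip() for part in data.split(";"))
    start, _, end = rng.partition("..")   # single code points have no '..'
    return int(start, 16), int(end or start, 16), age


def age_wording(age):
    """1.1 is the earliest Age in the file, so it gets 'as of' wording."""
    if age == "1.1":
        return "Assigned as of Unicode 1.1"
    return f"Newly assigned in Unicode {age}"
```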
 
==Named subsets added==
To more thoroughly address DRMcCreedy's item #13, I've added a way to refer to [[Module:Unicode chart/subsets|pre-defined named subsets]] in lieu of inputting a <code>range</code>. I suppose it may also be feasible to do unions/differences/intersections at some point, if there's a demand for it.
{{unicode chart
| block_name = Enclosed CJK Letters and Months
| link_name = Enclosed CJK Letters and Months
| display_name = Enclosed CJK Letters and Months (Hangul)
| subset = CJK_Letters_Months_Hangul
| info = yes
}}
Also new is the black line indicating skipped rows. Seems like a helpful feature.
 
The block name is also optional now. If omitted, there's no PDF link. But we can still set a display title and a link target for the subject. This would allow greater flexibility in generating a chart that transcends block divisions, such as "all control characters" (the subset name for which could be "special" in that it's generated by a function reading an [[Module:Unicode data/control|existing data file]], rather than hardcoded). But here's a sillier example for now.
 
{{unicode chart
| display_name = Basic Latin (vowels)
| link_name = English phonology#Vowels
| subset = Basic Latin vowels
| info = yes
}}
―[[special:contributions/cobaltcigs|cobaltcigs]] 13:45, 20 September 2019 (UTC)
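:One possible way to decide where the heavy "skipped rows" line goes, sketched in Python with a hypothetical <code>chart_rows</code> function (not necessarily how the module does it). The approach groups the subset's code points into 16-cell rows, then flags any row whose predecessor in the chart is not the row immediately before it. The example set (digits plus lowercase vowels) is arbitrary.

```python
def chart_rows(code_points):
    """Group code points into 16-wide chart rows and flag gaps.

    Returns a list of (row_start, cells, gap_before) tuples, where
    gap_before is True if one or more rows were skipped before this row.
    """
    rows = {}
    for cp in sorted(set(code_points)):
        rows.setdefault(cp // 16, []).append(cp)
    result = []
    previous = None
    for row in sorted(rows):
        gap_before = previous is not None and row != previous + 1
        result.append((row * 16, rows[row], gap_before))
        previous = row
    return result


# Arbitrary example: digits (row U+003x) and lowercase vowels (U+006x, U+007x).
sample = [ord(c) for c in "0123456789aeiou"]
for start, cells, gap in chart_rows(sample):
    print(f"U+{start:04X}", "gap" if gap else "   ", [chr(c) for c in cells])
```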
 
:I'd lean towards a jagged line like a ripped piece of paper, but the thick black line is certainly noticeable enough for the user to realize something's going on. I would, however, like the notes to say "heavy" or "thick" black line, because every row has a "black horizontal line". [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 17:28, 28 September 2019 (UTC)
 
== Orientation of glyphs for vertical scripts ==
 
For scripts such as [[Template:Unicode chart Mongolian|Mongolian]] and [[Template:Unicode chart Phags-pa|Phags-pa]] which are written in vertical columns, the glyphs in the font have horizontal orientation so that complete runs of horizontal text can be rotated into vertical orientation by a higher level protocol (commonly [[CSS]]). Currently, in our code charts we rotate the glyphs into vertical orientation. This used to match the Unicode code charts, which used to show vertically-oriented glyphs for Mongolian and Phags-pa, but a few years back the editor of the Unicode code charts deliberately changed the Mongolian and Phags-pa code charts to show horizontally-oriented glyphs to reflect how the glyphs are represented at the font level. My question is, should we continue to rotate glyphs in the dynamic Mongolian and Phags-pa charts or should we leave them in horizontal orientation to match the current Unicode code charts? My preference is to rotate into vertical orientation as this matches user expectation (it is how Mongolian and Phags-pa glyphs are presented in books on these scripts). [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 08:12, 28 September 2019 (UTC)
:I don't have a strong preference, although I do think Unicode showing them horizontally seems strange. Vertical seems better. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 17:28, 28 September 2019 (UTC)
 
== Unicode 13.0 ==
 
Unicode 13.0 will be released in March. Can we complete outstanding work on the Unicode chart module by then? Or shall we continue to use the old Unicode chart templates for the Unicode 13 update? [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 10:16, 10 January 2020 (UTC)
 
== The cell displayed for U+E003B (TAG SEMICOLON) contains a colon in a dashed box instead of a semicolon ==
 
The chart shown on page [[Tags_(Unicode_block)]] shows the various tag characters as their normal version in a dashed box, but the character shown in the box for U+E003B (TAG SEMICOLON) is a colon instead of a semicolon. I'm not quite sure where/how to update the template. [[Special:Contributions/81.107.76.114|81.107.76.114]] ([[User talk:81.107.76.114|talk]]) 00:29, 12 August 2020 (UTC)
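:The Tags block mirrors printable ASCII at a fixed offset (U+E0020..U+E007E correspond to U+0020..U+007E), so U+E003B should show <code>;</code> (U+003B), while a colon belongs to U+E003A TAG COLON. Below is a Python sketch, assuming the box content could be derived from the code point rather than typed by hand. <code>tag_display</code> is a hypothetical name.

```python
TAG_BASE = 0xE0000  # offset between a tag character and its ASCII twin


def tag_display(cp):
    """Return the ASCII character a printable tag character mirrors."""
    if not 0xE0020 <= cp <= 0xE007E:
        raise ValueError(f"U+{cp:04X} is not a printable tag character")
    return chr(cp - TAG_BASE)
```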
 
== Missing end tag for table ==
 
{{Ping|Cobaltcigs|Erutuon}} {{Tl|Unicode chart}} has little usage guidance, and I came to [[Module talk:Unicode chart]] (this very page), which has 6 missing end tags for {{tag|table}}, all associated with {{Tl|Unicode chart}}. So I went to [[Special:WhatLinksHere/Template:Unicode chart|Pages that link to "Template:Unicode chart"]]. There are 6 pages that transclude {{Tl|Unicode chart}}, and they all have missing end tags for {{tag|table}}.
 
So my request is: either abandon this project, or write some usage notes explaining how to use it without triggering a "missing end tag" lint error for {{tag|table}}. —[[User:Anomalocaris|Anomalocaris]] ([[User talk:Anomalocaris|talk]]) 07:32, 8 October 2023 (UTC)
: [[User:Vanisaac|Vanisaac]] mistakenly [[Special:Diff/1168432448|got rid of]] the end of the table (<code>|}</code>) while inserting this module into [[Template:Unicode chart]]. [[User:SWinxy|SWinxy]] [[Special:Diff/1169564617|added it back]], but inside the noinclude tag. I just moved it so that it was transcluded. I'm not sure the module should be in the template at this point because it's still marked as "pre-alpha" and hasn't been worked on since 2019, but I'm not going to try to evaluate that. — [[User:Erutuon|Eru]]·[[User talk:Erutuon|tuon]] 20:48, 8 October 2023 (UTC)
::Ah thank you. I must've thought that Module:Unicode chart somehow emitted a |} upon transclusion of this template, but not when the module was invoked, hence why I put the |} in the noinclude. [[User:SWinxy|SWinxy]] ([[User talk:SWinxy|talk]]) 21:38, 8 October 2023 (UTC)
::[[User:Erutuon|Erutuon]]: Thank you for taking care of this! —[[User:Anomalocaris|Anomalocaris]] ([[User talk:Anomalocaris|talk]]) 22:59, 8 October 2023 (UTC)
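:A rough Python sketch of a check that would have caught the missing <code>|}</code> in the template source. The function is hypothetical. It only tracks <code>{|</code> and <code>|}</code> at the start of a line, after optional whitespace or <code>:</code> indentation, and cannot see table markup produced by transcluded templates or modules.

```python
def unclosed_tables(wikitext):
    """Return how many wikitext tables are opened but never closed.

    Only '{|' and '|}' at the start of a line (after optional spaces,
    tabs, or ':' indentation) are counted. Markup generated by templates
    or modules at render time is not visible here.
    """
    depth = 0
    for line in wikitext.splitlines():
        stripped = line.lstrip(" \t:")
        if stripped.startswith("{|"):
            depth += 1  # a table (possibly nested) opens
        elif stripped.startswith("|}") and depth > 0:
            depth -= 1  # the innermost open table closes
    return depth


template_source = "{| class=\"wikitable\"\n! Header\n|-\n| cell\n"
print(unclosed_tables(template_source))  # 1: the closing |} is missing
```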
 
== Trying again from scratch ==
 
When I stumbled across this (April 2024), [[Template:Unicode chart]] wasn't working and no one seemed to be actively working on it. I sent a message to [[User:Cobaltcigs]] (the last person who edited [[Module:Unicode chart]]), and when I didn't hear back, I went ahead and started trying to build my own version in the sandbox. The pages I'm using are:
* '''Lua:''' [[Module:Unicode chart/sandbox]]
* '''CSS:''' [[Template:Unicode chart/sandbox/styles.css]]
* '''Template:''' [[Template:Unicode chart/sandbox]]
 
After a couple of days, I've created something that works in most test cases, although there are still some edge cases for unusual characters that need to be ironed out. You can see my version at:
* '''Testcases:''' [[Template:Unicode chart/sandbox/testcases]]
 
- [[User:Eievie|Eievie]] ([[User talk:Eievie|talk]]) 18:22, 22 April 2024 (UTC)