Module talk:Unicode chart: Difference between revisions

 
==Notes about notes==
{{collapsible section
| title = from [[Special:Permanentlink/913360801#Unicode code chart template -- expandable?|User_talk:Drmccreedy]]
| content=
Hi, one thing which I have been thinking about for a long time (several years) is to make the Unicode code chart templates expandable to show a list of all character names (and formal character name aliases). I think this would be very helpful to users as at present the only way to know what the character name is is to hover the mouse over the character cell whilst carefully avoiding hovering over the link that people so love to add to the characters; but the mouseover text is not copyable, so it is of limited use. I have made a rough mock up of what I mean in [[User:BabelStone/sandbox|my sandbox]]. What do you think? Please feel free to tweak or improve it. (I suggest that this approach is not applied for large blocks with algorithmic names). [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 20:42, 22 August 2019 (UTC)
:{{reply to|BabelStone}} I think it's definitely doable if you think it's useful. And if you've been thinking about it for that long it's probably useful.
:* I changed the "Character names" title to "List of character names" to be painfully clear.
:::It is better.
:* Should the list be sortable? This involves adding a header. I've mocked it up in your sandbox. I'd skip this on the algorithmic ones though because the code point and name always sort the same.
:::Personally, I don't think sortable is particularly useful, but I don't mind.
:* I'm assuming aliases will use the same format as the current charts: FOO (alias BAR)
:::Seems reasonable.
:* Can we agree that there should be NO LINKED CHARACTERS in that list? If someone wants to link each character they can do so in the existing part of the chart as far as I'm concerned. [[Latin Extended-B]] is an example of this.
:::I fully agree that there should be no links in the names list.
:* There's likely to be some duplication between the template and the article text. [[Latin Extended-B]] again is a good example. I'm thinking that article text with a list of characters can be removed once this is in place so long as they don't add additional information. (I would count the decimal values provided in Latin Extended-B as not adding information except to anyone who doesn't know you can use &#xHHHH; notation.)
:::Yes.
:* Lastly, do we need to worry about added character counts for articles that include multiple charts? Could this cause them to exceed size limits?
:::Probably not because articles with multiple code chart templates are generally not for very large blocks, and the huge blocks with algorithmic names will only have a slight increase in size.
: [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy#top|talk]]) 22:24, 22 August 2019 (UTC)
::I definitely think it is useful. If users want an overview of the character names, at present they have to click on the link to Unicode code charts or go to another website. (Other replies inline above) [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 10:53, 23 August 2019 (UTC)
:::I made the ogham table sortable, but when you sort by code point it does not sort in the expected order (hex values with A..F are sorted separately from hex values comprising 0..9 only). We could overcome this by putting the code point in a {{template|sort}} template with a fixed width decimal value for the hidden sort parameter, but this seems like too much trouble for a marginally useful feature. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 11:06, 23 August 2019 (UTC)
::::In light of that, let's ditch sorting. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy#top|talk]]) 16:07, 23 August 2019 (UTC)
:::::Agreed. Here are a few more comments and questions I have before we start implementing the change to three hundred templates. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 16:55, 23 August 2019 (UTC)
:::::* For blocks with algorithmic character names I think best to only list first and last assigned characters in the block. I currently put "..." between the two rows -- is that OK, or is there a better way of indicating omission of the intervening rows?
::::::I noticed that and thought it was intuitive.
:::::* Many or most blocks have hard-coded fonts applied (in the template or using css) to the code chart glyphs (which I personally don't like). For the names list it is useful to put the character after the code point, but I don't want to hard-code the fonts to use, so I was thinking of not specifying fonts for the names list part of the table. What do you think?
::::::I'm OK with this but anticipate others will want to add font info. I'd say let's leave font info off for now and see if there's push back.
:::::* Do we want to add any other core data for the characters? For example, we could provide a column for general category or script. Is that perhaps overkill?
::::::I thought of that too. Probably overkill. My concern is there's almost no end of info we could add.
:::::* Should the List of character names go above or below the Notes? I'm happy with current placement below the notes, but maybe it makes more sense to put the notes at the very bottom.
::::::I like the notes at the very bottom logically, but the list is probably easier to spot if we don't wedge it between the chart and the notes. So let's leave the list as the last item.
{{reply to|BabelStone}} [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy#top|talk]]) 17:07, 23 August 2019 (UTC)
:::::::Thanks for all the feedback. I think we're about there now, but I don't want to rush into making quite a large change to a large number of templates, so I'll sit on it for a week or so in case you or me or anyone else has any suggestions for improving how we do it. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 20:05, 23 August 2019 (UTC)
::::::::Sounds good. The only other question that's popped into my head is combining characters. Often in the chart we'll use a dotted circle (◌) or a space with them. I'm thinking if the purpose of the table is copy-and-paste, maybe we should skip that. Not sure I feel strongly either way but that should be nailed down before the charts are created. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy#top|talk]]) 20:43, 23 August 2019 (UTC)
:::::::::I've added an example for a block with combining characters (Combining Diacritical Marks for Symbols), with plain characters for the first row and prefixed with nbsp for the second row. The unprefixed characters do not look good as they straddle the code point column, so I think prefixing with nbsp is best (I don't like the dotted circle as that often interferes with the combining mark, and makes it difficult to see clearly). [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 11:24, 24 August 2019 (UTC)
:::::::::I've also added [[User:BabelStone/sandbox#Example_with_a_formal_name_alias|an example with a character name alias]]. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 12:58, 24 August 2019 (UTC)
::::::::::Looks good. I like the linked "alias". [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy#top|talk]]) 15:56, 24 August 2019 (UTC)
}}
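
The sorting problem raised in the collapsed discussion (hex code points containing A..F sorting separately from all-digit ones) is what a fixed-width decimal sort key would address. A minimal Python sketch of that key follows. The function name <code>sort_key</code> and the default width of 7 are assumptions for illustration; 7 digits is simply enough for U+10FFFF (1114111 in decimal). How the key would be attached to a table cell (for example through a sort template) is left open here.

```python
def sort_key(code_point_hex: str, width: int = 7) -> str:
    """Return a zero-padded decimal string for a hexadecimal code point.

    Every key has the same length and contains digits only, so a sorter
    that treats "1680" as a number and "168A" as text no longer matters.
    """
    return f"{int(code_point_hex, 16):0{width}d}"


# A few Ogham code points, deliberately out of order.
code_points = ["169C", "1680", "168A", "1699"]
in_order = sorted(code_points, key=sort_key)
print(in_order)
```

Since the thread concluded that sorting is only marginally useful, this is shown only to make the proposed workaround concrete.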
 
===Comments===
* I'm not convinced the "Notes" section at the bottom is worth the space it takes up, and I only added it as a proof-of-concept gesture to mimic existing layout convention. A collapsible (show/hide, just like the section above) section at the bottom with an additional list/table of character info (one per line) would certainly be feasible and only require a few more lines of code. Its sheer size on screen would be the primary concern, because expanding it would displace other page content, possibly including wrapped text or floating images (unlike navboxes, which occupy 100% width at the very bottom).
*:We should just give first and last rows for blocks with character names derived from code points (CJK, Tangut, Nushu, ...), so the largest block is Hangul Syllables with 11,184 code points, which I agree is too long for this approach. But the next biggest blocks are Yi Syllables (1,168), Egyptian Hieroglyphs (1,072), Mathematical Alphanumeric Symbols (1,024), and Cuneiform (1,024), which I think should be acceptable if the names list is initially hidden. I don't see that displacement of other text and images would be an issue, especially as the code charts are mostly only used in the corresponding Unicode Block name articles. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 11:33, 10 September 2019 (UTC)
―[[special:contributions/cobaltcigs|cobaltcigs]] 17:55, 9 September 2019 (UTC)
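
The "first and last rows only" idea for blocks with algorithmic names can be sketched roughly as below. This is Python rather than module code, it uses Python's bundled <code>unicodedata</code> (whose Unicode version depends on the Python release), and the function name <code>names_list</code> and its <code>algorithmic</code> flag are assumptions made for illustration. Formal name aliases are not handled.

```python
import unicodedata


def names_list(first: int, last: int, algorithmic: bool = False):
    """Build (code point, name) rows for the inclusive range first..last.

    Code points without a character name (unassigned, controls) are skipped.
    For algorithmic blocks only the first and last assigned characters are
    kept, with a ("...", "...") row marking the omitted rows in between.
    """
    assigned = [cp for cp in range(first, last + 1)
                if unicodedata.name(chr(cp), "")]
    rows = []
    for index, cp in enumerate(assigned):
        is_edge = index in (0, len(assigned) - 1)
        if algorithmic and not is_edge:
            # Emit a single omission marker right after the first row.
            if index == 1:
                rows.append(("...", "..."))
            continue
        rows.append((f"U+{cp:04X}", unicodedata.name(chr(cp))))
    return rows


ogham = names_list(0x1680, 0x169F)
cjk = names_list(0x4E00, 0x4E0F, algorithmic=True)
print(ogham[0])
print(cjk)
```

For a small block such as Ogham every assigned character gets a row; for a slice of CJK Unified Ideographs only the two end rows and the omission marker are produced.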
 
===Update/to-do===
''See [[Template:Unicode chart/testcases]].''
*I've reduced the number of required parameters to only the name of the block and the version string. In reality, the former can probably be deduced (from the name of the calling template), and the latter should be exposed by [[Module:Unicode data]] in some fashion (to avoid hard-coding 12.0 on any other page) and should be updated as frequently as the data subpages are updated.
* Black blocks were actually easy to detect. Previous code assumed anything containing "<" was <code><reserved-NNNN></code> when it can actually be <code><noncharacter-NNNN></code> or <code><control-NNNN></code>. Whoops. It's all right there in [[Module:Unicode data]]. Will work on control chars next.
* I've discovered [[Module:Unicode data/aliases]] includes (among other things) abbreviations for control characters. It does in fact use PAD and HOP.
* I gave the control characters a light blue background and an explanatory footnote similar to those for RESERVED and NONCHARACTER. Also dashed boxes around the abbreviations, which are loaded from [[Module:Unicode data/aliases|here]]. Some have multiple abbreviations. The current behavior is to choose the last one, because at brief glance that seemed most correct in most cases. I'd rather we move the "official" or preferred abbreviation to the top and consistently select the first one instead. I've yet to research what, if anything, might be broken by changing abbreviation order.
―[[special:contributions/cobaltcigs|cobaltcigs]] 09:17, 12 September 2019 (UTC)
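
The two points above (telling the three kinds of <code><...-NNNN></code> labels apart, and choosing one abbreviation when several exist) can be illustrated with a short Python sketch. It is not the module's actual code: the names <code>classify</code> and <code>pick_abbreviation</code> are hypothetical, and the sample abbreviation list is illustrative input rather than a quote from the data page.

```python
import re

# Matches labels of the form <reserved-NNNN>, <noncharacter-NNNN>, <control-NNNN>.
LABEL = re.compile(r"^<(reserved|noncharacter|control)-([0-9A-F]{4,6})>$")


def classify(name: str) -> str:
    """Return 'reserved', 'noncharacter' or 'control' for a bracketed label,
    and 'character' for anything else (an ordinary character name)."""
    match = LABEL.match(name)
    return match.group(1) if match else "character"


def pick_abbreviation(abbreviations, prefer_first=True):
    """Pick one abbreviation from a list, or None if the list is empty.

    prefer_first=False mirrors the 'choose the last one' behaviour described
    above; prefer_first=True is the proposed 'preferred one listed first' rule.
    """
    if not abbreviations:
        return None
    return abbreviations[0] if prefer_first else abbreviations[-1]


sample = ["LF", "NL", "EOL"]  # illustrative input only
print(classify("<control-0080>"), pick_abbreviation(sample, prefer_first=False))
```

Checking the label kind explicitly, instead of testing for "<", is what keeps noncharacters and controls from being drawn as reserved cells.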
 
==The font problem, explained==
The only way to load custom css definitions is through the <code><[[mw:Help:TemplateStyles|templatestyles]] src="Template:Something/something.css" /></code> extension tag. This can be produced in the module by preprocessing the previous wikitext/pseudo-html, or by using <code>frame:extensionTag{ name = 'templatestyles', args = { src = '...'} }</code> to the same effect. Either way, the <code>src</code> page must be of "content model: Sanitized CSS", meaning it must be in the template namespace and have a title ending with ".css". That puts you in a mode that checks for syntax errors and disallows the use of templates, modules, parserFunctions, or anything other than hard-coded css (with a few features excluded for security reasons).
 
In practice that means there's no way for a template/module parameter such as <code>| font = font-family: 'DejaVu Sans', 'FreeSans', 'Lucida Sans Unicode'; font-size: 1.25em;</code> (or for any string of text obtained or composed at module runtime) to create a reusable css <code>class</code>. So any user-supplied font specs would need to be hard-coded as a <code>style</code> attribute to be used at all. Workarounds include, in descending order of sloppiness:
# Duplicating that much code in the <code>style</code> attribute of every single <code>td</code> cell (which would be stupid as hell).
# Assigning the bulky <code>style="..."</code> crap one time only to the root <code>table</code> element, then having the <code>th { ... }</code> css (conveniently everything that's not a character cell <code>td</code> is a <code>th</code>) [[Template:Unicode_chart/styles.css|loaded from here]] attempt to negate any foreseeable user input back to the default so that the table's style attribute appears to only affect the <code>td</code> (codepoint grid) cells. This would be very difficult to do well, considering the defaults we'd seek to revert to could differ according to user skin and other environmental factors.
# Continue using {{tl|script}} within each cell and suffer its inefficiency and incompleteness.
# Placing [https://pastebin.com/raw/P1AE8RzP this much css (more to be added later)] on a single acceptable css source page, then importing it via <code>templatestyles</code>.
# Make a better version of [[Template:Script]] by dividing the css into 154 one-liner subpages of CSS, each named to reflect the ISO 15924 code, and imported only when the need for it is detected (using [[Module:Unicode data/scripts|this]]). Needing more than one in the same table will most likely be rare, so the question of how many small loads are processor-equal to one big load is probably not even worth testing.
# Avoid forking and turn the original Template:Script into what we want (use consistent names, include everything, and use a module instead of the switch statement and sub-template spaghetti logic).
I'm prepared to go with #4 for now, then upgrade to #5–6 only after all the other issues are addressed. ―[[special:contributions/cobaltcigs|cobaltcigs]] 09:17, 12 September 2019 (UTC)
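
As a rough illustration of option 5 in the list above (one small css subpage per ISO 15924 code, imported only when needed), the Python sketch below builds one <code>templatestyles</code> tag per distinct script code found in a chart. The base page title <code>Template:Script/styles</code> and the function name are hypothetical, and a real module would presumably emit the tags through <code>frame:extensionTag</code> rather than as strings.

```python
import re

# ISO 15924 codes are four letters, written with an initial capital (e.g. "Latn").
ISO_15924 = re.compile(r"^[A-Z][a-z]{3}$")


def templatestyles_tags(script_codes, base="Template:Script/styles"):
    """Return one <templatestyles> tag per distinct script code.

    `base` is a hypothetical parent page; each subpage would hold the
    one-liner css for a single script. Input order is preserved and
    duplicates are dropped so each stylesheet is requested once.
    """
    seen = []
    for code in script_codes:
        if not ISO_15924.match(code):
            raise ValueError(f"not an ISO 15924 code: {code!r}")
        if code not in seen:
            seen.append(code)
    return [f'<templatestyles src="{base}/{code}.css" />' for code in seen]


tags = templatestyles_tags(["Ogam", "Latn", "Ogam"])
print("\n".join(tags))
```

Because the <code>src</code> titles are composed from a fixed prefix and a validated code, no user-supplied css ever has to pass through the module, which is the constraint described above.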