Module talk:Unicode chart

This is the talk page for discussing improvements to the Unicode chart module.

Put new text under old text. Click here to start a new topic.
New to Wikipedia? Welcome! Learn to edit; get help.

Notes about notes

Latest comment: 5 years ago15 comments3 people in discussion

Hi, one thing which I have been thinking about for a long time (several years) is to make the Unicode code chart templates expandable to show a list of all character names (and formal character name aliases). I think this would be very helpful to users as at present the only way to know what the character name is is to hover the mouse over the character cell whilst carefully avoiding hovering over the link that people so love to add to the characters; but the mouseover text is not copyable, so it is of limited use. I have made a rough mock up of what I mean in my sandbox. What do you think? Please feel free to tweak or improve it. (I suggest that this approach is not applied for large blocks with algorithmic names). BabelStone (talk) 20:42, 22 August 2019 (UTC)Reply

@BabelStone: I think it's definitely doable if you think it's useful. And if you've been thinking about it for that long it's probably useful.

I changed the "Character names" title to "List of character names" to be painfully clear.

It is better.

Should the list be sortable? This involves adding a header. I've mocked it up in your sandbox. I'd skip this on the algorithmic ones though because the code point and name always sort the same.

Personally, I don't think sortable is particularly useful, but I don't mind.

I'm assuming aliases will use the same format as the current charts: FOO (alias BAR)

Seems reasonable.

Can we agree that there should be NO LINKED CHARACTERS in that lists? If someone wants to link each character they can do so in the existing part of the chart as far as I'm concerned. Latin Extended-B is an example of this.

I full agree that there should be no links in the names list.

There's likely to be some duplication between the template and the article text. Latin Extended-B again is a good example. I'm thinking that article text with a list of characters can be removed once this is in place so long as they don't add additional information. (I would count the decimal values provided in Latin Extended-B as not adding information except to anyone who doesn't know you can use &#xHHHH; notation.)

Yes.

Lastly, do we need to worry about added character counts for articles that include multiple charts? Could this cause them to exceed size limits?

Probably not because articles with multiple code chart templates are generally not for very large blocks, and the huge blocks with algorithmic names will only have a slight increase in size.

DRMcCreedy (talk) 22:24, 22 August 2019 (UTC)Reply

I definitely think it is useful. If users want an overview of the character names, at present they have to click on the link to Unicode code charts or go to another website. (Other replies inline above) BabelStone (talk) 10:53, 23 August 2019 (UTC)Reply

I made the ogham table sortable, but when you sort by code point it does not sort in the expected order (hex values with A..F are sorted separately from hex values comprising 0..9 only). We could overcome this by putting the code point in a {{sort}} template with a fixed width decimal value for the hidden sort parameter, but this seems like too much trouble for a marginally useful feature. BabelStone (talk) 11:06, 23 August 2019 (UTC)Reply

In light of that, let's ditch sorting. DRMcCreedy (talk) 16:07, 23 August 2019 (UTC)Reply

Agreed. Here are a few more comments and questions I have before we start implementing the change to three hundred templates BabelStone (talk) 16:55, 23 August 2019 (UTC)Reply

For blocks with algorithmic character names I think best to only list first and last assigned characters in the block. I currently put "..." between the two rows -- is that OK, or is there a better way of indicating omission of the intervening rows?

I noticed that and thought it was intuitive.

Many or most blocks have hard-coded fonts applied (in the template or using css) to the code chart glyphs (which I personally don't like). For the names list it is useful to put the character after the code point, but I don't want to hard-code the fonts to use, so I was thinking of not specifying fonts for the names list part of the table. What do you think?

I'm OK with this but anticipate others will want to add font info. I'd say let's leave font info off for now and see if there's push back.

Do we want to add any other core data for the characters? For example, we could provide a column for general category or script. Is that perhaps overkill?

I thought of that too. Probably overkill. My concern is there's almost no end of info we could add.

Should the List of character names go above or below the Notes? I'm happy with current placement below the notes, but maybe it makes more sense to put the notes at the very bottom.

I like the notes at the very bottom logically, but the list is probably easier to spot if we don't wedge it between the chart and the notes. So let's leave the list as the last item.

@BabelStone: DRMcCreedy (talk) 17:07, 23 August 2019 (UTC)Reply

Thanks for all the feedback. I think we're about there now, but I don't want to rush into making quite a large change to a large number of templates, so I'll sit on it for a week or so in case you or me or anyone else has any suggestions for improving how we do it. BabelStone (talk) 20:05, 23 August 2019 (UTC)Reply

Sounds good. The only other question that's popped into my head is combining characters. Often in the chart we'll use a dotted circle (◌) or a space with them. I'm thinking if the purpose of the table is copy-and-paste, maybe we should skip that. Not sure I feel strongly either way but that should be nailed down before the charts are created. DRMcCreedy (talk) 20:43, 23 August 2019 (UTC)Reply

I've added an example for a block with combining characters (Combining Diacritical Marks for Symbols), with plain characters for the first row and prefixed with nbsp for the second row. The unprefixed characters do not look good as they straddle the code point column, so I think prefixing with nbsp is best (I don't like the dotted circle as that often interferes with the combining mark, and makes it difficult to see clearly). BabelStone (talk) 11:24, 24 August 2019 (UTC)Reply

I've also added an example with a character name alias. BabelStone (talk) 12:58, 24 August 2019 (UTC)Reply

Looks good. I like the linked "alias". DRMcCreedy (talk) 15:56, 24 August 2019 (UTC)Reply

Comments

I'm not convinced the "Notes" section at the bottom is worth the space it takes up, and I only added it as a proof-of-concept gesture to mimic existing layout convention. A collapsible (show/hide, just like the section above) section at the bottom with an additional list/table of character info (one per line) would certainly be feasible and only require a few more lines of code. Its hugeness of screen space would be the primary concern, because its expansion would displace other page content possibly including wrapped text or floating images (unlike navboxes, which occupy 100% width at the very bottom).
One intuitive solution would be to mimic typical charmap program behavior by using a Javascript click handler on each character cell that populates the footer area (of about the same size as the "Notes" section, maybe slightly smaller) with the cursor-selectable name of the last clicked-upon codepoint, plus its &escapecode; and any additional info we care to pull from Module:Unicode data (replacing any previous content). I could whip up a demo for that in the next few days. I just worry that it might be too interactive to be widely accepted.
A third approach might be to render the entire list (of names and whatnot) in a vertically scrollable footer panel containing "section" links, such that clicking on the character cell would cause the footer to scroll to and highlight (similar behavior to reflist anchors) the appropriate line. This might be even less popular.
On the other hand, some philosophies may have changed over the years. I mean, we do have interactive scrolling maps that pop up in a fullscreen div now (see example).
I haven't formed any opinion yet on how to handle combining character positioning, other than "oh god, I hope it's something other than  " lol.

―cobaltcigs 17:55, 9 September 2019 (UTC)Reply

Update/to-do

See Template:Unicode chart/testcases.

I've reduced the number of required parameters to only the name of the block and the version string. In reality, the former can probably be deduced (from the name of the calling template), and the latter should be exposed by Module:Unicode data in some fashion (to avoid hard-coding 12.0 on any other page) and should be update as frequently as the data subpages are updated.
I've got it looking up the ISO 15924 and using that to select a <span> from Template:Script containing a css class for an appropriate font-family. Better would be a way to apply the class and dir attributes directly to the <td> element.
Start/end codepoints still exist as an option. The looked-up values can be overriden to subdivide a large block without confusing the module.
~~I need to debug out why it gives an error at line 38: bad argument #2 to 'format' (string expected, got nil) but only for some block names.~~
- It was because the Module:Unicode data/scripts.ranges table skips certain chars, including the Ⴧ and Ⴭ in Georgian. Added a workaround. ―cobaltcigs 22:10, 9 September 2019 (UTC)Reply

―cobaltcigs 20:49, 9 September 2019 (UTC)Reply

Existing charts

Latest comment: 5 years ago1 comment1 person in discussion

Interesting approach to create the Unicode code charts dynamically but I have many questions. Most only apply if this module is intended to replace the existing chart templates...

What problem is this new approach solving? Is it just duplicating/replacing the existing templates? If not, what will this module be used for?
Do the charts get created every time they're displayed? If so, do we care about the extra processing incurred?
How to handle fonts? I saw the post at Template talk:Script#Module:Unicode chart and the notes above so I know this is a known issue.
How to handle a varying number of reserved characters? The current charts leave off the "Gray areas" notice if there are no non-assigned code points because having the "gray areas" notice for those blocks would be confusing. And the wording changes if there is only one non-assigned code point.
How to handle charts with additional footnotes? For example, Template:Unicode chart Arabic. And for the existing charts, the notes are indeed valuable.
How to handle non-characters? For example, U+FDD0-FDEF in Template:Unicode chart Arabic Presentation Forms-A.
How to handle combining marks (which are referenced above)? Some charts have special additions for some combining characters. For example, U+A980 in Template:Unicode chart Javanese uses a dotted circle. Other combining marks, like U+1D242 in Template:Unicode chart Ancient Greek Musical Notation use a non-breaking space. Some combining marks use no additional character at all.
How to handle characters with dashed boxes? For example, U+0600-0605, 061C, and 06DD in the Template:Unicode chart Arabic chart.
How to handle control(ish) characters where we don't want the actual character in the chart? For example, U+061C in the Template:Unicode chart Arabic chart, and more obviously, control characters in Template:Unicode chart C0 Controls and Basic Latin and Template:Unicode chart C1 Controls and Latin-1 Supplement.
How to create character name aliases? See U+061C in Template:Unicode chart Arabic and the control characters in Template:Unicode chart C0 Controls and Basic Latin and Template:Unicode chart C1 Controls and Latin-1 Supplement.
How to handle block-specific formatting? For example Template:Unicode chart Javanese has a specific height and some of the characters in Template:Unicode chart Control Pictures use a different font size.
How to handle character links? Like @BabelStone:, I'm not a fan of linking specific characters (but others are). It looks like your code, optionally, will link every character if an article exists, but this could increase the number of linked characters. And many characters aren't linked to the character itself, like U+2245 in Template:Unicode chart Mathematical Operators. Some link to wikt, like U+0x2105 in Template:Unicode chart Letterlike Symbols and all the characters in Template:Unicode chart CJK Unified Ideographs Extension A.
Some blocks have special parameters that need to be taken into account: Template:Unicode chart Alphabetic Presentation Forms, Template:Unicode chart Enclosed Alphanumeric Supplement, Template:Unicode chart Enclosed CJK Letters and Months, Template:Unicode chart Halfwidth and Fullwidth Forms, Template:Unicode chart Miscellaneous Symbols, and Template:Unicode chart Supplemental Symbols and Pictographs. As with most of these questions, this only only applies if you're replacing existing chart templates.
How to determine the chart name? Most charts use the block name for the title but some don't. For example, "C0 Controls and Basic Latin" is the chart name for the "Basic Latin" block.
How to determine what to link the chart name to. For example, the Template:Unicode chart Kangxi Radicals chart links to "Kangxi radical#Unicode". Most either link to the block name itself or the block name with "(Unicode block)" appended.
Will the new approach be used for the list charts that make up List of CJK Unified Ideographs, part 1 of 4 and List of CJK Unified Ideographs Extension B (Part 1 of 7)?

DRMcCreedy (talk) 04:51, 10 September 2019 (UTC)Reply