Module talk:Unicode chart: Difference between revisions

display_NNNN is baleeted
added parameter for named "subset" predefs
Line 38:
# How to handle block-specific formatting? For example [[Template:Unicode chart Javanese]] has a specific height and some of the characters in [[Template:Unicode chart Control Pictures]] use a different font size.
# How to handle character links? Like {{ping|BabelStone}}, I'm not a fan of linking specific characters (but others are). It looks like your code, optionally, will link every character if an article exists, but this could increase the number of linked characters. And many characters aren't linked to the character itself, like U+2245 in [[Template:Unicode chart Mathematical Operators]]. Some link to wikt, like U+2105 in [[Template:Unicode chart Letterlike Symbols]] and all the characters in [[Template:Unicode chart CJK Unified Ideographs Extension A]].
# {{done}} Some blocks have special parameters that need to be taken into account: [[Template:Unicode chart Alphabetic Presentation Forms]], [[Template:Unicode chart Enclosed Alphanumeric Supplement]], [[Template:Unicode chart Enclosed CJK Letters and Months]], [[Template:Unicode chart Halfwidth and Fullwidth Forms]], [[Template:Unicode chart Miscellaneous Symbols]], and [[Template:Unicode chart Supplemental Symbols and Pictographs]]. As with most of these questions, this only applies if you're replacing existing chart templates.
# How to determine the chart name? Most charts use the block name for the title but some don't. For example, "C0 Controls and Basic Latin" is the chart name for the "Basic Latin" block.
# How to determine what to link the chart name to? For example, the [[Template:Unicode chart Kangxi Radicals]] chart links to "Kangxi radical#Unicode". Most either link to the block name itself or the block name with "(Unicode block)" appended.
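:One possible way to handle the last two questions is a small override table consulted before falling back to the block name. The following is only a sketch in Python, not the module's code: the names <code>TITLE_OVERRIDES</code>, <code>LINK_OVERRIDES</code>, <code>chart_title</code> and <code>chart_link</code> are hypothetical, the <code>article_exists</code> callback stands in for whatever page-existence check is available, and preferring the "(Unicode block)" title when it exists is an assumption.

```python
# Hypothetical override tables. Only these two entries come from the
# discussion above; a real table would need to be compiled by hand.
TITLE_OVERRIDES = {"Basic Latin": "C0 Controls and Basic Latin"}
LINK_OVERRIDES = {"Kangxi Radicals": "Kangxi radical#Unicode"}


def chart_title(block_name):
    """Return the chart title, defaulting to the block name itself."""
    return TITLE_OVERRIDES.get(block_name, block_name)


def chart_link(block_name, article_exists):
    """Return the link target for the chart title.

    article_exists is a callable(title) -> bool supplied by the caller
    (an assumption: it stands in for a page-existence lookup).
    """
    if block_name in LINK_OVERRIDES:
        return LINK_OVERRIDES[block_name]
    disambiguated = f"{block_name} (Unicode block)"
    # Assumption: prefer the disambiguated title when that page exists.
    if article_exists(disambiguated):
        return disambiguated
    return block_name
```

:With this shape, a chart that needs an unusual title or link target only needs one new table entry, and every other block keeps the default behaviour.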
Line 72:
*8 The "Dashed Box Convention" is explained at https://www.unicode.org/versions/Unicode12.0.0/ch24.pdf#G8175 It's an oversight that there is no note explaining this convention. It was added to match Unicode's charts. I think it's useful. Depending on the font, without the dashed box U+0602 is easily confusable with U+060E, U+1F1E6 looks the same as capital A, etc. As far as I know there's no way to determine which characters get a dashed box programmatically. As of version 12.1 it's used on U+0000-0020, 007F-00A0, 00AD, 034F, 0600-0605, 061C, 06DD, 070F, 08E2, 0CF1-0CF2, 0D4E, 0F0C, 1039, 115F-1160, 17B4-17B5, 17D2, 180B-180E, 1A60, 1BAB, 1CF5-1CF6, 2000-200F, 2011, 2028-202F, 205F-2064, 2066-206F, 2D7F, 2E3A-2E3B, 3000, 303E, 3164, AAF6, FE00-FE0F, FEFF, FFA0, FFF9-FFFB, 10A3F, 11003-11004, 1107F, 110BD, 110CD, 111C2-111C3, 11A3A, 11A47, 11A84-11A89, 11A99, 11D45-11D46, 11D97, 13430-13438, 16F8F-16F92, 1BC9D, 1BCA0-1BCA3, 1D159, 1D173-1D17A, 1DA9B-1DA9F, 1DAA1-1DAAF, 1F1E6-1F1FF, E0001, E0020-E007F, and E0100-E01EF.
*10 Unicode charts use XXX (in a dotted box) for U+0080, 0081, and 0099, and I don't think Wikipedia's charts should contradict the cited source. (For some arcane history of these three characters, I recommend http://unicode.org/pipermail/unicode/2015-October/002876.html) I think the only way of determining the abbreviations to use in the charts is a hardcoded table. They don't always match an alias. For example U+E007F is displayed as "END". A lot of the code points that use the dashed box convention display abbreviations. I haven't compiled a definitive list.
*{{done}} 13 In [[Template:Unicode chart Enclosed CJK Letters and Months]] the hangul subset isn't contiguous. Nor is the emoticon subset of [[Template:Unicode chart Miscellaneous Symbols]]. I didn't add these features so I don't know what reaction you'll get from removing them.
[[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 23:09, 10 September 2019 (UTC)
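:Since items 8 and 10 both come down to hardcoded tables, here is a rough sketch of what such tables and their lookups might look like. It is written in Python for illustration only and is not the module's code. The range table is just an excerpt of the list in item 8, the abbreviation table holds only the four code points named in item 10, and the names <code>DASHED_BOX_RANGES</code>, <code>ABBREVIATIONS</code>, <code>has_dashed_box</code> and <code>cell_label</code> are hypothetical.

```python
from bisect import bisect_right

# Excerpt of the inclusive ranges listed in item 8, sorted by start.
# A complete table would carry every range from that list.
DASHED_BOX_RANGES = [
    (0x0000, 0x0020),
    (0x007F, 0x00A0),
    (0x00AD, 0x00AD),
    (0x034F, 0x034F),
    (0x0600, 0x0605),
    (0x061C, 0x061C),
    (0x2000, 0x200F),
    (0x1F1E6, 0x1F1FF),
    (0xE0001, 0xE0001),
    (0xE0020, 0xE007F),
]
_STARTS = [start for start, _ in DASHED_BOX_RANGES]

# Only the abbreviations mentioned in item 10; not a definitive list.
ABBREVIATIONS = {0x0080: "XXX", 0x0081: "XXX", 0x0099: "XXX", 0xE007F: "END"}


def has_dashed_box(cp):
    """Return True if cp falls inside one of the tabulated ranges."""
    # Find the last range whose start is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= DASHED_BOX_RANGES[i][1]


def cell_label(cp):
    """Return the hardcoded abbreviation for cp, or None if there is none."""
    return ABBREVIATIONS.get(cp)
```

:Keeping the ranges sorted allows a binary search per cell, but for a chart of a few hundred cells a plain linear scan over the table would work just as well.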
 
Line 220:
==Going horizontal==
I've made the utf/html info slide to the right rather than downward when an alias list is present. Seems like a more efficient use of space. Seems to look okay next to the infamous BRAKCET correction, which I've confirmed is the longest string in the alias file. ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:05, 18 September 2019 (UTC)
 
==Named subsets added==
To more thoroughly address DRMcCreedy's item #13, I've added a way to refer to [[Module:Unicode chart/subsets|pre-defined named subsets]] in lieu of inputting a <code>range</code>. I suppose it may also be feasible to do unions/differences/intersections at some point, if there's a demand for it.
{{unicode chart
| block_name = Enclosed CJK Letters and Months
| link_name = Enclosed CJK Letters and Months
| display_name = Enclosed CJK Letters and Months (Hangul)
| subset = CJK_Letters_Months_Hangul
| info = yes
}}
Also new is the black line indicating skipped rows. Seems like a helpful feature.
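:For anyone curious how skipped rows can be detected, the idea can be sketched as follows. This is an illustrative Python sketch rather than the module's actual code: it assumes 16 code points per chart row, and the function name <code>chart_rows</code> and the use of <code>None</code> as the "draw a black line here" marker are hypothetical.

```python
def chart_rows(code_points):
    """Return row start values for a subset, with None marking skipped rows.

    Assumes each chart row covers 16 consecutive code points.
    """
    rows = sorted({cp // 16 for cp in code_points})
    out = []
    prev = None
    for row in rows:
        # A jump of more than one row means at least one row was skipped.
        if prev is not None and row > prev + 1:
            out.append(None)
        out.append(row * 16)
        prev = row
    return out


# Arbitrary example: rows 0x3200 and 0x3260 with nothing in between.
layout = chart_rows([0x3200, 0x3201, 0x3260])
```

:In this example <code>layout</code> is <code>[0x3200, None, 0x3260]</code>, so a renderer would emit one divider between the two rows however many rows were skipped.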
 
The block name is also optional now. If omitted, there's no PDF link. But we can still set a display title and a link target for the subject. This would allow greater flexibility in generating a chart that transcends block divisions, such as "all control characters" (the subset name for which could be "special" in that it's generated by an algorithm, even). But here's a sillier example for now.
 
{{unicode chart
| display_name = Basic Latin (vowels)
| link_name = English phonology#Vowels
| subset = Basic Latin vowels
| info = yes
}}
―[[special:contributions/cobaltcigs|cobaltcigs]] 13:45, 20 September 2019 (UTC)
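:To illustrate the two ideas floated above (set operations on subsets, and an algorithmically generated "all control characters" subset), here is a minimal sketch. It is in Python rather than the module's language, and it makes several assumptions: subsets are modelled as plain sets of code points, control characters are taken to be those with general category <code>Cc</code>, and the names <code>control_characters</code>, <code>BASIC_LATIN</code> and <code>VOWELS</code> are hypothetical. The vowel set is a guess at what a "Basic Latin vowels" subset would contain, not a copy of the actual subset definition.

```python
import unicodedata


def control_characters(limit=0x110000):
    """Generate the subset of code points below limit with category Cc."""
    return {cp for cp in range(limit) if unicodedata.category(chr(cp)) == "Cc"}


# Subsets modelled as sets of code points (an assumption for this sketch).
BASIC_LATIN = set(range(0x0000, 0x0080))
VOWELS = {ord(c) for c in "AEIOUaeiou"}  # hypothetical subset contents

controls = control_characters()

# Union, difference and intersection then come for free.
printable_basic_latin = BASIC_LATIN - controls
basic_latin_controls = BASIC_LATIN & controls
vowels_and_controls = VOWELS | basic_latin_controls
```

:Scanning the whole code space is slow enough that a real implementation would probably cache or precompute the result, but the point is that a "special" subset name could map to a function instead of a stored range list.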