Module talk:Unicode chart: Difference between revisions

</pre>
The inner loop of the module could check whether such a parameter exists and behave differently when it does. This should be used sparingly, but it is a decent solution for exceptional cases, which will be absent from most blocks. ―[[special:contributions/cobaltcigs|cobaltcigs]] 20:21, 10 September 2019 (UTC)
 
I have some follow-up:
*5 The following blocks have specific footnotes: [[Template:Emoji (Unicode block)]], [[Template:Unicode chart Hangul Jamo]], [[Template:Unicode chart Superscripts and Subscripts]], and [[Template:Unicode chart Sutton SignWriting]]. Additionally, blocks with non-characters have the "Black areas indicate noncharacters (code points that are guaranteed never to be assigned as encoded characters in the Unicode Standard)" footnote: [[Template:Unicode chart Arabic Presentation Forms-A]] and [[Template:Unicode chart Specials]]. And these blocks have notes about deprecated characters: [[Template:Unicode chart General Punctuation]], [[Template:Unicode chart Khmer]], [[Template:Unicode chart Miscellaneous Technical]], [[Tags (Unicode block)]], and [[Template:Unicode chart Tibetan]].
*6 There are only 66 non-characters (https://www.unicode.org/faq/private_use.html#nonchar3) and Unicode has promised not to add any more. I think the black background is effective and would want to keep it. I think it's safer not to put non-characters themselves into the charts as they are "not normally interchanged with other users" (https://www.unicode.org/faq/private_use.html#nonchar2). The code points are U+FDD0-FDEF, FFFE-FFFF, 1FFFE-1FFFF, 2FFFE-2FFFF, 3FFFE-3FFFF, 4FFFE-4FFFF, 5FFFE-5FFFF, 6FFFE-6FFFF, 7FFFE-7FFFF, 8FFFE-8FFFF, 9FFFE-9FFFF, AFFFE-AFFFF, BFFFE-BFFFF, CFFFE-CFFFF, DFFFE-DFFFF, EFFFE-EFFFF, FFFFE-FFFFF, and 10FFFE-10FFFF.
*8 The "Dashed Box Convention" (explained at https://www.unicode.org/versions/Unicode12.0.0/ch24.pdf#G8175) was added to match Unicode's charts, and it's an oversight that there is no note explaining it. I think it's useful: depending on the font, without the dashed box U+0602 is easily confused with U+060E, U+1F1E6 looks the same as capital A, etc. As far as I know there's no way to determine programmatically which characters get a dashed box. As of version 12.1 it's used on U+0000-0020, 007F-00A0, 00AD, 034F, 0600-0605, 061C, 06DD, 070F, 08E2, 0CF1-0CF2, 0D4E, 0F0C, 1039, 115F-1160, 17B4-17B5, 17D2, 180B-180E, 1A60, 1BAB, 1CF5-1CF6, 2000-200F, 2011, 2028-202F, 205F-2064, 2066-206F, 2D7F, 2E3A-2E3B, 3000, 303E, 3164, AAF6, FE00-FE0F, FEFF, FFA0, FFF9-FFFB, 10A3F, 11003-11004, 1107F, 110BD, 110CD, 111C2-111C3, 11A3A, 11A47, 11A84-11A89, 11A99, 11D45-11D46, 11D97, 13430-13438, 16F8F-16F92, 1BC9D, 1BCA0-1BCA3, 1D159, 1D173-1D17A, 1DA9B-1DA9F, 1DAA1-1DAAF, 1F1E6-1F1FF, E0001, E0020-E007F, and E0100-E01EF.
*10 Unicode charts use XXX (in a dotted box) for U+0080, 0081, and 0099, and I don't think Wikipedia's charts should contradict the cited source. (For some arcane history of these three characters, I recommend http://unicode.org/pipermail/unicode/2015-October/002876.html) I think the only way of determining the abbreviations to use in the charts is a hardcoded table. They don't always match an alias. For example, U+E007F is displayed as "END". A lot of the code points that use the dashed box convention display abbreviations. I haven't compiled a definitive list.
*13 In [[Template:Unicode chart Enclosed CJK Letters and Months]] the hangul subset isn't contiguous. Nor is the emoticon subset of [[Template:Unicode chart Miscellaneous Symbols]]. I didn't add these features so I don't know what reaction you'll get from removing them.
[[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 23:09, 10 September 2019 (UTC)
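
:For illustration, here is a rough sketch of how points 6 and 8 above could be handled in code. It is written in Python rather than the module's own language, and every name in it (<code>is_noncharacter</code>, <code>parse_ranges</code>, <code>in_ranges</code>, <code>DASHED_BOX</code>) is hypothetical, not something the module currently defines. The noncharacter test follows the code points listed in point 6. The dashed-box lookup assumes a hardcoded list pasted in the same "XXXX-YYYY, ZZZZ" form used in point 8, and only an excerpt of that list is shown.

```python
# Sketch only: all helper names are hypothetical, not part of Module:Unicode chart.

def is_noncharacter(cp):
    """Return True for the noncharacters listed in point 6."""
    # The contiguous run U+FDD0-FDEF (32 code points).
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each of the 17 planes (U+nFFFE, U+nFFFF).
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def parse_ranges(spec):
    """Parse 'U+0000-0020, 00AD' into a list of inclusive (start, end) pairs."""
    ranges = []
    for item in spec.replace("U+", "").split(","):
        item = item.strip()
        if not item:
            continue
        start, _, end = item.partition("-")
        # A single code point becomes a one-element range.
        ranges.append((int(start, 16), int(end or start, 16)))
    return ranges


def in_ranges(cp, ranges):
    """Return True if cp falls inside any of the inclusive ranges."""
    return any(lo <= cp <= hi for lo, hi in ranges)


# Excerpt of the dashed-box list from point 8 (Unicode 12.1).
# The full list would be pasted here in the same format.
DASHED_BOX = parse_ranges("U+0000-0020, 007F-00A0, 00AD, 034F, 0600-0605, 061C")

# Count noncharacters across the whole code space as a sanity check.
NONCHAR_COUNT = sum(is_noncharacter(cp) for cp in range(0x110000))
```

:A chart generator could call <code>is_noncharacter</code> to decide which cells get the black background, and <code>in_ranges(cp, DASHED_BOX)</code> to decide which cells get the dashed box. The abbreviations from point 10 would still need their own hardcoded table.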