Revision as of 00:10, 19 October 2021 by Lowercase sigmabot III: m (Archiving 1 discussion(s) from Talk:Unicode) (bot)
Revision as of 00:10, 20 October 2021 by Lowercase sigmabot III: m (Archiving 1 discussion(s) from Talk:Unicode) (bot)
FYI: I have created article '''[[Unicode alias names and abbreviations]]'''. IMO it is very complete, both wrt formal aliases (the 5 reasons), and the informal ones (used by Unicode e.g. in charts, but not formalised/listed). Personally: I have been working on this a long time to get it right (in age not time spent ;-) ). Mainly to get a complete list of abbr's-in-unicode. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 00:57, 19 October 2019 (UTC)

== What is a Unicode font? ==
''Google search has already picked this up so lest skim readers be misled I have stricken out proposals that I have withdrawn.'' --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 11:24, 15 October 2019 (UTC)

{{ping|Peter M. Brown}}, {{ping|LiberatorG}}, {{ping|Drmccreedy}} I wrote <s>{{quote|text=The term 'Unicode font' is used more specifically to categorise those fonts that have implementations for every character in the repertoire (or at least a large (65,535) subset of it).}}</s> I am very open to suggestions about how better to say this.

First of all, I take it that my ''Unicode is not in principle concerned with fonts ''per se'', seeing them as implementation choices. Any given character may have many [[allograph]]s, from the more common bold, italic and base letterforms to complex decorative styles.'' is not disputed. A [[font]] (according to that article) "was a particular size, weight and style of a [[typeface]] in hot-metal typesetting" and in modern terminology can be taken as synonymous with a typeface. So, it seems to me, a "Unicode font" is literally a contradiction in terms: Unicode is a database of numeric values associated with letterforms no matter how drawn, it is not a font or a typeface. The font expresses in vectors how an artist (typographer) has chosen to draw a letterform associated with that number. But the industry has adopted the term "Unicode font" to mean a font that has at least most of the characters in the basic plane.
Is there a better way to express that succinctly than my proposal? --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 23:21, 11 October 2019 (UTC)
:No, the "industry" (whatever that is) has not adopted the term "Unicode font" to mean a font that has at least most of the characters in the basic plane. A Unicode font is simply a computer font that has a [[Cmap (font)|cmap]] table that maps glyph IDs to Unicode code points. The term "pan-Unicode" is sometimes used to refer to a Unicode font that covers a high proportion of Unicode characters, but given that TTF/OTF fonts have a maximum limit of 65,535 glyphs, and Unicode 12.1 defines 137,766 graphic characters, it is nigh on impossible for a font to cover all Unicode characters. [[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 23:32, 11 October 2019 (UTC)
::I had hoped that your reference to Cmap would point the way to an improved definition, but unfortunately it just takes us around in a circular argument. ''"defines the mapping of [[Character encoding|character codes]] to the [[glyph#Typography|glyph]] index values used in the font."''<ref>{{Cite web|url=https://docs.microsoft.com/en-us/typography/opentype/spec/cmap|title=cmap – Character To Glyph Index Mapping Table – Typography|website=docs.microsoft.com}}</ref> I accept that my definition is uncited (though I have great difficulty accepting a font with fewer than 2000 glyphs as a "Unicode font" - just because it calls itself a Unicode font surely doesn't make it one). Is there an RS definition anywhere? --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 23:41, 11 October 2019 (UTC)
{{reflist|talk}}
:::It is not something as basic as a font that supports multi-byte representation (rather than single byte), is it?
(8-bit bytes) --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 00:04, 12 October 2019 (UTC)
::: A Unicode font could mean a lot of things; why wouldn't a font that supports glyphs indexed by Unicode code point be called a Unicode font? Certainly if we were talking about a script supported by fonts that put their glyphs over the ASCII range, a font that used instead Unicode code points could productively be called a Unicode font. I certainly wouldn't talk about number of glyphs; a GB 2312 font with over 7,000 characters is less of a Unicode font than a Latin-Greek-Cyrillic font that richly supports all features only Unicode offers. --[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 10:38, 12 October 2019 (UTC)
::::I think you've fallen into the same trap as I did, that it is not a "proper" Unicode font unless it has a substantial repertoire. That is a qualitative judgement rather than a functionality judgement. ''[I have accepted that my original text was incorrect, btw. But something like it needs to be said, the question now is what?]'' According to the (uncited!) statement that opens the [[Unicode fonts]] article, {{quote|A '''Unicode font''' is a [[computer font]] that maps [[glyph]]s to [[Unicode character]]s (i.e. the glyphs in the font can be accessed using [[code point]]s defined in the [[Unicode Standard]]).}}
::::So if that is literally correct, a font that has no more than the most basic ASCII character-set would qualify as a Unicode font if "the glyphs in the font can be accessed using [[code point]]s defined in the [[Unicode Standard]]". Is this true? Why? Why not? --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 10:51, 12 October 2019 (UTC)
::::: Before Unicode we used fonts with a maximum of <256 glyphs. When we mixed scripts we had one font for Latin, another one for Cyrillic, yet another one for Greek, one for Hebrew, etc.
(The encodings used are now known as "[[Character encoding#Character sets, character maps and code pages|legacy encodings]].") So different characters were mapped to the same code point, and formatting was essential for a text to make sense. (However my word processor sometimes "lost formatting," which resulted in gibberish, or ''[[mojibake]]''.) Unicode was revolutionary as formatting was no longer needed for texts to make sense (see article [[Plain text]]), and we actually called any font containing a table to map glyphs to Unicode code points a Unicode font. This even applied to Latin-only fonts, because it was not clear which script a font handled, and even if a font was known to be a "Latin font," the range above (<128-character) ASCII was still legacy-encoded, so non-basic Latin letters like the {{angbr|é}} in {{angbr|[[résumé]]}} were mapped to different code points by different software developers. <small>[[Wikipedia:WikiLove|Love]]</small> —[[:commons:User:LiliCharlie|LiliCharlie]] <small>([[User talk:LiliCharlie|talk]])</small> 14:34, 12 October 2019 (UTC)
::::::Yes, I knew that, see [[ISO Latin-1]], [[ISO Latin 2]] etc. I'm afraid this all reminds me of the [[HD Ready]] scam. So let me be provocative and propose a new draft second sentence. For convenience, I'll open a new subsection. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 15:05, 12 October 2019 (UTC)
::::: Not necessarily substantial, but something that's uniquely Unicode.
::::: You're imagining that Unicode font means something clear and unambiguous. It doesn't. Certainly, a font for a script might be called a Unicode font if there were a previous tradition of non-Unicode-compatible fonts, even if it only covered a 7-bit charset.
But Unicode font would generally only be a marketing term. --[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 05:53, 13 October 2019 (UTC)
::::::''But Unicode font would generally only be a marketing term.'' Precisely and that is what I believe that the article should say explicitly. ''[[Caveat emptor]]'' presupposes an informed emptor. As well as the article telling readers what Unicode is, it should also tell them what it is not. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 13:19, 13 October 2019 (UTC)
::::::: I don't see any need to change the current section at all. Saying what something is not is rarely very productive; Unicode is not a fox, it's not a box, it is not rain, it is not a train, etc. --[[User:Prosfilaes|Prosfilaes]] ([[User talk:Prosfilaes|talk]]) 13:28, 13 October 2019 (UTC)

===Suggested draft second sentence===
{{quote|text=<s>For a font to be described legitimately as a "Unicode font", it is only required that the glyphs in the font can be accessed using [[code point]]s defined in the [[Unicode Standard]]. There is no minimum number of characters that must be included in the font; some fonts have quite a small repertoire.</s>}}
Better? --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 15:05, 12 October 2019 (UTC)
:That's my understanding but I'd like to see a citation to back it up. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 15:40, 12 October 2019 (UTC)
:::Yes, absolutely, I agree, indeed I would have to say it is a prerequisite before it can be added but I've not yet found one. I've based it closely on the opening sentence of [[Unicode font]], but it is not cited there either.
:::: I don't think we should tell users what ''"a font to be described legitimately as a 'Unicode font'"'' is.
The term "Unicode font" has been used in various senses, and it is not up to Wikipedia users to instruct other users which usage is the legitimate one, classifying any other usage as illegitimate. We should report what experts agree on, not define new standards of legitimacy. <small>[[Wikipedia:WikiLove|Love]]</small> —[[:commons:User:LiliCharlie|LiliCharlie]] <small>([[User talk:LiliCharlie|talk]])</small> 22:39, 12 October 2019 (UTC)
:::::That's what I was thinking but couldn't articulate. [[User:Drmccreedy|DRMcCreedy]] ([[User talk:Drmccreedy|talk]]) 23:32, 12 October 2019 (UTC)
:::::::Yes, I accept that. I guess "validly" has the same issue. What I am trying to put encyclopedically is that any font, no matter how small its repertoire, qualifies as a Unicode font if it meets the technical specification. But the need for citation increases if anything. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 13:12, 13 October 2019 (UTC)
::Version by version, an increasing number of characters are assigned code points in the Unicode Standard. On the proposed definition, a non-Unicode font, even an obsolete one supporting nothing in the BMP, can become a Unicode font without any reworking of the font. Is that an acceptable consequence? [[User:Peter M. Brown|Peter Brown]] ([[User talk:Peter M. Brown|talk]]) 16:30, 12 October 2019 (UTC)
:::I share your sentiment but I can't see any reasonable basis to exclude them unless we find a citation that says yea or nay. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 22:05, 12 October 2019 (UTC)
* Maybe we should move [[Unicode font]] to [[Unicode compliant font]]. Solves about everything.
-[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 12:30, 19 October 2019 (UTC)

===Possible citations===
;Bigelow and Holmes
Is this acceptable as a suitable citation?: ''To call an incomplete font containing Unicode subsets a ‘Unicode’ font could be misleading, since some users could mistakenly assume that any font called ‘Unicode’ will contain a full set of 28,000 characters.''<ref>{{cite journal |url = http://cajun.cs.nott.ac.uk/wiley/journals/epobetan/pdf/volume6/issue3/bigelow.pdf | title = The design of a Unicode font | journal = ELECTRONIC PUBLISHING | volume = VOL. 6(3), 289–305 | date = September 1993 | page = 298 |last1 = Bigelow | first1=Charles | last2 = Holmes | first2 = Kris}}</ref>

The problem is that the paper is about designing [[Lucida Sans Unicode]] and describes the authors' attempt to create a font that has a consistent style irrespective of alphabet. Other articles I have read are quite disparaging about that idea, saying that there are major cultural differences between writing systems. These writers promote the idea that it is better to have a font that is optimised for the language in which a document is written, that it doesn't matter if it is "incomplete". I conclude also that while a 'pan-Unicode' [deprecated term!] font might have been a credible objective with 28,000 characters in 1993, surely it is no longer so? So this citation might fail NPOV even if it passes [[WP:VNT]].
; Unicode Consortium FAQ
Interestingly, the Consortium prefers the phrase "Unicode conformant font".
''A Unicode-conformant font can be defined as a font which contains a mapping from Unicode characters and that maps characters to glyphs in a way that is consistent with character semantics defined in the Unicode Standard.''<ref>{{cite web | url= https://www.unicode.org/faq/font_keyboard.html | title = Fonts and keyboards | publisher = Unicode Consortium | date = 28 June 2017 | accessdate= 13 October 2019}}</ref> Note that the FAQ says nothing about comprehensiveness. (I propose to append this citation to the opening sentence of [[Unicode fonts]]).

Further comments (and citations!) welcome. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 16:34, 13 October 2019 (UTC)
{{reflist talk}}

===Second proposal for second sentence===
Another attempt: {{quote|text=A font is "Unicode compliant" if the glyphs in the font can be accessed using [[code point]]s defined in the [[Unicode Standard]].<ref>{{cite web | url= https://www.unicode.org/faq/font_keyboard.html | title = Fonts and keyboards | publisher = Unicode Consortium | date = 28 June 2017 | accessdate= 13 October 2019}}</ref> The standard does not specify a minimum number of characters that must be included in the font; some fonts have quite a small repertoire.}}
{{reflist talk}}
Note that the FAQ uses the term "compliant", which has the intent of my earlier "legitimate" without the overtones. Better? --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 11:24, 15 October 2019 (UTC)

'''Support'''. Especially since this greatly evades any "Unicode font" suggestion. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 23:21, 18 October 2019 (UTC)

'''Comment'''. The wikilinks are somewhat strange. We don't need a link to ''[[Unicode Standard]]'', as this is the article that deals with that topic. Our first link to ''[[code point]]'' currently seems to be in the ''Architecture and terminology'' section, though the term is used 12 times above that section.
: The smallest Unicode compliant font of my private collection is ''[[Noto fonts|Noto]] Sans [[Tagbanwa script|Tagbanwa]]'' v. 2.000 which has 29 glyphs (including 5 control and zero-width characters, 2 spaces, 2 punctuation marks, and the dotted circle <code>U+25CC</code>) that can be accessed via 28 Unicode characters. That appears to be all that is needed to write the language of the homonymous ethnicity. Is it really that remarkable that some scripts require fewer glyphs and characters than a Unicode compliant font for English? In other words: Do we really need the second sentence? <small>[[Wikipedia:WikiLove|Love]]</small> —[[:commons:User:LiliCharlie|LiliCharlie]] <small>([[User talk:LiliCharlie|talk]])</small> 01:58, 19 October 2019 (UTC)
:::True, I did those [links] for clarity in this talk page, they would come out before going live. Good observation that it had been used earlier but not wlinked, I hadn't noticed that. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 20:48, 19 October 2019 (UTC)
::I can agree to removing the second sentence here. There is no requirement in this, so no need to suggest it. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 12:28, 19 October 2019 (UTC)
:::If you look back at the earlier discussion, some editors fell into the trap (led, I admit, by me but I see from the journal article I cited that it is a common error) of believing that a font cannot be a Unicode font unless it has many thousands of glyphs. I really do believe that we should say this but if the consensus is that it breaks the ''X is not a fox or a box'' rule, then I will have to concede. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 20:48, 19 October 2019 (UTC)
* '''Oppose'''. A font can map glyphs to Unicode code points, but map the wrong glyphs (e.g. map a "B" glyph to U+0041), in which case the font is not compliant or conformant to the Unicode Standard.
[[User:BabelStone|BabelStone]] ([[User talk:BabelStone|talk]]) 13:20, 19 October 2019 (UTC)
:: I'm sorry, I don't really understand what you are saying here, could you elaborate please? I believe that I have paraphrased the FAQ at Unicode.org: your challenge seems (to me!) to be saying that you could have a font that is compliant but not legitimate ''(sic!)''. Surely we don't have to get bogged down in the possibility that someone might design, let alone expect to sell, a font that complies with the letter of the standard but contravenes its spirit? Our purpose is to explain, not to write a criminal code. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 20:48, 19 October 2019 (UTC)

==="Unicode compliant font"===
Enough. Per OP "what is a <s>Unicode font</s>" etcetera: that does not exist. OTOH, a ''Unicode compliant font'' is well defined. So that is what enwiki should say. The article (with page content) is [[Unicode compliant font]]. -[[User:DePiep|DePiep]] ([[User talk:DePiep|talk]]) 21:00, 19 October 2019 (UTC)
:Per [[WP:Common name]], "Unicode font" is the term most widely seen. {{as of|20 October 2019}}, Wikipedia doesn't even have an article called "Unicode compliant font". ''This'' article should refer readers to [[Unicode font]] (or alias) for more detailed information, but it needs at least two sentences to give a reason why they should do that. Which is what all of the above is about. --[[User:Red King|Red King]] ([[User talk:Red King|talk]]) 13:39, 20 October 2019 (UTC)
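
The three things argued over in this thread can be told apart with a toy model: whether a font is keyed by Unicode code points at all, how much of a repertoire it covers, and whether it maps the right glyph (the "B" glyph at U+0041 case raised in the Oppose comment). The sketch below is only an illustration under stated assumptions. Plain dictionaries stand in for cmap tables (keyed code point to glyph name), the fonts and glyph names are made up, and the helper names <code>is_unicode_mapped</code>, <code>coverage</code> and <code>mismapped</code> are hypothetical, not part of any font library or of the Unicode Standard. The ceiling figure also assumes, for simplicity, one glyph per character.

```python
# Toy model only: a dict stands in for a font's cmap (code point -> glyph name).
# All fonts, glyph names and helper functions here are hypothetical.

MAX_GLYPHS = 65_535           # TTF/OTF glyph limit quoted in the discussion
GRAPHIC_CHARS_12_1 = 137_766  # graphic characters in Unicode 12.1, as quoted


def is_unicode_mapped(cmap):
    """True if every key is a Unicode scalar value (no surrogates, <= U+10FFFF)."""
    return all(
        0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)
        for cp in cmap
    )


def coverage(cmap, repertoire):
    """Fraction of the given set of code points that the font maps."""
    if not repertoire:
        return 0.0
    return len(cmap.keys() & repertoire) / len(repertoire)


def mismapped(cmap, expected):
    """Code points whose glyph differs from the expected glyph."""
    return sorted(
        cp for cp, glyph in cmap.items()
        if cp in expected and expected[cp] != glyph
    )


# A tiny font, a font with swapped glyphs, and a small reference mapping.
tiny_font = {0x41: "A", 0x42: "B"}
swapped_font = {0x41: "B", 0x42: "A"}
reference = {0x41: "A", 0x42: "B", 0xE9: "eacute"}

print(is_unicode_mapped(tiny_font))         # keyed by code points, however few
print(coverage(tiny_font, set(reference)))  # small repertoire, still mapped
print(mismapped(swapped_font, reference))   # mapped, but to the wrong glyphs
print(MAX_GLYPHS / GRAPHIC_CHARS_12_1)      # ceiling if one glyph per character
```

In this model a small repertoire and a wrong mapping are independent properties: <code>tiny_font</code> covers little but maps correctly, while <code>swapped_font</code> covers the same code points and fails the glyph check.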

Talk:Unicode/Archive 7: Difference between revisions