Talk:Unicode and HTML

{{WikiProject banner shell|class=Start|
{{WikiProject Computing|importance=Low}}
{{WikiProject Internet|importance=Low}}
}}
 
== Character chart links ==
For the Unicode character charts, reverted the URL from http://www.unicode.org/charts/normalization/ back to http://www.unicode.org/charts/. The normalization charts only display the characters if you have the font already installed and do not seem to be as complete as the full charts available on the other URL. --[[User:Nate Silva|Nate]] 15:37 Mar 7, 2003 (UTC)
::::::My guess is that IE does use fonts other than those specified when rendering glyphs, but ONLY by mapping specific code points to specific fonts, not by searching for a font that can render the characters it wants. There may well be a configuration setting controlling this, but if there is, I don't know where. [[User:Plugwash|Plugwash]] 19:54, 14 July 2005 (UTC)
:::::::I did some more digging and I think what's going on is that it actually does look at other fonts. The problem is that those other fonts must be explicitly associated with the current base font in the registry (look for a key containing 'FontLink', like HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink). It may be a matter of careful use of language tags in the HTML to make sure a good base font is chosen, and then having the right mappings in the registry to associate that font with the fonts to fall back on. But there are still some nuances to base font selection that I don't understand. Anyone with more experience in this area, please comment! — [[User:Mjb|mjb]] 20:56, 14 July 2005 (UTC)
 
This sounds promising: per [http://blogs.msdn.com/ie/ IEBlog], Microsoft's Advanced Technology Center in Beijing [http://blogs.msdn.com/ie/archive/2005/09/22/473159.aspx is working on improvements to font linking and fallback] for IE7. — [[User:Mjb|mjb]] 08:22, 26 September 2005 (UTC)
 
== Editing Forms and Encoding ==
 
Is the character encoded by an HTML number the same character that is encoded by Unicode by the same number? For example, is character number 2343 in HTML the same as 2343 in Unicode? --[[User:Abdull|Abdull]] 14:31, 19 August 2005 (UTC)
 
:Yes, it is, by definition (HTML does not define the meaning for the character numbers; it instead defers to unicode). --[[User:CesarB|cesarb]] 15:36, 19 August 2005 (UTC)
 
::Yes, the numbers do refer to Unicode code points. However, most HTML numeric character references are written in decimal (hexadecimal ones are possible but aren't seen much), whilst Unicode generally uses hexadecimal when referring to code points. [[User:Plugwash|Plugwash]] 16:30, 19 August 2005 (UTC)
 
:::Is there any reference to know since which version each browser supports hexadecimal entities, and which are the browsers that still don't support it? Because hexadecimal is so natural when so many charmap viewers give only hexadecimal unicode code… [[:fr:Utilisateur:Lacrymocéphale|Lacrymocéphale]] <small>—Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/217.195.19.145|217.195.19.145]] ([[User talk:217.195.19.145|talk]]) 13:02, 3 June 2008 (UTC)</small><!-- Template:UnsignedIP --> <!--Autosigned by SineBot-->
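
:A minimal sketch of the decimal versus hexadecimal point in this thread. It assumes Python's standard <code>html</code> module as a stand-in for a browser's parser (the thread does not mention Python), and the helper name <code>decode_reference</code> is made up for illustration. The decimal reference 2343 and the hexadecimal reference 927 should decode to the same Unicode code point:

```python
import html

def decode_reference(ref: str) -> int:
    """Return the code point that one numeric character reference decodes to.

    html.unescape is used here as an illustrative parser, not as a model
    of any particular browser.
    """
    return ord(html.unescape(ref))

decimal_cp = decode_reference("&#2343;")   # decimal form
hex_cp = decode_reference("&#x927;")       # hexadecimal form of the same number

# Both forms name the same code point; Unicode charts would list it as U+0927.
print(decimal_cp, f"U+{hex_cp:04X}")
```

:Since 0x927 equals 2343, the two spellings differ only in notation, which is why a hexadecimal value copied from a character map can be used directly with the <code>&amp;#x...;</code> form.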
 
== I don't want to use that font! ==
'''Some web browsers, such as Mozilla Firefox, Opera, and Safari, are able to display multilingual web pages by intelligently choosing a font to display each individual character on the page. They will correctly display any mix of Unicode blocks, as long as appropriate fonts are present in the operating system.'''
 
[[Code2000]] is a great font with many characters, but some characters are rendered pretty badly by Code2000, for example the [[International Phonetic Alphabet|IPA]] characters. Since installing Code2000 on my Windows XP, Mozilla Firefox always uses Code2000 for every special character there is to be displayed. How do I tell Firefox to use another font for IPA? Actually, how does Firefox decide what font to use for special characters if there are four different ones to choose from? --[[User:Abdull|Abdull]] 14:31, 19 August 2005 (UTC)
:Try posting on a Mozilla forum; you're likely to get better support there. [[User:Plugwash|Plugwash]] 21:19, 19 August 2005 (UTC)
 
== Combiners ==
 
Is it possible to express combiners in escaped html? Eg 0041+0308 --[[User:207.109.251.117|207.109.251.117]] 03:43, 4 November 2005 (UTC)
:of course, why wouldn't it be? [[User:Plugwash|Plugwash]] 00:25, 16 November 2005 (UTC)
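
:A minimal sketch of the combiner case asked about above, assuming Python's standard <code>html</code> and <code>unicodedata</code> modules (not something the thread itself uses; <code>decode_and_compose</code> is a hypothetical helper name). The base letter is written literally and the combining diaeresis U+0308 as a numeric reference:

```python
import html
import unicodedata

def decode_and_compose(escaped: str) -> str:
    """Decode character references, then apply NFC normalization.

    NFC is applied only to show that the decoded pair is canonically
    equivalent to a precomposed letter; HTML itself does not require it.
    """
    decoded = html.unescape(escaped)
    return unicodedata.normalize("NFC", decoded)

# "A" followed by a reference to U+0308 COMBINING DIAERESIS
decoded = html.unescape("A&#x0308;")
composed = decode_and_compose("A&#x0308;")

print([f"U+{ord(c):04X}" for c in decoded])
print([f"U+{ord(c):04X}" for c in composed])
```

:The decoded text is the two code points U+0041 and U+0308; whether they display as a single accented glyph depends on the renderer and font, which is a separate matter from escaping.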
 
== Internet Explorer ==
 
On the page it says:
<nowiki>"Internet Explorer is capable of displaying the full range of Unicode characters, but can't automatically make the necessary font choice. Web page authors must guess which appropriate fonts might be present on users' systems, and manually specify them for each block of text with a different language or Unicode range. A user may have another font installed which would display some characters, but if the web page author hasn't specified it, then Explorer will fail to display them, and show placeholder squares instead."</nowiki>
What is the proper font choice and how would you change it? (I have Internet Explorer and access lots of the math pages on Wikipedia and see lots of <nowiki>"placeholder squares"</nowiki>.)--[[User:SurrealWarrior|SurrealWarrior]] 01:33, 4 December 2005 (UTC)
 
== Browser support - entities or no entities? ==
 
Is there any difference in browser support for, e.g.,
 
# Character X represented as a named/numeric entity (<code>mdash, #8212</code>)<br>versus
# Character X as an actual utf-8 character, in a utf-8 encoded HTML document, properly served?
 
::Not with any modern browser, as far as I can tell, but I think NS4 may have some strange behaviours regarding this. Most of the remaining problems with browser Unicode support have to do with font selection and rendering complex text. [[User:Plugwash|Plugwash]] 17:44, 12 February 2006 (UTC)
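
::A minimal sketch of why the two options in the question should be equivalent once parsed. It assumes Python's standard <code>html</code> module as an illustrative decoder and says nothing about any specific browser. The named reference, the numeric reference, and the literal UTF-8 bytes for the em dash all decode to the same character:

```python
import html

EM_DASH = "\u2014"  # U+2014 EM DASH

# Option 1: named and numeric character references in the markup
named = html.unescape("&mdash;")
numeric = html.unescape("&#8212;")

# Option 2: the character stored directly as UTF-8 bytes in the document
literal = b"\xe2\x80\x94".decode("utf-8")

print(named == numeric == literal == EM_DASH)
```

::After decoding there is one code point in every case, so any remaining difference between the options lies in the byte-level encoding and the declared charset rather than in the character itself.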
 
==What about IE 7 ?==
I've been having a look at this page because I wondered whether the new IE 7 still has that annoying bug of not being able to choose an appropriate font. However, the page only mentions IE 6. Can anybody verify how it is on IE 7? Thanks.
 
 
==Worst written Document I've ever come across==
This is by far the worst document I've ever come across. I won't say it's greek because I can read greek, this is just complete utter rubbish. To say an HTML page is unicode is a bit like saying a cat is a dog. I create html pages using notepad and I know that my html pages can only have a very limited set of characters. At what point does the 8bit coding of my document turn into the ??bit coding needed for unicode? Or does unicode mean "any character defined by a number" as seems to be the definition used in the opening paragraph.
 
I was going to make it all a lot more readable by simply removing the whole first paragraph, which is useless to anyone except someone wanting a laugh. <span style="font-size: smaller;" class="autosigned">—Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/79.79.206.183|79.79.206.183]] ([[User talk:79.79.206.183|talk]]) 11:19, 23 October 2008 (UTC)</span><!-- Template:UnsignedIP --> <!--Autosigned by SineBot-->
:Unicode defines a (large) set of numbers known as "code points" and what characters they represent (note, however, that in some cases Unicode code points do not have a 1:1 mapping to user-visible characters due to the presence of combining and control characters).
:When you write an HTML document you are supposed to* specify what "charset" you are using. This charset defines how the sequence of bytes in your HTML file is interpreted.
:The key to understanding the relationship between Unicode and HTML is to understand that HTML regards all charsets as encodings of a subset of Unicode. If you write your HTML documents in [[WINDOWS-1252]] then you can only directly represent characters that are in WINDOWS-1252, but you can still indirectly represent any Unicode character through a character reference. Alternatively you can write your HTML document in UTF-8 and represent almost all** characters directly.
: * If you don't specify what charset you are using, the browser will likely make a default assumption which may or may not match the charset you actually used.
: ** A handful of characters (which varies slightly by context) can't be represented directly because they are "markup sensitive".
: -- [[User:Plugwash|Plugwash]] ([[User talk:Plugwash|talk]]) 01:35, 17 February 2012 (UTC)
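
:A minimal sketch of the explanation above, assuming Python's standard codecs (<code>cp1252</code> is Python's name for Windows-1252) and a hypothetical helper <code>to_html_bytes</code>. Characters outside the chosen charset fall back to numeric character references, markup-sensitive characters are escaped in either case, and UTF-8 stores everything else directly:

```python
import html

def to_html_bytes(text: str, charset: str) -> bytes:
    """Serialize text for an HTML document in the given charset.

    Markup-sensitive characters (&, <, >) are escaped first; any character
    the charset cannot encode is written as a decimal character reference
    via Python's "xmlcharrefreplace" error handler.
    """
    escaped = html.escape(text, quote=False)
    return escaped.encode(charset, errors="xmlcharrefreplace")

sample = "\u0927 < \u2014"  # U+0927, a less-than sign, and an em dash

as_cp1252 = to_html_bytes(sample, "cp1252")
as_utf8 = to_html_bytes(sample, "utf-8")

print(as_cp1252)
print(as_utf8)
```

:In the Windows-1252 output the em dash fits in a single byte while U+0927 has to become a reference; in the UTF-8 output both are stored as multi-byte sequences and only the less-than sign is escaped.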
 
== unicode (UTF-8): about 50% of the web in 2010 ==
 
It looks like unicode (UTF-8) is about 50% of the web in 2010 (Source: http://3.bp.blogspot.com/_7ZYqYi4xigk/S2Hcx0fITQI/AAAAAAAAFmM/ifZX2Wmv40A/s1600-h/unicode.png and http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html ) <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/84.99.17.74|84.99.17.74]] ([[User talk:84.99.17.74|talk]]) 20:30, 13 June 2011 (UTC)</span><!-- Template:UnsignedIP --> <!--Autosigned by SineBot-->
 
 
::60%, and 80% including ASCII, in 2012 <nowiki> http://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html </nowiki> <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/86.69.108.41|86.69.108.41]] ([[User talk:86.69.108.41|talk]]) 23:53, 16 February 2012 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
 
The section ''Frequency of usage'' says "the UTF-8 Unicode encoding became the most frequently used encoding on web pages, overtaking both ASCII (US) and 8859-1/1252", which is a quote from the cited Google page. But it doesn't make any sense, given that ASCII is a subset of UTF-8. I'm guessing maybe the Google author was referring to the stated charset of the pages, does anybody know? (Old now, I know...) [[User:Mcswell|Mcswell]] ([[User talk:Mcswell|talk]]) 01:13, 23 April 2021 (UTC)
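
:A minimal sketch of the subset relationship mentioned above, using Python's standard codecs; the helper <code>is_ascii_only</code> and the sample markup are made up for illustration, and nothing here reflects how the cited survey actually classified pages. A byte sequence that contains only ASCII decodes identically as ASCII, UTF-8, or Windows-1252, so the bytes alone cannot tell those labels apart:

```python
def is_ascii_only(data: bytes) -> bool:
    """True when every byte is below 0x80, i.e. the data is pure ASCII."""
    return all(byte < 0x80 for byte in data)

page = "<p>plain ASCII markup</p>".encode("ascii")

# The same bytes are valid, and mean the same text, under all three charsets.
decodings = {name: page.decode(name) for name in ("ascii", "utf-8", "cp1252")}

print(is_ascii_only(page), len(set(decodings.values())))
```

:For such pages only the declared charset (or the classifier's own rule) distinguishes "ASCII" from "UTF-8", which is consistent with the guess that the statistic refers to stated or detected labels.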