Talk:Unicode and HTML

{{WikiProject banner shell|class=Start|
{{WikiProject Computing|importance=Low}}
{{WikiProject Internet|importance=Low}}
}}
 
== Character chart links ==
For the Unicode character charts, I reverted the URL from http://www.unicode.org/charts/normalization/ back to http://www.unicode.org/charts/. The normalization charts only display the characters if you already have the font installed, and they do not seem to be as complete as the full charts available at the other URL. --[[User:Nate Silva|Nate]] 15:37 Mar 7, 2003 (UTC)
 
I was going to make it all a lot more readable by simply removing the whole first paragraph, which is useless to anyone except someone wanting a laugh. <span style="font-size: smaller;" class="autosigned">—Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/79.79.206.183|79.79.206.183]] ([[User talk:79.79.206.183|talk]]) 11:19, 23 October 2008 (UTC)</span><!-- Template:UnsignedIP --> <!--Autosigned by SineBot-->
:Unicode defines a (large) set of numbers known as "code points" and the characters they represent (note, however, that in some cases Unicode code points do not map 1:1 to user-visible characters, because of combining and control characters).
:When you write an HTML document, you are supposed to* specify what "charset" you are using. This charset defines how the sequence of bytes in your HTML file is interpreted.
:The key to understanding the relationship between Unicode and HTML is to understand that HTML regards every charset as an encoding of a subset of Unicode. If you write your HTML documents in [[WINDOWS-1252]], you can only directly represent characters that are in WINDOWS-1252, but you can still indirectly represent any Unicode character through a character reference (a numeric character reference, or a named entity where one exists). Alternatively, you can write your HTML document in UTF-8 and represent almost all** characters directly.
: * If you don't specify what charset you are using, the browser will likely make a default assumption, which may or may not match the charset you actually used.
: ** A handful of characters (the exact set varies slightly by context, but includes the less-than sign and the ampersand) can't be represented directly because they are "markup-sensitive".
: -- [[User:Plugwash|Plugwash]] ([[User talk:Plugwash|talk]]) 01:35, 17 February 2012 (UTC)
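The point about code points versus user-visible characters can be illustrated with a short Python sketch (the language and the sample strings are arbitrary choices). The same visible "é" can be stored either as one precomposed code point or as a base letter followed by a combining accent, and Unicode normalization form NFC maps the second form onto the first:

```python
import unicodedata

# "é" written two ways.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE (one code point)
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT (two code points)

# Both render as the same user-visible character, but the code point counts differ.
print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False

# List the code points behind the decomposed form.
print([unicodedata.name(c) for c in decomposed])
# ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']

# NFC normalization composes the pair into the single precomposed code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```

Comparing strings without normalizing them first can therefore treat two visually identical texts as different.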
 
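A minimal Python sketch of the charset points above, assuming Python's built-in codec names ("utf-8", and "cp1252" for Windows-1252) and its standard html module; the sample strings are arbitrary. It shows the same bytes decoding differently under two charsets, characters outside Windows-1252 falling back to numeric character references, and markup-sensitive characters being escaped:

```python
import html

# 1. The same bytes mean different things under different charsets.
data = "café".encode("utf-8")        # b'caf\xc3\xa9'
as_utf8 = data.decode("utf-8")        # 'café'   (declared charset matches)
as_cp1252 = data.decode("cp1252")     # 'cafÃ©'  (charset mismatch: mojibake)

# 2. A Windows-1252 document can still carry any Unicode character indirectly:
#    characters missing from the charset become numeric character references.
text = "€ 5 ≤ ∑"
cp1252_bytes = text.encode("cp1252", errors="xmlcharrefreplace")
# '€' exists in Windows-1252 (byte 0x80); '≤' (U+2264) and '∑' (U+2211) do not.
print(cp1252_bytes)                   # b'\x80 5 &#8804; &#8721;'

# 3. Markup-sensitive characters need escaping regardless of the charset.
escaped = html.escape('a < b & "c"')
print(escaped)                        # a &lt; b &amp; &quot;c&quot;
```

The xmlcharrefreplace error handler emits decimal references; hexadecimal numeric references are equally valid in HTML.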
== unicode (UTF-8): about 50% of the web in 2010 ==
 
It looks like Unicode (UTF-8) was about 50% of the web in 2010 (sources: http://3.bp.blogspot.com/_7ZYqYi4xigk/S2Hcx0fITQI/AAAAAAAAFmM/ifZX2Wmv40A/s1600-h/unicode.png and http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html ) <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/84.99.17.74|84.99.17.74]] ([[User talk:84.99.17.74|talk]]) 20:30, 13 June 2011 (UTC)</span><!-- Template:UnsignedIP --> <!--Autosigned by SineBot-->
 
 
::60% in 2012, and 80% including ASCII: <nowiki>http://googleblog.blogspot.com/2012/02/unicode-over-60-percent-of-web.html</nowiki> <span style="font-size: smaller;" class="autosigned">— Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/86.69.108.41|86.69.108.41]] ([[User talk:86.69.108.41|talk]]) 23:53, 16 February 2012 (UTC)</span><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
 
The section ''Frequency of usage'' says "the UTF-8 Unicode encoding became the most frequently used encoding on web pages, overtaking both ASCII (US) and 8859-1/1252", which is a quote from the cited Google page. But it doesn't make any sense, given that ASCII is a subset of UTF-8. I'm guessing maybe the Google author was referring to the stated charset of the pages, does anybody know? (Old now, I know...)[[User:Mcswell|Mcswell]] ([[User talk:Mcswell|talk]]) 01:13, 23 April 2021 (UTC)
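The subset relationship can be checked directly with a small Python sketch (the sample string is arbitrary): ASCII-only text produces identical bytes whether it is encoded as ASCII or as UTF-8, while non-ASCII characters need bytes of 0x80 or above in UTF-8.

```python
# UTF-8 encodes every code point U+0000..U+007F as a single byte equal to its
# ASCII code, so ASCII text is byte-for-byte identical in both encodings.
sample = "<p>Hello, world!</p>"
ascii_bytes = sample.encode("ascii")
utf8_bytes = sample.encode("utf-8")
print(ascii_bytes == utf8_bytes)   # True

# bytes.isascii() (Python 3.7+) checks that every byte is below 0x80.
print(utf8_bytes.isascii())        # True

# Non-ASCII characters use multi-byte sequences with bytes >= 0x80.
print("é".encode("utf-8"))         # b'\xc3\xa9'
```

Since the bytes are identical, a statistic that counts ASCII pages separately from UTF-8 pages has to classify them by some criterion other than the byte content alone.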