{{essay-like|date=December 2011}}
{{refimprove|date=January 2011}}
}}
{{SpecialChars}}
{{Html series}}
Web pages authored using
In RFC 1866, the initial HTML 2.0 standard, the document character set was defined as ISO-8859-1 (later HTML standards default to [[Windows-1252]] instead). It was extended to [[ISO 10646]] (which is essentially equivalent to Unicode) by {{IETF RFC|2070}}. The document character set does not vary between documents in different languages or documents created on different platforms. The external character encoding is chosen by the author of the document (or by the software used to create it) and determines how the bytes used to store or transmit the document map to characters in the document character set. Characters not present in the chosen external encoding may be represented by character entity references.
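The split between the document character set and the external encoding can be illustrated with a short Python sketch. It is not taken from any HTML specification. It relies only on Python's built-in codecs and the standard <code>html</code> module. The sample string is an arbitrary assumption. The sketch uses the <code>xmlcharrefreplace</code> error handler, which emits numeric character references such as <code>&amp;#8364;</code> (one kind of character reference) for characters that the target encoding cannot represent.

```python
import html

# The characters come from a single document character set (Unicode),
# regardless of how they are later stored as bytes.
text = "café €5"

# UTF-8 can encode every Unicode character directly.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes == b"caf\xc3\xa9 \xe2\x82\xac5"

# ISO-8859-1 has no euro sign, so "xmlcharrefreplace" writes it as the
# numeric character reference &#8364; instead of raising an error.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
assert latin1_bytes == b"caf\xe9 &#8364;5"

# Decoding with the correct encoding and resolving the reference recovers
# the same characters, even though the stored bytes differ.
recovered = html.unescape(latin1_bytes.decode("iso-8859-1"))
assert recovered == text

# Decoding UTF-8 bytes as ISO-8859-1 maps them to the wrong characters.
mojibake = "café".encode("utf-8").decode("iso-8859-1")
print(mojibake)  # cafÃ©
```

The last lines show why an accurate encoding label matters. The bytes of the document are identical either way. Only the mapping from bytes to characters changes.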
Many HTML documents are served with inaccurate encoding information, or with none at all. To determine the encoding in such cases, many browsers let the user select an encoding manually from a list. They may also use an encoding auto-detection algorithm that works in concert '''with''' the manual override or{{snd}} ''in the case of a BOM and of HTML served as XML''{{snd}} '''against''' it.
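The BOM case can be sketched in Python. The three byte order marks checked here (UTF-8, UTF-16BE and UTF-16LE) are the ones recognized by the WHATWG Encoding Standard's BOM sniffing step. The function names, the rule that a BOM beats the user's choice (modeled on the browser behavior described in this section), and the <code>windows-1252</code> fallback are illustrative assumptions, not any browser's actual code.

```python
from typing import Optional, Tuple

# Byte order marks recognized by this sketch, checked in order.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length) if data starts with a known BOM, else (None, 0)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def choose_encoding(data: bytes, user_override: Optional[str],
                    fallback: str = "windows-1252") -> str:
    """Hypothetical policy: a BOM wins over the manual override, which wins over the fallback."""
    encoding, _ = sniff_bom(data)
    if encoding is not None:
        return encoding          # the BOM cannot be overridden
    if user_override is not None:
        return user_override     # manual choice applies only without a BOM
    return fallback              # assumed default when nothing else is known


page = b"\xef\xbb\xbf<p>caf\xc3\xa9</p>"
enc = choose_encoding(page, user_override="iso-8859-1")
_, bom_len = sniff_bom(page)
print(enc, page[bom_len:].decode(enc))  # utf-8 <p>café</p>
```

In this sketch the user's ISO-8859-1 selection is ignored because the page begins with a UTF-8 BOM. Without the BOM, the same call would return the user's choice. A real browser would also consult HTTP headers, <code>&lt;meta&gt;</code> declarations and byte-pattern heuristics before falling back.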
For HTML documents serialized as <code>text/html</code>, the manual override may apply to all documents, or only to those whose encoding cannot be determined from declarations or byte patterns. The presence and widespread use of the manual override hinders the adoption of accurate encoding declarations on the Web, so the problem is likely to persist. Note, however, that Internet Explorer, Chrome and Safari{{snd}} for both XML and <code>text/html</code> serializations{{snd}} do not allow the encoding to be overridden when the page includes a BOM.<ref>
For HTML documents served with the preferred XML label{{snd}} <code>application/xhtml+xml</code>{{snd}} manual encoding override is not permitted. Overriding the encoding of such a document would mean that it stopped being XML, since it is a fatal error for an XML document to have an encoding declaration with detectable errors. Currently, Gecko browsers such as Firefox abide by this rule, whereas most other common browsers that support HTML as XML, such as WebKit browsers (Chrome/Safari) <ref>
==Web browser support==
==Frequency of usage==
According to internal data from [[Google]]'s web index, in December 2007 the [[UTF-8]] Unicode encoding became the most frequently used encoding on web pages, overtaking both [[ASCII]] (US) and [[ISO/IEC 8859-1|8859-1]]/[[Windows-1252|1252]] (Western European).<ref>
==See also==
[[Category:HTML]]
[[Category:Unicode|HTML]]