Unicode and HTML: Difference between revisions

m Encoding overriding: Typo fixing, replaced: that that the → that the using AWB
Line 62:
For HTML documents serialized as <code>text/html</code>, a manual override may apply to all documents, or only to those whose encoding cannot be determined from declarations or byte patterns. Because the manual override exists and is widely used, it hinders the adoption of accurate encoding declarations on the Web, so the problem is likely to persist. However, Internet Explorer, Chrome and Safari do not permit the encoding to be overridden when the page includes a BOM, for both the XML and <code>text/html</code> serializations.<ref>[http://www.w3.org/Bugs/Public/show_bug.cgi?id=12897 Bug 12897 - In some parsers, UTF-8 BOM trumps the HTTP charset attribute (Encoding sniffing algorithm)]</ref>
 
For HTML documents serialized with the preferred XML media type, <code>application/xhtml+xml</code>, manual encoding override is not permitted. Overriding the encoding of such a document would mean that the document stopped being XML, as it is a fatal error for an XML document to have an encoding declaration with detectable errors. Currently, Gecko browsers such as Firefox abide by this rule, whereas most other common browsers that support HTML as XML, such as WebKit browsers (Chrome/Safari),<ref>[https://bugs.webkit.org/show_bug.cgi?id=66189 Bug 66189 - XML parser doesn't emit FATAL ERROR for all, detectable encoding errors]</ref> do allow the encoding of XHTML documents to be manually overridden.
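
The precedence described above can be illustrated with a minimal Python sketch. It is not taken from any browser's source: the names <code>sniff_bom</code>, <code>effective_encoding</code> and the <code>strict_xml</code> flag are hypothetical, the <code>windows-1252</code> fallback is an assumed default, and only UTF-8 and UTF-16 byte order marks are considered.

```python
import codecs

# Byte order marks considered by this sketch (UTF-8 and UTF-16 only).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def effective_encoding(data, media_type, declared=None,
                       user_override=None, strict_xml=True):
    """Pick the encoding a browser-like client might use (hypothetical model).

    strict_xml=True models the behaviour attributed to Gecko above: a manual
    override is ignored for application/xhtml+xml. strict_xml=False models
    the behaviour attributed to WebKit, where the override is honoured.
    """
    # A BOM wins over both the declaration and the manual override.
    bom_encoding = sniff_bom(data)
    if bom_encoding is not None:
        return bom_encoding

    # Strict XML handling: the user's override is not consulted at all.
    if media_type == "application/xhtml+xml" and strict_xml:
        # XML without a BOM or declaration is read as UTF-8.
        return declared or "utf-8"

    # Otherwise the override takes effect; the last fallback is an assumption.
    return user_override or declared or "windows-1252"
```

For example, under these assumptions <code>effective_encoding(b"\xef\xbb\xbfhi", "text/html", declared="iso-8859-1", user_override="shift_jis")</code> yields <code>"utf-8"</code>, because the BOM takes precedence over the override.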
 
==Web browser support==