Unicode and HTML: Difference between revisions

m Encoding overriding: Typo fixing, replaced: that that the → that the using AWB
removing 1 hyphen: —> "newly created"—WP:HYPHEN, sub-subsection 3, point 4
Line 52:
 
===Encoding trends===
Because of the legacy of 8-bit text representations in [[programming language]]s and [[operating system]]s, and the desire to avoid burdening users with the nuances of encoding, many text editors used by HTML authors are unable or unwilling to offer a choice of encodings when saving files to disk, and often do not even allow input of characters beyond a very limited range. Consequently, many HTML authors are unaware of encoding issues and may not know what encoding their documents actually use. Misunderstandings, such as the belief that the encoding declaration effects a change in the actual encoding (whereas it is just a label that could be inaccurate), also contribute to this editor attitude. Another contributing factor is the arrival of UTF-8, which greatly diminishes the need for other encodings; modern editors therefore tend to default to UTF-8, as recommended by the HTML5 specification.<ref>{{Cite web|url=http://www.w3.org/TR/html5/semantics.html#charset|title=HTML5|author=Ian Hickson|accessdate=17 September 2011|year=2011|quote=Authors are encouraged to use UTF-8. Conformance checkers may advise authors against using legacy encodings. [RFC3629] Authoring tools should default to using UTF-8 for newly created documents. [RFC3629]}}</ref>
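
The point that an encoding declaration is only a label can be illustrated with a minimal Python sketch. The page text and variable names here are hypothetical, and the example assumes an editor that always saves as UTF-8 regardless of what the declaration says; it is an illustration, not a description of how any particular editor or browser behaves.

```python
# Hypothetical page text. The declared charset is only a label inside the file.
text = '<meta charset="iso-8859-1"><p>caf\u00e9</p>'

# Assumption: the editor saves the file as UTF-8, ignoring the declaration.
raw = text.encode("utf-8")

# A reader that trusts the (inaccurate) label decodes the bytes as ISO-8859-1
# and sees two wrong characters where the single accented letter should be.
trusting_label = raw.decode("iso-8859-1")

# Decoding with the encoding actually used on disk recovers the original text.
actual = raw.decode("utf-8")

print(trusting_label)
print(actual)
```

Changing the declaration in `text` alters only the bytes of the label itself; the accented letter is still stored as the same two UTF-8 bytes, which is why a mismatched declaration produces garbled output rather than a different encoding.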
 
===Byte order mark/Unicode sniffing===