HTML decimal character rendering: Difference between revisions

Added description of HTML code and removed leading zeros from numbers
Illegal characters: rephrase "not allowed to".
Line 20:
* 55296 to 57343 (xD800-xDFFF, the [[UTF-16]] surrogate halves)
 
These characters are ''not even allowed by reference''. That is, you should not even write them as [[numeric character reference]]s. However, references to characters 128&ndash;159 are commonly interpreted by lenient web browsers as if they were references to the characters assigned to ''bytes'' 128&ndash;159 (decimal) in the [[Windows-1252]] character encoding. This violates the HTML and SGML standards, and those characters are already assigned to higher code points, so HTML document authors should always use the higher code points. For example, for the trademark sign (™), use <code>&amp;#8482;</code>, not <code>&amp;#153;</code>.
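
The substitution described above can be sketched in a few lines of Python. This is only an illustration, not part of any standard: the function name <code>fix_c1_references</code> is hypothetical, and the sketch assumes that Python's built-in <code>cp1252</code> codec is an acceptable stand-in for the Windows-1252 table that lenient browsers apply. Bytes that Windows-1252 leaves unassigned (such as 129) are left untouched.

```python
import re


def fix_c1_references(html: str) -> str:
    """Rewrite decimal references 128-159 to the code points that
    Windows-1252 assigns to the bytes with those values."""

    def replace(match: re.Match) -> str:
        number = int(match.group(1))
        if 128 <= number <= 159:
            try:
                # Interpret the number as a Windows-1252 byte.
                char = bytes([number]).decode("cp1252")
            except UnicodeDecodeError:
                # Byte is unassigned in Windows-1252; leave the reference as-is.
                return match.group(0)
            return f"&#{ord(char)};"
        return match.group(0)

    # Only decimal references (&#NNN;) are handled in this sketch.
    return re.sub(r"&#([0-9]+);", replace, html)


print(fix_c1_references("Acme&#153;"))  # Acme&#8482;
```

Hexadecimal references are deliberately out of scope here; a fuller tool would handle them the same way.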
 
The characters 9 (tab), 10 (linefeed), and 13 (carriage return) are allowed in HTML documents, but, along with 32 (space), are all considered "[[whitespace (computer science)|whitespace]]" [http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1]. The "form feed" control character, at 12, is not allowed in HTML documents, yet it is also listed as one of the "white space" characters, perhaps an oversight in the specifications. In HTML, most consecutive runs of white space characters, except in a <code>&lt;pre&gt;</code> block, are interpreted as a single "word separator" for rendering purposes. A word separator is typically rendered as a single en-width space in European languages, but not in others.
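
The collapsing rule can be approximated as follows. This is a rough sketch under stated assumptions, not how any browser implements it: the name <code>collapse_whitespace</code> is hypothetical, the white space set is limited to the five characters named above (9, 10, 12, 13, and 32), and <code>&lt;pre&gt;</code> blocks are located with a simple regular expression rather than a real HTML parser.

```python
import re

# Tab, linefeed, carriage return, form feed, and space, as listed in the text.
HTML_WHITESPACE = re.compile(r"[ \t\n\r\f]+")

# Naive <pre> matcher; assumes blocks are not nested and are properly closed.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.IGNORECASE | re.DOTALL)


def collapse_whitespace(html: str) -> str:
    """Collapse each run of white space to one space, except inside <pre>."""
    parts = PRE_BLOCK.split(html)
    # Because the pattern has one capturing group, re.split places the
    # matched <pre> blocks at odd indices; those are kept verbatim.
    return "".join(
        part if index % 2 else HTML_WHITESPACE.sub(" ", part)
        for index, part in enumerate(parts)
    )


print(collapse_whitespace("a \t\n b<pre>x  y</pre>c\r\n\r\nd"))
```

The sketch replaces every run with an ordinary space; how wide the resulting word separator is drawn remains up to the renderer.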