→Illegal characters: don't pipe Windows-1252 to somewhere else. Also, Windows has been based on Unicode since Windows NT/2000 |
Line 20:
* 55296 to 57343 (xD800–xDFFF, the [[UTF-16]] surrogate halves)
These characters are ''not even allowed by reference''. That is, you should not even write them as [[numeric character reference]]s. However, references to characters 128–159 are commonly interpreted by lenient web browsers as if they were references to the characters assigned to ''bytes'' 128–159 (decimal) in the [[
The characters 9 (tab), 10 (linefeed), and 13 (carriage return) are allowed in HTML documents and, along with 32 (space), are all considered "[[whitespace (computer science)|whitespace]]".<ref>http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1</ref> The "form feed" control character (12) is not allowed in HTML documents, yet the specification also lists it among the whitespace characters, perhaps an oversight. In HTML, most consecutive runs of whitespace characters, except in a <code><pre></code> block, are interpreted as a single "word separator" for rendering purposes. A word separator is typically rendered as a single en-width space in European languages, but not in others.
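The collapsing rule described above can be illustrated with a minimal Python sketch. It is not how any particular browser implements rendering. The function name <code>collapse_whitespace</code> is hypothetical, the sketch assumes form feed (12) is treated as whitespace alongside 9, 10, 13 and 32, and it ignores the <code><pre></code> exception entirely.

```python
import re

# Assumption: the whitespace set is exactly the five characters named in the
# text above: tab (9), linefeed (10), form feed (12), carriage return (13)
# and space (32). Other space-like characters are deliberately left alone.
_WHITESPACE_RUN = re.compile(r"[\t\n\f\r ]+")


def collapse_whitespace(text):
    """Replace each run of whitespace characters with one word separator.

    The word separator is modelled here as a single space (32); an actual
    renderer chooses how to draw it. Content of <pre> blocks is not
    special-cased in this sketch.
    """
    return _WHITESPACE_RUN.sub(" ", text)


if __name__ == "__main__":
    sample = "Hello \t\n world,\r\n\x0cagain"
    print(repr(collapse_whitespace(sample)))  # 'Hello world, again'
```

Note that the no-break space (160) is not in the set, so a run of such characters would pass through this sketch unchanged.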