HTML decimal character rendering: Difference between revisions

link to main page
m Illegal characters: needed some additional wikilinking after being split from List of HTML decimal character references
Line 5:
===Illegal characters===
 
HTML forbids the use of the characters with the following [[Universal Character Set]]/[[Unicode]] code points (given in decimal):
 
* 0000–0008
* 0011
Line 11 ⟶ 12:
* 0128–0159
 
These characters are ''not even allowed by reference''; that is, they may not be written as [[numeric character reference]]s either. However, lenient web browsers commonly interpret references to characters 0128&ndash;0159 as if they were references to the characters assigned to ''bytes'' 128&ndash;159 (decimal) in the [[ISO 8859-1|Windows-1252]] character encoding. This behaviour violates the HTML and SGML standards, and the characters concerned are already assigned to higher code points, so HTML document authors should always use those higher code points. For example, for the trademark sign (&#8482;), use <code>&amp;#8482;</code>, not <code>&amp;#153;</code>.
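
The lenient reinterpretation described above can be sketched in Python as a way to find the higher code point an author should use instead. This is only an illustration, not how any particular browser is implemented: the function name <code>canonical_code_point</code> is hypothetical, and the sketch assumes that Python's <code>cp1252</code> codec is an acceptable stand-in for the Windows-1252 byte assignments. Bytes that the codec leaves undefined are returned unchanged.

```python
def canonical_code_point(n: int) -> int:
    """Return the code point a lenient browser would likely show for &#n;.

    For n in 128-159 this is the character assigned to byte n in
    Windows-1252 (assumption: Python's "cp1252" codec matches it).
    Any other value, or a byte cp1252 leaves undefined, is returned as-is.
    """
    if 128 <= n <= 159:
        try:
            return ord(bytes([n]).decode("cp1252"))
        except UnicodeDecodeError:
            # Byte has no assignment in Python's cp1252 table.
            return n
    return n


# Show which reference to write instead of the non-standard one.
for n in (153, 128, 65):
    print(f"&#{n}; -> &#{canonical_code_point(n)};")
```

For the trademark sign, <code>canonical_code_point(153)</code> yields 8482, matching the recommendation to write <code>&amp;#8482;</code>.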
 
The characters 0009 (tab), 0010 (linefeed), 0012 (form feed), and 0013 (carriage return) are allowed in HTML documents. Together with 0032 (space), they are all considered "white space" and, except in a <code>&lt;pre&gt;</code> block, any run of them is interpreted as a single "word separator" for rendering purposes. A word separator is typically rendered as a single en-width space in European languages, but not necessarily in others.
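
The collapsing of white space outside a <code>&lt;pre&gt;</code> block can be approximated with a short Python sketch. The names <code>HTML_WHITE_SPACE</code> and <code>collapse_white_space</code> are hypothetical, and the sketch assumes that replacing each run of the five characters listed above with one space is a close enough model of a word separator; it ignores script-specific rendering and any special handling at the start or end of a block.

```python
import re

# Tab (9), linefeed (10), form feed (12), carriage return (13), space (32).
HTML_WHITE_SPACE = re.compile(r"[\t\n\f\r ]+")


def collapse_white_space(text: str) -> str:
    """Replace every run of HTML white space with a single space."""
    return HTML_WHITE_SPACE.sub(" ", text)


print(collapse_white_space("one\t\ttwo\r\n   three"))
```

Other white-space characters, such as the no-break space (0160), are deliberately left untouched here because the text above does not list them.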