HTML decimal character rendering

A ''numeric character reference'' in HTML refers to a character by its [[Universal Character Set]]/[[Unicode]] ''code point'', and uses the format
 
:<code>&#</code>''nnnn''<code>;</code>
or
:<code>&#x</code>''hhhh''<code>;</code>
 
where ''nnnn'' is the code point in [[decimal]] form, and ''hhhh'' is the code point in [[hexadecimal]] form. The ''x'' must be lowercase in XML documents. The ''nnnn'' or ''hhhh'' may be any number of digits and may include leading zeros. The ''hhhh'' may mix uppercase and lowercase, though uppercase is the usual style.
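To make the two forms concrete, the following Python sketch builds a decimal and a hexadecimal reference for a character and decodes references back into characters, accepting leading zeros and mixed-case hex digits. The function names <code>to_decimal_ref</code>, <code>to_hex_ref</code> and <code>decode_numeric_refs</code> are hypothetical illustrations, not part of any library; Python's standard <code>html.unescape</code> already performs this decoding (and much more), so the hand-written decoder only serves to show the syntax explicitly.

```python
import re

# Matches &#nnnn; (decimal) or &#xhhhh; (hexadecimal).
# An uppercase X is also accepted here because HTML allows it; XML does not.
NUMERIC_REF = re.compile(r"&#(?:([0-9]+)|[xX]([0-9A-Fa-f]+));")

def to_decimal_ref(ch: str) -> str:
    """Return the decimal form, e.g. '&#8482;' for the trademark sign."""
    return f"&#{ord(ch)};"

def to_hex_ref(ch: str) -> str:
    """Return the hexadecimal form, using the usual uppercase digits."""
    return f"&#x{ord(ch):X};"

def decode_numeric_refs(text: str) -> str:
    """Replace every numeric character reference with the character it names."""
    def replace(m):
        if m.group(1) is not None:
            code_point = int(m.group(1), 10)   # decimal digits
        else:
            code_point = int(m.group(2), 16)   # hex digits, any case
        return chr(code_point)
    return NUMERIC_REF.sub(replace, text)

if __name__ == "__main__":
    tm = "\u2122"                       # trademark sign
    print(to_decimal_ref(tm))           # &#8482;
    print(to_hex_ref(tm))               # &#x2122;
    # Leading zeros and lowercase hex digits name the same character (e-acute).
    print(decode_numeric_refs("&#233; &#x00E9; &#xe9;"))
```

All three references in the last call decode to the same character, é (code point 233).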
 
Not all [[web browser]]s or [[email client]]s used by receivers of HTML documents, or [[text editor]]s used by authors of HTML documents, will be able to '''render all HTML characters'''. Most modern software is able to display most or all of the characters used in the reader's language, drawing each [[glyph]] from whichever installed font supports it, and will draw a box or other clear indicator for characters it cannot render.
 
For codes from 0 to 127, the original 7-bit [[ASCII]] standard set, most characters can be used directly without a character reference (markup-significant characters such as <code>&lt;</code> and <code>&amp;</code> being the main exceptions). Codes from 160 to 255 can all be written using [[List of XML and HTML character entity references|character entity names]]. Only some higher-numbered code points have entity names, but every permitted code point can be written as a numeric character reference.
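The split between named and numeric references can be checked against Python's standard <code>html.entities.codepoint2name</code> table, which follows the HTML 4.01 entity set (later HTML versions define many more names). The helper <code>best_reference</code> below is a hypothetical sketch, assuming one prefers a named entity when HTML 4 defines one and falls back to a decimal reference otherwise.

```python
from html.entities import codepoint2name

def best_reference(ch: str) -> str:
    """Prefer an HTML 4 entity name when one exists, else a decimal reference."""
    cp = ord(ch)
    name = codepoint2name.get(cp)
    return f"&{name};" if name is not None else f"&#{cp};"

# Every code point from 160 to 255 has an HTML 4 entity name...
latin1_named = all(cp in codepoint2name for cp in range(160, 256))
# ...but only a limited set of code points above 255 do.
higher_named = sum(1 for cp in codepoint2name if cp > 255)

if __name__ == "__main__":
    print(latin1_named)                 # True
    print(best_reference("\u00e9"))     # &eacute;
    print(best_reference("\u2122"))     # &trade;
    print(best_reference("\u2603"))     # &#9731; (snowman, no HTML 4 name)
    print(higher_named)                 # count of named code points above 255
```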
HTML forbids&nbsp;[http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html] the use of the characters with [[Universal Character Set]]/[[Unicode]] code points
 
* 0 to 31, except 9, 10, and 13 (C0 [[control characters]])
* 127 (DEL character)
* 128 to 159 (C1 [[control characters]])
* 55296 to 57343 (xD800-xDFFF, the [[UTF-16]] surrogate halves)
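The forbidden ranges listed above can be expressed as a small lookup, for example to reject a numeric reference before emitting it. This Python sketch transcribes the HTML 4 ranges (tab, line feed and carriage return excluded); the names <code>FORBIDDEN_RANGES</code>, <code>is_forbidden_in_html4</code> and <code>check_reference</code> are hypothetical, and later HTML versions use different rules.

```python
# HTML 4 code points that may not appear, even as numeric references.
# Inclusive ranges transcribed from the list above.
FORBIDDEN_RANGES = [
    (0, 8),            # C0 controls before tab
    (11, 12),          # vertical tab, form feed
    (14, 31),          # remaining C0 controls
    (127, 159),        # DEL and the C1 controls
    (0xD800, 0xDFFF),  # UTF-16 surrogate halves (55296 to 57343)
]

def is_forbidden_in_html4(cp: int) -> bool:
    """Return True if HTML 4 disallows this code point, directly or by reference."""
    return any(lo <= cp <= hi for lo, hi in FORBIDDEN_RANGES)

def check_reference(cp: int) -> str:
    """Return a decimal reference for cp, or raise ValueError if it is forbidden."""
    if is_forbidden_in_html4(cp):
        raise ValueError(f"code point {cp} is not allowed in HTML 4, even by reference")
    return f"&#{cp};"

if __name__ == "__main__":
    for cp in (9, 12, 153, 8482, 0xD800):
        print(cp, is_forbidden_in_html4(cp))
```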
 
These characters are ''not even allowed by reference'': they may not be written even as [[numeric character reference]]s. However, references to characters 128&ndash;159 are commonly interpreted by lenient web browsers as if they were references to the characters assigned to ''bytes'' 128&ndash;159 (decimal) in the [[Windows-1252]] character encoding. This violates the HTML 4 and SGML standards (the later HTML5 parsing rules specify this remapping but still treat such references as errors), and the intended characters already have their own, higher code points in Unicode, so HTML document authors should always use those higher code points. For example, for the trademark sign (™), use <code>&amp;#8482;</code>, not <code>&amp;#153;</code>.
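Replacing a legacy reference such as <code>&amp;#153;</code> with the proper one can be automated by decoding the number as a single Windows-1252 byte. The following Python sketch uses the standard <code>cp1252</code> codec; the function names are hypothetical, and the five byte values that Windows-1252 leaves unassigned (129, 141, 143, 144, 157) are reported as having no equivalent.

```python
def legacy_reference_to_code_point(n: int):
    """Map a reference number in 128-159 to the code point Windows-1252 gives byte n.

    Returns None for the five bytes Python's cp1252 codec leaves undefined.
    Numbers outside 128-159 are returned unchanged.
    """
    if not 128 <= n <= 159:
        return n
    try:
        return ord(bytes([n]).decode("cp1252"))
    except UnicodeDecodeError:
        return None

def fix_legacy_reference(n: int) -> str:
    """Rewrite &#n; as a reference to the intended higher code point."""
    cp = legacy_reference_to_code_point(n)
    if cp is None:
        raise ValueError(f"&#{n}; has no Windows-1252 equivalent")
    return f"&#{cp};"

if __name__ == "__main__":
    print(fix_legacy_reference(153))   # &#8482; (trademark sign)
    print(fix_legacy_reference(128))   # &#8364; (euro sign)
```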
 
The characters 9 (tab), 10 (line feed), and 13 (carriage return) are allowed in HTML documents and, along with 32 (space), are all considered "[[whitespace (computer science)|whitespace]]"[http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1]. The form feed control character (12) is not allowed in HTML documents, yet the specification also lists it as one of the "white space" characters, perhaps an oversight. In HTML, outside a <code>&lt;pre&gt;</code> block, a run of consecutive white space characters is generally interpreted as a single "word separator" for rendering purposes. A word separator is typically rendered as a single space in European languages, but not necessarily in other writing systems.
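The collapsing rule can be approximated with a single regular expression over the five characters named above. This is a simplified Python sketch: <code>collapse_whitespace</code> is a hypothetical name, and the function does not special-case <code>&lt;pre&gt;</code> blocks or any other rendering context.

```python
import re

# The whitespace characters named above: space (32), tab (9), line feed (10),
# carriage return (13) and form feed (12).
HTML_WHITESPACE = re.compile(r"[ \t\n\r\f]+")

def collapse_whitespace(text: str) -> str:
    """Collapse each run of HTML whitespace into one word separator (a space)."""
    return HTML_WHITESPACE.sub(" ", text)

if __name__ == "__main__":
    print(repr(collapse_whitespace("one \t two\r\n\r\nthree")))  # 'one two three'
```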
 
{{DEFAULTSORT:Html Decimal Character Rendering}}