Character encodings in HTML: Difference between revisions

Robbot (talk | contribs)
m Andre Engels - robot Modifying:zh
corrected end of third paragraph; should be e.g. &#31; not &31;
Line 27:
Many symbolic character entities have been defined. For example, the character '&lambda;' can be encoded as <code>&amp;lambda;</code>. Because the '&' character acts as an [[escape character]] that introduces character entities, a literal '&' character in HTML must itself be encoded as an entity, <code>&amp;amp;</code>. A similar escape is required for the '<' character, which is encoded as <code>&amp;lt;</code>. The '>' character only needs to be encoded if it is part of an attribute value, where it should be encoded as <code>&amp;gt;</code>. Note that this encoding is different from URL encoding, which uses a different escape syntax (a '%' followed by two hexadecimal digits) and requires far more characters to be escaped.
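
The escaping rules above can be sketched in Python. This is only an illustration: <code>escape_text</code> is a hypothetical helper written for this example, while <code>html.escape</code>, <code>html.unescape</code> and <code>urllib.parse.quote</code> come from the Python standard library. Note that the library function escapes '>' as well, which is harmless but more than the minimum described above.

```python
import html
from urllib.parse import quote


def escape_text(text):
    """Hypothetical helper: escape only '&' and '<' in HTML text content.

    '&' is replaced first so that the ampersand introduced by '&lt;'
    is not escaped a second time.
    """
    return text.replace("&", "&amp;").replace("<", "&lt;")


sample = "x < y & y > z"

# Minimal escaping: '>' is left alone outside attribute values.
escaped = escape_text(sample)

# The standard library also escapes '>' (and quotes, unless quote=False).
stdlib_escaped = html.escape(sample, quote=False)

# Decoding the entities gives back the original text.
round_trip = html.unescape(escaped)

# A named entity decodes to the character it stands for (U+03BB, lambda).
lambda_char = html.unescape("&lambda;")

# URL encoding is a separate scheme with a different syntax.
url_form = quote("a&b<c")

print(escaped)
print(stdlib_escaped)
print(url_form)
```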
 
Decimal and hexadecimal HTML character references can also be used, based on the [[Unicode]] numeric code for the character encoded. For example, &lambda; can also be represented by the decimal character reference <code>&amp;#955;</code> or the hexadecimal character reference <code>&amp;#x3BB;</code>. Numeric references ''always'' refer to Unicode, irrespective of page encoding. Numeric references that lie within the reserved control area of Unicode (and therefore also of ISO 8859-1) are illegal: that is, all characters in the ([[hexadecimal|hex]]) ranges 00&#8211;1F, 7F, and 80&#8211;9F, or <code>&amp;#0;</code> to <code>&amp;#31;</code> and <code>&amp;#127;</code> to <code>&amp;#159;</code>.
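
As a rough sketch of how numeric references relate to Unicode code points, the following Python builds both forms for a character and tests whether a code point falls in the control ranges listed above. The function names <code>numeric_refs</code> and <code>in_control_range</code> are hypothetical, chosen for this example; only <code>ord</code>, <code>str.encode</code> and <code>html.unescape</code> are standard Python.

```python
import html


def numeric_refs(ch):
    """Return the (decimal, hexadecimal) character references for one character."""
    cp = ord(ch)  # Unicode code point, independent of any page encoding
    return f"&#{cp};", f"&#x{cp:X};"


def in_control_range(cp):
    """True for code points in the hex ranges 00-1F, 7F and 80-9F named in the text."""
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


dec_ref, hex_ref = numeric_refs("\u03bb")  # Greek small letter lambda

# Both forms decode to the same character.
decoded = (html.unescape(dec_ref), html.unescape(hex_ref))

# In a page encoded as ISO 8859-7 the lambda is stored as a single byte,
# yet its numeric reference still uses the Unicode value 955.
page_byte = "\u03bb".encode("iso-8859-7")[0]

print(dec_ref, hex_ref, page_byte)
```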
 
Note that unnecessary use of HTML character references may significantly reduce the readability of HTML source. If the character encoding for a web page is chosen appropriately, HTML character references are usually only required for a few special characters. The characters '''&amp;''' and '''&lt;''' always need to be encoded, as noted above.
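
One way to see the effect of the page encoding is to serialise the same text under different encodings and let only the unrepresentable characters fall back to numeric references. The sketch below assumes Python's built-in <code>xmlcharrefreplace</code> error handler; <code>to_html_bytes</code> is a hypothetical helper name, not part of any library.

```python
import html


def to_html_bytes(text, encoding):
    """Hypothetical helper: escape markup characters, then encode the page.

    Characters the target encoding cannot represent are written as
    decimal numeric character references by 'xmlcharrefreplace'.
    """
    escaped = html.escape(text, quote=False)  # handles '&', '<' and '>'
    return escaped.encode(encoding, errors="xmlcharrefreplace")


text = "caf\u00e9 \u03bb"  # e-acute exists in ISO 8859-1, lambda does not

latin1_page = to_html_bytes(text, "iso-8859-1")  # lambda needs a reference
utf8_page = to_html_bytes(text, "utf-8")         # no references needed
ascii_page = to_html_bytes("a & b < c", "ascii") # only the special characters

print(latin1_page)
print(utf8_page)
print(ascii_page)
```

With an encoding such as UTF-8 that covers all of Unicode, the only references left in the output are those for the special characters.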