Unicode and HTML: Difference between revisions

Encoding information: there is no need for the slash, in any HTML version
Numeric character references: It was a chinese character but did not really look like one
Line 23:
To work around the limitations of legacy encodings, HTML is designed so that any character in Unicode can be represented inside an HTML document by using a [[numeric character reference]]: a sequence of characters that explicitly spells out the Unicode code point of the character being represented. A character reference takes the form '''<code>&amp;#</code>'''<var>N</var>'''<code>;</code>''', where <var>N</var> is either a [[decimal]] number for the Unicode code point, or a [[hexadecimal]] number, in which case it must be prefixed by <code>x</code>. The characters that compose the numeric character reference are universally representable in every encoding approved for use on the Internet.
 
For example, the Unicode code point U+5408, which corresponds to a particular Chinese character, is written as a decimal reference by converting the hexadecimal value 5408 to the decimal number 21512, preceding it with <code>&amp;#</code> and following it with <code>;</code>, like this: <code>&amp;#21512;</code>, which produces this: &#21512; (if it doesn't look like a Chinese character, see [[Template:Special characters]]).
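
The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name <code>numeric_reference</code> and its <code>hexadecimal</code> parameter are hypothetical and do not belong to any HTML library.

```python
def numeric_reference(char: str, hexadecimal: bool = False) -> str:
    """Return an HTML numeric character reference for one character.

    Hypothetical helper for illustration only.
    """
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    code_point = ord(char)  # Unicode code point as an integer
    if hexadecimal:
        # Hexadecimal form: the number is prefixed by "x"
        return f"&#x{code_point:X};"
    # Decimal form
    return f"&#{code_point};"


print(numeric_reference("\u5408"))                    # &#21512;
print(numeric_reference("\u5408", hexadecimal=True))  # &#x5408;
```

Both outputs use only ASCII characters, which is why such a reference survives in a document saved in a legacy encoding.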
 
Support for hexadecimal in this context is more recent, so older browsers might have problems displaying characters referenced with hexadecimal numbers, although they will probably have a problem displaying Unicode characters above code point 255 anyway. To ensure better compatibility with older browsers, it is still common practice to convert the hexadecimal code point into a decimal value (for example <code>&amp;#21512;</code> instead of <code>&amp;#x5408;</code>).
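
As a rough sketch of that practice, the following Python snippet rewrites hexadecimal references in a string as decimal ones. The helper <code>hex_references_to_decimal</code> and the pattern <code>HEX_REFERENCE</code> are assumptions made for this example. The pattern matches only well-formed references that end in a semicolon.

```python
import html
import re

# Matches hexadecimal references such as "&#x5408;" (assumed pattern,
# limited to well-formed references terminated by a semicolon).
HEX_REFERENCE = re.compile(r"&#[xX]([0-9A-Fa-f]+);")


def hex_references_to_decimal(text: str) -> str:
    """Rewrite every hexadecimal character reference in text as decimal."""
    return HEX_REFERENCE.sub(lambda m: f"&#{int(m.group(1), 16)};", text)


converted = hex_references_to_decimal("&#x5408;")
print(converted)  # &#21512;

# Both forms decode to the same character.
print(html.unescape(converted) == html.unescape("&#x5408;"))  # True
```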
 
=== Named character entities ===