In practice, documents are rarely allowed to use more than one encoding internally, so the onus is usually on the markup language to provide a means for document authors to express unencodable characters in terms of encodable ones. This is generally done through some kind of "escaping" mechanism.
The SGML-based markup languages allow document authors to use special sequences of characters from the ASCII range (the first 128 code points of Unicode) to represent, or ''reference'', any Unicode
Character references that are based on the referenced character's ISO 10646 or Unicode "code point" are called ''numeric'' character references. In HTML 4 and in all versions of [[XHTML]] and XML, the code point can be expressed either as a decimal (base 10) number or as a hexadecimal (base 16) number. The syntax is as follows:
Character U+0026 (ampersand), followed by character U+0023 (number sign), followed by one of the following choices:
* one or more decimal digits, which are zero (U+0030) through nine (U+0039); or
* character U+0078 ("x") followed by one or more hexadecimal digits, which are zero (U+0030) through nine (U+0039), Latin capital letter A (U+0041) through F (U+0046), and Latin small letter a (U+0061) through f (U+0066);
all followed by character U+003B (semicolon). Older versions of HTML disallowed the hexadecimal syntax.
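The syntax described above can be sketched in Python. This is an illustrative reading of the rules as stated here, not a conforming HTML or XML parser; the names <code>NCR</code>, <code>decode_ncrs</code> and <code>encode_ncr</code> are hypothetical, and the choice to leave out-of-range references untouched is an assumption of this sketch.

```python
import re

# "&#", then either decimal digits or a lowercase "x" followed by
# hexadecimal digits (either letter case), then ";".
NCR = re.compile(r"&#(?:([0-9]+)|x([0-9A-Fa-f]+));")


def decode_ncrs(text: str) -> str:
    """Replace each numeric character reference with its character."""

    def repl(match: re.Match) -> str:
        dec, hexa = match.group(1), match.group(2)
        code_point = int(dec) if dec is not None else int(hexa, 16)
        try:
            return chr(code_point)
        except (ValueError, OverflowError):
            # Assumption: a number beyond the Unicode range is left as-is.
            return match.group(0)

    return NCR.sub(repl, text)


def encode_ncr(ch: str, hexadecimal: bool = False) -> str:
    """Build a decimal or hexadecimal reference for one character."""
    code_point = ord(ch)
    return f"&#x{code_point:X};" if hexadecimal else f"&#{code_point};"
```

For example, <code>decode_ncrs("&amp;#233;")</code> and <code>decode_ncrs("&amp;#xE9;")</code> both yield "é" (U+00E9), since 233 in decimal equals E9 in hexadecimal.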
The characters that make up a numeric character reference can be represented in every character encoding used in computing and telecommunications today, so there is no risk of the reference itself being unencodable.
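As a small illustration of this point, Python's built-in <code>xmlcharrefreplace</code> error handler substitutes a decimal numeric character reference for any character the target encoding cannot represent. The helper name <code>to_ascii_markup</code> is hypothetical, and ASCII is chosen here only as the most restrictive common target.

```python
def to_ascii_markup(text: str) -> str:
    """Return text using only ASCII characters, with every unencodable
    character replaced by a decimal numeric character reference."""
    # "xmlcharrefreplace" is a standard error handler for str.encode().
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_markup("caf\u00e9 \u20ac5"))  # prints: caf&#233; &#8364;5
```

Because the result contains only ASCII characters, it can be stored or transmitted in the many encodings that include the ASCII repertoire.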