Numeric character reference: Difference between revisions

No edit summary
Tags: Visual edit Mobile edit Mobile web edit
Reverted good faith edits by 151.182.246.43 (talk): Unexplained deletion. (TW)
Line 23:
|}
 
In SGML, HTML, and XML, the following are all valid numeric character references for the [[Latin spelling and pronunciation|Latin]] capital letter AE:
{| class="wikitable sortable" border="1"
|+ Numerical character reference of {{unichar|00C6|Latin capital letter Æ}}
|-
! [[Unicode#Upluslink|Unicode character]]
! Numerical base
! Numerical reference in markup
! Effect
|-
| U+00C6 || Decimal || <code>&amp;#198;</code> || Æ
|-
| U+00C6 || Hexadecimal || <code>&amp;#xC6;</code> || Æ
|}
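
The equivalence of the two forms in the table above can be checked with a short script. This is a minimal sketch using Python's standard <code>html</code> module; the helper name <code>numeric_references</code> is hypothetical and introduced only for illustration.

```python
import html


def numeric_references(char):
    """Return the decimal and hexadecimal references for one character.

    Hypothetical helper for illustration; not part of any standard API.
    """
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = numeric_references("\u00c6")  # U+00C6, capital AE
print(decimal_ref, hex_ref)

# Both references decode to the same character.
print(html.unescape(decimal_ref) == html.unescape(hex_ref) == "\u00c6")
```

Decimal 198 and hexadecimal C6 are the same number, so both references name the same code point.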
 
Line 78:
The Universal Character Set (UCS) defined by ISO 10646 is the "document character set" of SGML and HTML 4, so by default any character in such a document, and any character ''referenced'' in such a document, must be in the UCS.
 
While the syntax of SGML does not prohibit references to invalid or unassigned code points, such as <code>&amp;#xFFFF;</code>, SGML-derived markup languages such as HTML and XML can, and often do, restrict numeric character references to only those code points that are assigned to characters.
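
As an illustration of such a restriction, the sketch below asks an XML parser whether it accepts a given reference. It assumes the Expat-based parser that ships with Python's <code>xml.etree.ElementTree</code>; other parsers may report the error differently. The function name <code>xml_accepts</code> is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_accepts(reference):
    """Return True if the bundled XML parser accepts the reference.

    Hypothetical helper: wraps the reference in a minimal document
    and reports whether parsing succeeds.
    """
    try:
        ET.fromstring(f"<doc>{reference}</doc>")
    except ET.ParseError:
        return False
    return True


print(xml_accepts("&#xC6;"))    # reference to U+00C6
print(xml_accepts("&#xFFFF;"))  # reference to the noncharacter U+FFFF
```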
 
Restrictions may also apply for other reasons. For example, in HTML 4, <code>&amp;#12;</code>, a reference to the non-printing "form feed" control character, is allowed because the form feed character itself is allowed. In XML, however, the form feed character cannot be used, not even by reference.{{Citation needed|date=May 2013}} As another example, <code>&amp;#128;</code>, a reference to another control character, may not be used or referenced in either HTML or XML. In HTML, however, web browsers usually do not flag it as an error, and some interpret it, for compatibility reasons, as a reference to the character with code value 128 in the [[Windows-1252]] encoding. That character, "€", must be written as <code>&amp;#8364;</code> in standards-compliant HTML. As a further example, prior to the publication of XML 1.0 Second Edition on October 6, 2000, XML 1.0 was based on an older version of ISO 10646 and prohibited using characters above U+FFFD, except in character data, thus making a reference like <code>&amp;#65536;</code> (U+10000) illegal. In XML 1.1 and newer editions of XML 1.0, such a reference is allowed, because the available character repertoire was explicitly extended.
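
The cases described above can be probed with Python's standard library. This is a sketch under stated assumptions: <code>html.unescape</code> follows HTML5-style parsing rules (not HTML 4), and <code>xml.etree.ElementTree</code> uses the bundled Expat parser for XML 1.0. The helper name <code>xml_text</code> is hypothetical.

```python
import html
import xml.etree.ElementTree as ET

EURO = "\u20ac"  # the euro sign, U+20AC (decimal 8364)

# Byte value 128 in Windows-1252 is the euro sign.
print(bytes([128]).decode("cp1252") == EURO)

# HTML5-style decoding applies the same compatibility mapping to &#128;.
print(html.unescape("&#128;") == EURO)

# The standards-compliant reference names the code point directly.
print(html.unescape("&#8364;") == EURO)


def xml_text(reference):
    """Return the decoded text of the reference, or None if rejected.

    Hypothetical helper for illustration only.
    """
    try:
        return ET.fromstring(f"<doc>{reference}</doc>").text
    except ET.ParseError:
        return None


print(xml_text("&#12;") is None)             # form feed is rejected in XML 1.0
print(xml_text("&#65536;") == "\U00010000")  # U+10000 is accepted
```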
Line 95 ⟶ 93:
 
==See also==
* [[List of XML and HTML character entity references]]
 
==References==
{{Reflist}}