Numeric character reference: Difference between revisions

m Discussion: minor edits
Restrictions: more examples, clarifications
[[ISO 10646]] (the Universal Character Set, or UCS) is the "document character set" of SGML and HTML 4, so by default any character in such a document, and any character ''referenced'' in such a document, must be in the UCS.
 
While the syntax of SGML does not prohibit references to unassigned code points, SGML-derived markup languages such as HTML and XML can, and often do, restrict numeric character references to only those code points that have been assigned to characters (or rather, to code points that are not permanently unassigned).
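
As a rough illustration of what "unassigned" means in practice, the following Python sketch asks the interpreter's bundled Unicode database whether a code point has general category Cn. The helper name is hypothetical, and the answer for most code points depends on the Unicode version shipped with the interpreter; U+FFFF is used in the example because it is a noncharacter and therefore stays unassigned in every version.

```python
import unicodedata


def is_unassigned(code_point: int) -> bool:
    """Return True if the bundled Unicode database reports category Cn.

    Hypothetical helper for illustration only: Cn covers both
    not-yet-assigned code points and permanent noncharacters.
    """
    return unicodedata.category(chr(code_point)) == "Cn"


# U+FFFF is a noncharacter; U+0041 is LATIN CAPITAL LETTER A.
print(is_unassigned(0xFFFF))
print(is_unassigned(0x0041))
```

A markup language that wanted to reject only permanently unassigned code points would need a narrower test than this one, since Cn also includes code points that a later Unicode version may assign.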
 
Restrictions may also apply for other reasons. For example, in HTML 4, &#12;, which is a reference to a non-printing "form feed" control character, is allowed (because a form feed character is allowed), but in XML 1.0 the form feed character cannot be used, not even by reference. As another example, &#128;, which is a reference to another control character, cannot be used or referenced in HTML 4, but it is allowed in XML. Characters above U+FFFD, by contrast, are not restricted in this way: the Char production of both XML 1.0 and XML 1.1 covers U+10000 through U+10FFFF, so a reference like &#x10001; is legal in either version.
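
The XML side of these examples can be checked mechanically. The following Python sketch assumes the code point ranges of the Char production as published in the XML 1.0 recommendation; the function and constant names are hypothetical, and the sketch does not model HTML 4 or XML 1.1 rules.

```python
import re

# Code point ranges of the Char production in the XML 1.0 recommendation
# (transcribed here as an assumption of this sketch).
XML10_CHAR_RANGES = (
    (0x9, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)

# XML syntax for a numeric character reference: decimal, or hex with a lowercase "x".
_NCR = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def ncr_code_point(ref: str) -> int:
    """Return the code point named by a numeric character reference."""
    match = _NCR.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a numeric character reference: {ref!r}")
    hex_digits, dec_digits = match.groups()
    return int(hex_digits, 16) if hex_digits else int(dec_digits)


def allowed_in_xml10(ref: str) -> bool:
    """Return True if the referenced code point falls inside an XML 1.0 Char range."""
    code_point = ncr_code_point(ref)
    return any(low <= code_point <= high for low, high in XML10_CHAR_RANGES)


for ref in ("&#12;", "&#128;"):
    print(ref, allowed_in_xml10(ref))
```

Under these assumptions the form feed reference is rejected and the U+0080 reference is accepted, matching the two control-character examples above.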