[[ISO 10646]] (the Universal Character Set) is the "document character set" of XML and HTML 4, so by default, any character in such a document, and any character ''referenced'' in such a document, must be in the UCS.
While the syntax of SGML does not prohibit references to unassigned code points, SGML-derived markup languages such as HTML and XML can, and often do, restrict numeric character references to only those code points that are assigned to characters.
Restrictions may also apply for other reasons. For example, in HTML 4, &#12;, which is a reference to the non-printing "form feed" control character, is allowed (because a form feed character is allowed), but in XML 1.0, the form feed character cannot be used, not even by reference. As another example, &#128;, which is a reference to another control character, cannot be used or referenced in HTML, but it is allowed in XML. Furthermore, the two XML versions differ from each other: XML 1.0 permits characters above U+FFFD only in the range U+10000 to U+10FFFF, so a reference like &#x10001; is legal but a reference to U+FFFF is not, while XML 1.1 explicitly extended the set of characters that may be referenced, so that a form feed reference such as &#12; is allowed there.
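
The XML side of these restrictions can be checked mechanically. The following is a minimal sketch in Python, not a definitive validator: the range tables are transcribed from the <code>Char</code> productions of the XML 1.0 and XML 1.1 specifications, while the names <code>XML10_CHAR</code>, <code>XML11_CHAR</code>, <code>code_point</code> and <code>is_allowed</code> are hypothetical and chosen only for illustration. HTML 4 is not covered, because its permitted code points are defined separately in its SGML declaration.

```python
import re

# Code-point ranges of the XML 1.0 "Char" production: characters that may
# appear in a document, whether literally or by reference.
XML10_CHAR = [(0x9, 0xA), (0xD, 0xD), (0x20, 0xD7FF),
              (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# Code-point ranges of the XML 1.1 "Char" production: a character reference
# may name any of these, including most control characters.
XML11_CHAR = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# XML syntax: "&#" decimal digits ";" or "&#x" hexadecimal digits ";".
_NCR = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def code_point(ref: str) -> int:
    """Return the code point named by a numeric character reference."""
    m = _NCR.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a numeric character reference: {ref!r}")
    hex_digits, dec_digits = m.groups()
    return int(hex_digits, 16) if hex_digits else int(dec_digits)


def is_allowed(ref: str, ranges) -> bool:
    """Check whether the referenced code point lies in one of the ranges."""
    cp = code_point(ref)
    return any(lo <= cp <= hi for lo, hi in ranges)
```

Under these assumptions, <code>is_allowed("&#12;", XML10_CHAR)</code> is false and <code>is_allowed("&#12;", XML11_CHAR)</code> is true, which mirrors the form feed example above.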