Numeric character reference: Difference between revisions

m Undid revision 1050430693 by 84.70.16.131 (talk) Opportunity is not a sequence of characters and character sets are not "used911".
m convert special characters found by Wikipedia:Typo Team/moss (via WP:JWB)
 
(12 intermediate revisions by 12 users not shown)
Line 1:
{{Short description|Common markup construct used in SGML, XML, and HTML}}
{{one source|date=February 2021}}
A '''numeric character reference''' ('''NCR''') is a common [[markup (computer programming)|markup]] construct used in [[SGML]] and SGML-derived markup languages such as [[HTML]] and [[XML]]. It consists of a short sequence of [[character (computing)|character]]s that, in turn, represents a single character. Since [[SGML|WebSGML]], [[XML]] and [[HTML 4]], the code points of the [[Universal Character Set]] (UCS) of [[Unicode]] are used. NCRs are typically used to represent characters that are not [[plain text#Encoding|directly encodable]] in a particular document (for example, because they are international characters that do not fit in the 8-bit [[Character encoding|character set]] being used, or because they have special syntactic meaning in the language). When the document is interpreted by a markup-aware reader, each NCR is treated as if it were the character it represents.
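
As an illustration only, the relationship between a character, its UCS code point, and its two NCR forms can be sketched in Python. The helper name <code>to_ncrs</code> is hypothetical and not part of any standard. Decoding here relies on the standard-library function <code>html.unescape</code>, which follows HTML5 rules and stands in for a markup-aware reader.

```python
import html


def to_ncrs(ch: str) -> tuple[str, str]:
    """Return the (decimal, hexadecimal) NCRs for a single character.

    Hypothetical helper for illustration; ord() raises TypeError
    if ch is not exactly one character long.
    """
    code_point = ord(ch)  # UCS/Unicode code point as an integer
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = to_ncrs("\u03a3")  # GREEK CAPITAL LETTER SIGMA
print(decimal_ref, hex_ref)  # &#931; &#x3A3;

# A markup-aware reader treats both forms as the character itself.
print(html.unescape(decimal_ref) == html.unescape(hex_ref) == "\u03a3")  # True
```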
 
==Examples==
In SGML, HTML, and XML, the following are all valid numeric character references for the Greek capital letter Sigma:
{| class="wikitable" border="1"
|+ Numerical character reference of {{unichar|03A3|GREEK CAPITAL LETTER SIGMA}}<br/>(Note that {{hexadecimal|0931}} = 931<sub>10</sub>)
|-
! [[Unicode#Upluslink|Unicode character]]
Line 24 ⟶ 25:
 
In SGML, HTML, and XML, the following are all valid numeric character references for the Latin capital letter AE:
{| class="wikitable" border="1"
|+ Numerical character reference of {{unichar|00C6|Latin capital letter AE}}
|-
Line 38 ⟶ 39:
 
In SGML, HTML, and XML, the following are all valid numeric character references for the Latin small letter sharp s (ß):
{| class="wikitable" border="1"
|+ Numerical character reference of {{unichar|00DF|Latin small letter sharp s}}
|-
Line 176 ⟶ 177:
| U+005A || &amp;#90; || &amp;#x5A; || Z
|-
| U+005B || &amp;#91; || &amp;#x5B; || [
|-
| U+005C || &amp;#92; || &amp;#x5C; || \
|-
| U+005D || &amp;#93; || &amp;#x5D; || ]
|-
| U+005E || &amp;#94; || &amp;#x5E; || ^
Line 240 ⟶ 241:
| U+007A || &amp;#122; || &amp;#x7A; || z
|-
| U+007B || &amp;#123; || &amp;#x7B; || {
|-
| U+007C || &amp;#124; || &amp;#x7C; || {{pipe}}
|-
| U+007D || &amp;#125; || &amp;#x7D; || }
|-
| U+007E || &amp;#126; || &amp;#x7E; || ~
Line 284 ⟶ 285:
==Compatibility issues==
 
In the initial versions of [[SGML]] and [[HTML]], numeric character references were interpreted relative to the document character encoding rather than [[Unicode]]. For Latin-script documents, numeric character references to characters between x80 and x9F in those documents will not be correct against [[Unicode]] and must be recoded. HTML standards prior to [[HTML 4]] supported only Western Latin script documents: the treatment of character references above #7F may vary between applications and national conventions.
 
For example, as mentioned above, the correct numeric character reference for the [[Euro sign]] "€" <code>U+20AC</code> when using [[Unicode]] is decimal <code>&amp;#8364;</code> and hexadecimal <code>&amp;#x20AC;</code>. However, in tools that support obsolete implementations of HTML, the reference <code>&amp;#128;</code> (Euro sign in the [[CP-1252]] code page) or <code>&amp;#164;</code> (Euro sign in [[ISO/IEC 8859-15]]) may work instead.
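
The Euro sign case can be checked with a short Python sketch, offered as an illustration rather than a statement about any particular browser. The helper <code>legacy_decode</code> is a hypothetical name. It decodes a single byte in a legacy 8-bit encoding, which approximates how an encoding-relative reader would have interpreted the reference number. Python's <code>html.unescape</code> applies HTML5 parsing rules, which remap references in the x80–x9F range to their CP-1252 meanings for compatibility.

```python
import html

EURO = "\u20ac"  # EURO SIGN, U+20AC


def legacy_decode(value: int, encoding: str) -> str:
    """Decode one byte value in a legacy 8-bit encoding (hypothetical helper)."""
    return bytes([value]).decode(encoding)


# Unicode-based references resolve to the Euro sign.
print(html.unescape("&#8364;") == EURO)   # True
print(html.unescape("&#x20AC;") == EURO)  # True

# Encoding-relative readings of the obsolete references.
print(legacy_decode(128, "cp1252") == EURO)      # True: 0x80 in CP-1252
print(legacy_decode(164, "iso8859_15") == EURO)  # True: 0xA4 in ISO/IEC 8859-15

# Under HTML5 rules, &#128; is remapped to U+20AC, but &#164; is
# simply U+00A4 (CURRENCY SIGN), not the Euro sign.
print(html.unescape("&#128;") == EURO)      # True
print(html.unescape("&#164;") == "\u00a4")  # True
```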
 
As another example, if some text was created originally using the [[MacRoman]] character set, the [[quotation mark glyphs|left double quotation mark]] {{char|“}} will be represented with code point xD2. This will not display properly in a system expecting a document encoded as UTF-8, ISO 8859-1, or CP-1252, where this code point is occupied by the letter [[Ò]]. The correct numeric character reference for {{char|“}} in HTML 4 and newer is <code>&amp;#x201C;</code>, because [[Unicode#Upluslink|U+]]201C is its UCS code. In some systems, the [[List of XML and HTML character entity references|named character reference]] <code>&amp;ldquo;</code> may also be available.
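
The MacRoman mismatch can be reproduced with Python's built-in codecs. This is a minimal sketch: the codec names <code>mac_roman</code>, <code>latin-1</code>, and <code>cp1252</code> are Python's own, and the variable names are chosen here for illustration.

```python
import html

raw = bytes([0xD2])  # the byte a MacRoman document uses for the left double quote

as_mac_roman = raw.decode("mac_roman")  # U+201C LEFT DOUBLE QUOTATION MARK
as_latin1 = raw.decode("latin-1")       # U+00D2 LATIN CAPITAL LETTER O WITH GRAVE
as_cp1252 = raw.decode("cp1252")        # also U+00D2

print(f"U+{ord(as_mac_roman):04X}")  # U+201C
print(f"U+{ord(as_latin1):04X}")     # U+00D2
print(f"U+{ord(as_cp1252):04X}")     # U+00D2

# The Unicode-based reference and the named reference agree on U+201C,
# whereas &#xD2; denotes U+00D2.
print(html.unescape("&#x201C;") == html.unescape("&ldquo;") == "\u201c")  # True
print(html.unescape("&#xD2;") == "\u00d2")  # True
```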
 
==See also==
Line 295 ⟶ 296:
==References==
{{Reflist}}
 
 
{{Unicode navigation}}