Character encodings in HTML: Difference between revisions

Revision as of 13:26, 12 August 2024 by 197.98.201.67 (Undid revision 1239926119 by 197.98.201.67; tags: Undo, Reverted)
Latest revision as of 05:06, 16 November 2024 by Ejazz128 (minor edit; →External links: "i have removed the broken link")
(3 intermediate revisions by 3 users not shown)
Line 118:
==Character references==
{{Main|List of XML and HTML character entity references|Numeric character reference}}
In addition to native character encodings, characters can also be encoded as ''character references'', which can be ''numeric character references'' ([[decimal]] or [[hexadecimal]]) or ''character entity references''. Character entity references are also sometimes referred to as ''named entities'', or ''HTML entities'' for HTML. HTML's usage of character references derives from [[SGML]].

Line 124:
===HTML character references===
<!--Linked from [[Template:Auxiliary template common notice]]-->
A ''[[numeric character reference]]'' in HTML refers to a character by its [[Universal Character Set]]/[[Unicode]] ''[[code point]]'', and uses the format
:<code>&#''nnnn'';</code>

Line 136:
Most characters with codes from 0 to 127, the original 7-bit [[ASCII]] standard set, can be used without a character reference. Codes from 160 to 255 can all be created using [[List of XML and HTML character entity references|character entity names]]. Only a few higher-numbered codes can be created using entity names, but all can be created by a decimal numeric character reference.

[[List of XML and HTML character entity references|Character entity references]] can also have the format <code>&''name'';</code> where ''name'' is a case-sensitive alphanumeric string. For example, "λ" can also be encoded as <code>&lambda;</code> in an HTML document.

The character entity references <code>&lt;</code>, <code>&gt;</code>, <code>&quot;</code> and <code>&amp;</code> are predefined in HTML and SGML, because <code><</code>, <code>></code>, <code>"</code> and <code>&</code> are already used to delimit markup. This notably did not include XML's <code>&apos;</code> (') entity prior to [[HTML5]].
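
The reference formats described above can be illustrated with a short Python sketch. It relies on the standard library's <code>html</code> module; the helper name <code>numeric_reference</code> and the sample markup string are hypothetical, chosen only for this illustration, and the sketch is not meant as a complete escaping routine.

```python
import html


def numeric_reference(char: str, hexadecimal: bool = False) -> str:
    """Build a numeric character reference for a single character.

    Hypothetical helper for illustration: it formats the character's
    Unicode code point as a decimal or hexadecimal reference.
    """
    code_point = ord(char)
    if hexadecimal:
        return f"&#x{code_point:X};"
    return f"&#{code_point};"


# "λ" is U+03BB, which is 955 in decimal.
decimal_ref = numeric_reference("λ")
hex_ref = numeric_reference("λ", hexadecimal=True)

# All three spellings decode to the same character.
decoded = [html.unescape(ref) for ref in (decimal_ref, hex_ref, "&lambda;")]

# Markup delimiters are replaced by their predefined entity references.
sample = '<a href="x">Tom & Jerry</a>'  # assumed sample input
escaped = html.escape(sample)

print(decimal_ref, hex_ref, decoded)
print(escaped)
```

Here <code>html.escape</code> is called with its default <code>quote=True</code>, so double quotes are escaped as well, which matters when the text is placed inside an attribute value.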
For a list of all named HTML character entity references along with the versions in which they were introduced, see [[List of XML and HTML character entity references]].

Unnecessary use of HTML character references may significantly reduce HTML readability. If the character encoding for a web page is chosen appropriately, then HTML character references are usually only required for markup delimiting characters as mentioned above, and for a few special characters (or none at all if a native [[Unicode]] encoding like [[UTF-8]] is used). Incorrect HTML entity escaping may also open up security vulnerabilities for injection attacks such as [[cross-site scripting]]. If HTML attributes are left unquoted, certain characters, most importantly [[whitespace character|whitespace]], such as space and tab, must be escaped using entities. Other languages related to HTML have their own methods of escaping characters.

Line 167:
== External links ==
* [https://owuk.com/html-encode.html Online HTML entity encoder & decoder tool]
* [http://www.w3.org/TR/REC-html40/sgml/entities.html Character entity references in HTML4]
* [http://www.sitepoint.com/article/guide-web-character-encoding/ The Definitive Guide to Web Character Encoding]