Character encodings in HTML

{{Use dmy dates|date=December 2021}}
{{Html series}}
While Hypertext Markup Language ([[HTML]]) has been in use since 1991, HTML 4.0, published in December 1997, was the first standardized version in which international [[character (computing)|character]]s were given reasonably complete treatment. When an HTML document includes special characters outside the range of seven-bit [[ASCII]], two goals are worth considering: the information's [[integrity]], and universal [[Web browser|browser]] display.
 
==Specifying the document's character encoding==
 
==Character references==
{{Main|List of XML and HTML character entity references|Numeric character reference}}
 
In addition to native character encodings, characters can also be encoded as ''character references'', which can be ''numeric character references'' ([[decimal]] or [[hexadecimal]]) or ''character entity references''. Character entity references are also sometimes referred to as ''named entities'', or ''HTML entities'' for HTML. HTML's usage of character references derives from [[SGML]].
===HTML character references===
<!--Linked from [[Template:Auxiliary template common notice]]-->
A ''[[numeric character reference]]'' in HTML refers to a character by its [[Universal Character Set]]/[[Unicode]] ''[[code point]]'', and uses the format
 
:<code>&#''nnnn'';</code>
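As an illustration only, the following Python 3 sketch decodes numeric character references in both the decimal form above and the hexadecimal form <code>&#x''hhhh'';</code>. The function name <code>decode_numeric_refs</code> is hypothetical, and the sketch simplifies real HTML parsing: it requires the terminating semicolon and does not apply the HTML5 error-handling rules (for example, mapping <code>&amp;#0;</code> to U+FFFD).

```python
import re

# Matches a decimal reference (&#955;) or a hexadecimal one (&#x3BB; or &#X3bb;).
# Simplification: the terminating semicolon is required here, although HTML5
# parsers also decode some references that lack it.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    """Replace each numeric character reference with the character at that code point."""

    def _to_char(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        # Leave references beyond the Unicode range untouched instead of raising.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return _NUMERIC_REF.sub(_to_char, text)


print(decode_numeric_refs("&#955; and &#x3BB; are both lambda"))
# λ and λ are both lambda
```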
Most characters with codes from 0 to 127 (the original 7-bit [[ASCII]] set) can be used directly, without a character reference. Every code from 160 to 255 has a [[List of XML and HTML character entity references|character entity name]]. Only some higher-numbered codes have entity names, but all of them can be written as decimal numeric character references.
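A minimal sketch of the last point, assuming Python 3: the hypothetical helper below keeps 7-bit ASCII characters unchanged and writes every other character as a decimal numeric character reference, so that a page can be served in a pure-ASCII encoding while still carrying any Unicode character. Python's built-in <code>xmlcharrefreplace</code> error handler for <code>str.encode</code> gives the same output. The helper does not escape markup delimiters such as <code>&lt;</code> and <code>&amp;</code>, which still need their own references.

```python
def to_ascii_with_refs(text: str) -> str:
    """Keep 7-bit ASCII characters (codes 0-127) unchanged and replace every
    other character with a decimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


sample = "café λ €"
print(to_ascii_with_refs(sample))  # caf&#233; &#955; &#8364;

# The built-in error handler produces the same result for this input.
assert to_ascii_with_refs(sample) == sample.encode("ascii", "xmlcharrefreplace").decode("ascii")
```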
 
[[List of XML and HTML character entity references|Character entity references]] can also take the form <code>&amp;''name'';</code> where ''name'' is a case-sensitive alphanumeric string. For example, "λ" can also be encoded as <code>&amp;lambda;</code> in an HTML document. The character entity references <code>&amp;lt;</code>, <code>&amp;gt;</code>, <code>&amp;quot;</code> and <code>&amp;amp;</code> are predefined in HTML and SGML, because <code>&lt;</code>, <code>&gt;</code>, <code>"</code> and <code>&amp;</code> are already used to delimit markup. Before [[HTML5]], this set notably did not include XML's <code>&amp;apos;</code> (') entity. For a list of all named HTML character entity references along with the versions in which they were introduced, see [[List of XML and HTML character entity references]].
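The escaping of the four markup delimiters described above can be sketched in Python 3 as follows. This is an illustration, not a reference implementation: the function name <code>escape_delimiters</code> is hypothetical, and it targets element content and double-quoted attribute values only. Python's standard library provides <code>html.escape</code>, which performs the same replacements and, by default, also escapes the apostrophe as <code>&amp;#x27;</code>.

```python
# "&" must be replaced before the others; otherwise the "&" in entities produced
# by the later replacements would itself be escaped ("<" would become "&amp;lt;").
_DELIMITER_ENTITIES = (
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
)


def escape_delimiters(text: str) -> str:
    """Replace the markup-delimiting characters with their predefined entities."""
    for char, entity in _DELIMITER_ENTITIES:
        text = text.replace(char, entity)
    return text


print(escape_delimiters('if a < b && c > "d"'))
# if a &lt; b &amp;&amp; c &gt; &quot;d&quot;
```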
 
Unnecessary use of HTML character references may significantly reduce HTML readability. If the character encoding for a web page is chosen appropriately, then HTML character references are usually only required for markup delimiting characters as mentioned above, and for a few special characters (or none at all if a native [[Unicode]] encoding like [[UTF-8]] is used). Incorrect HTML entity escaping may also open up security vulnerabilities for injection attacks such as [[cross-site scripting]]. If HTML attributes are left unquoted, certain characters, most importantly [[whitespace character|whitespace]], such as space and tab, must be escaped using entities. Other languages related to HTML have their own methods of escaping characters.
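To illustrate the injection risk, the Python 3 sketch below (the attribute value is made-up untrusted input) builds an <code>input</code> element twice. In the unsafe version the double quote in the value ends the attribute early, so the remainder is parsed as a new <code>onmouseover</code> event handler. In the safer version the value is escaped with the standard library's <code>html.escape</code> and kept inside quotes, which also means whitespace in the value needs no escaping. The sketch covers quoted attribute values only; contexts such as URLs or <code>script</code> elements need different handling.

```python
import html

# Made-up untrusted input, e.g. a user-supplied display name.
user_value = 'x" onmouseover="alert(1)'

# Unsafe: the value is inserted verbatim, so its double quote closes the
# attribute and the rest becomes an event-handler attribute.
unsafe = '<input value="' + user_value + '">'

# Safer: the attribute is quoted and its value escaped; with quote=True,
# html.escape turns " into &quot; (and ' into &#x27;).
safe = '<input value="' + html.escape(user_value, quote=True) + '">'

print(unsafe)  # <input value="x" onmouseover="alert(1)">
print(safe)    # <input value="x&quot; onmouseover=&quot;alert(1)">
```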
 
== External links ==
* [https://owuk.com/html-encode.html Online HTML entity encoder & decoder tool]
* [http://www.w3.org/TR/REC-html40/sgml/entities.html Character entity references in HTML4]
* [http://www.sitepoint.com/article/guide-web-character-encoding/ The Definitive Guide to Web Character Encoding]