Unicode and HTML: Difference between revisions

m Typo patrol, typos fixed: agianst → against, documetns → documents, the the → the using AWB (7794)
m HTML document characters: template for better display
Line 13:
Like an HTML document, an XHTML document is a sequence of Unicode characters. However, an XHTML document is an [[XML]] document, which, while not having an explicit "document character" layer of [[abstraction]], nevertheless relies upon a similar definition of permissible characters that covers most, but not all, of the Unicode/UCS character definitions. The sets used by HTML and XHTML/XML are slightly different, but these differences have little effect on the average document author.
 
Regardless of whether the document is HTML or XHTML, when stored on a [[file system]] or transmitted over a network, the document's characters are ''encoded'' as a sequence of [[bit]] [[octet (computing)|octet]]s (''[[byte]]s'') according to a particular character encoding. This encoding may either be a [[Unicode Transformation Format]], like [[UTF-8]], that can directly encode any Unicode character, or a legacy encoding, like [[Windows-1252]], that cannot. However, even when using an encoding that does not support all Unicode characters, the encoded document may make use of [[numeric character references]] to represent the characters that the encoding lacks. For example, <code>&amp;#x263A;</code> ({{unicode|☺}}) is used to indicate a smiling face character in the Unicode character set.
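
The difference between encoding a character directly and falling back to a numeric character reference can be illustrated with a short Python sketch. It uses only the standard library; the helper names <code>encode_for_legacy</code> and <code>decode_references</code> are hypothetical and chosen for this illustration. Python's <code>xmlcharrefreplace</code> error handler emits decimal references, so the smiling face appears as <code>&amp;#9786;</code>, which denotes the same character as <code>&amp;#x263A;</code>.

```python
import html


def encode_for_legacy(text: str, encoding: str = "cp1252") -> bytes:
    """Encode text, replacing characters the target encoding lacks
    with decimal numeric character references (hypothetical helper)."""
    return text.encode(encoding, errors="xmlcharrefreplace")


def decode_references(markup: str) -> str:
    """Resolve numeric and named character references back to
    Unicode characters (hypothetical helper)."""
    return html.unescape(markup)


smiley = "\u263A"  # WHITE SMILING FACE

# UTF-8 can encode the character directly, here as three octets.
utf8_bytes = smiley.encode("utf-8")

# Windows-1252 (cp1252) has no code for U+263A, so the character is
# written as a numeric character reference; U+00E9 is encoded directly.
legacy_bytes = encode_for_legacy("caf\u00e9 " + smiley)

print(utf8_bytes)
print(legacy_bytes)
print(decode_references("&#x263A;") == smiley)
```

Resolving the references in the Windows-1252 output recovers the original characters, which is why a document in a legacy encoding can still represent any Unicode character.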
 
=== Character encoding===