Revision as of 15:12, 6 August 2012 edit Funandtrvl (talk \| contribs) Autopatrolled, Extended confirmed users, Pending changes reviewers, Rollbackers, Template editors 205,053 edits mos:layout ← Previous edit		Revision as of 14:48, 15 January 2013 edit undo 145.226.30.45 (talk) →HTML document characters Next edit →
Line 11: Web pages are typically [[HTML]] or [[XHTML]] documents. Both types of documents consist, at a fundamental level, of [[character (computing)\|character]]s, which are [[grapheme]]s and grapheme-like units, independent of how they manifest in [[computer storage]] systems and [[computer network\|network]]s. An HTML document is a sequence of Unicode characters. More specifically, HTML 4.0 documents are required to consist of characters in the HTML ''document character set'' : a character repertoire wherein each character is assigned a unique, non-negative integer ''code point''. This set is defined in the HTML 4.0 [[Document Type Definition\|DTD]], which also establishes the syntax (allowable sequences of characters) that can produce a valid HTML document. The HTML document character set for HTML 4.0 consists of most, but not all, of the characters jointly defined by [[Unicode]] and ISO/IEC 10646: the [[Universal Character Set]] (UCS). Like HTML documents, an XHTML document is a sequence of Unicode characters. However, an XHTML document is an [[XML]] document, which, while not having an explicit "document character" layer of [[abstraction]], nevertheless relies upon a similar definition of permissible characters that cover most, but not all, of the Unicode/UCS character definitions. The sets used by HTML and XHTML/XML are slightly different, but these differences have little effect on the average document author.

Unicode and HTML: Difference between revisions