Character encodings in HTML: Difference between revisions

mNo edit summary
mNo edit summary
Line 1:
{{Short description|Use of encoding systems for international characters in HTML}}
{{For|a list of character entity references|List of XML and HTML character entity references}}
{{Hatnote|For fixing links within Wikipedia, see [[Help:Percent-encoding#Fixing links with unsupported characters|Help:Percent-encoding § Fixing Links with Unsupported Characters]].}}
{{Use dmy dates|date=December 2021}}
Line 65:
* [[Windows-1257]]
* [[Windows-1258]]
* [[GB 18030]]{{efn|Specified with 0xA3A0 as a duplicate encoding of the [[ideographic space]] (U+3000) for compatibility reasons, and as such excluding U+E5E5 (a private use character).<ref name="gbenc"/><ref name="gbindex"/> Also, specified with 0x80 accepted as an alternative encoding of the [[euro sign]] (U+20AC; see [[Windows-936]]).<ref>{{cite web |url=https://encoding.spec.whatwg.org/#gb18030-decoder |title=10.2.1. gb18030 decoder |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> Otherwise, follows the mappings from the 2005 standard.<ref name="gbindex">{{cite web |url=https://encoding.spec.whatwg.org/#index-gb18030 |title=5. Indexes (§ index gb18030) |work=Encoding Standard |institution=[[WHATWG]] |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
* [[Big5]]{{efn|[[Hong Kong Supplementary Character Set]] variant, although most of the HKSCS extensions (those with lead bytes less than 0xA1) are not included by the encoder, only by the decoder.<ref>{{cite web |url=https://encoding.spec.whatwg.org/#index-big5-pointer |title=5. Indexes (§ index Big5 pointer) |work=Encoding Standard |institution=[[WHATWG]] |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
* [[Shift JIS]]{{efn|The specification includes [[IBM]] and [[NEC]] extensions (see [[Windows-31J]]).<ref>{{cite web |url=https://encoding.spec.whatwg.org/#index-jis0208 |title=5. Indexes (§ Index jis0208) |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
Line 93:
* [[Windows-1253]]
* [[Mac OS Cyrillic encoding|Mac OS Cyrillic]]
* [[GBK (character encoding)|GBK]]{{efn|Also specified for <code>[[GB 2312]]</code> and related labels. Handled the same as {{nowrap|GB 18030}} for decoding purposes.<ref>{{cite web |url=https://encoding.spec.whatwg.org/#gbk |title=10.1. GBK |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> For encoding purposes, labelling as GBK (or {{nowrap|GB 2312}}) excludes four-byte codes, and favours the one-byte 0x80 representation for U+20AC.<ref name="gbenc">{{cite web |url=https://encoding.spec.whatwg.org/#gb18030-encoder |title=10.2.2. gb18030 encoder |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
* [[EUC-JP]]{{efn|The specification uses the same index as used for Shift JIS (insofar as is within reach of the EUC code set 1), i.e. includes NEC extensions. [[JIS X 0212]] is included for decoding only.<ref>{{cite web |url=https://encoding.spec.whatwg.org/#index-jis0212 |title=5. Indexes (§ Index jis0212) |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
}}{{notelist}}