Character encodings in HTML: Difference between revisions

Content deleted Content added
mNo edit summary
Line 68:
* [[Windows-1258]]
* [[GB 18030]]{{efn|Specified with 0xA3A0 as a duplicate encoding of the [[ideographic space]] (U+3000) for compatibility reasons, and as such excluding U+E5E5 (a private use character).<ref name="gbenc"/><ref name="gbindex"/> Also, specified with 0x80 accepted as an alternative encoding of the [[euro sign]] (U+20AC; see [[Windows-936]]).<ref>{{cite web |url=https://encoding.spec.whatwg.org/#gb18030-decoder |title=10.2.1. gb18030 decoder |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> Otherwise, follows the mappings from the 2005 standard.<ref name="gbindex">{{cite web |url=https://encoding.spec.whatwg.org/#index-gb18030 |title=5. Indexes (§ index gb18030) |work=Encoding Standard |institution=[[WHATWG]] |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
* [[Big5]]{{efn|[[Hong Kong Supplementary Character Set]] variant,<ref name="encoding_rs"/> although most of the HKSCS extensions (those with lead bytes less than 0xA1) are not included by the encoder, only by the decoder.<ref>{{cite web |url=https://encoding.spec.whatwg.org/#index-big5-pointer |title=5. Indexes (§ index Big5 pointer) |work=Encoding Standard |institution=[[WHATWG]] |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
* [[Shift JIS]]{{efn|The specification includes [[IBM]] and [[NEC]] extensions (see [[Windows-31J]]).,<ref>{{cite web |url=https://encoding.spec.whatwg.org/#index-jis0208 |title=5. Indexes (§ Index jis0208) |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> and is more precisely [[Windows-31J]].<ref name="encoding_rs">{{cite web |url=https://docs.rs/encoding_rs/latest/encoding_rs/#notable-differences-from-iana-naming |title=Notable Differences from IANA Naming |work=Crate encoding_rs |publisher=docs.rs}}</ref>}}
* [[ISO-2022-JP]]{{efn|The specification uses the same index as used for Shift JIS (insofar as is within reach), i.e. includes NEC extensions. [[Half-width kana]] is converted to fullwidth by the encoder,<ref>{{cite web |url=https://encoding.spec.whatwg.org/#index-iso-2022-jp-katakana |title=5. Indexes (§ Index ISO-2022-JP katakana) |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> but accepted using an escape sequence (ESC 0x28 0x49) by the decoder.<ref name="whatwgjisdecoder">{{cite web |url=https://encoding.spec.whatwg.org/#iso-2022-jp-decoder |title=12.2.1. ISO-2022-JP decoder |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> [[Shift Out]] and [[Shift In]] (0x0E and 0x0F) are excluded entirely to prevent attacks.<ref name="whatwgjisdecoder" /><ref>{{cite web |url=https://encoding.spec.whatwg.org/#iso-2022-jp-encoder |title=12.2.2. ISO-2022-JP encoder |institution=[[WHATWG]] |work=Encoding Standard |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
* [[EUC-KR]]{{efn|Actually [[Unified Hangul Code]] (Windows-949), which is a superset which covers the entire [[Hangul Syllables (Unicode block)|Hangul Syllables]] block.<ref name="encoding_rs"/><ref>{{cite web |url=https://encoding.spec.whatwg.org/#index-euc-kr |title=5. Indexes (§ index EUC-KR) |work=Encoding Standard |institution=[[WHATWG]] |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
* [[UTF-16BE]]{{efn|Specified for decoding only; form submissions from UTF-16-coded documents are to be encoded in [[UTF-8]].<ref name="outputenc">{{cite web |url=https://encoding.spec.whatwg.org/#output-encodings |title=4.3. Output encodings |work=Encoding Standard |institution=[[WHATWG]] |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref>}}
* [[UTF-16LE]]{{efn|For compatibility with deployed content, also specified for the plain <code>[[UTF-16]]</code> label,<ref>{{cite web |url=https://encoding.spec.whatwg.org/#utf-16le |title=14.4. UTF-16LE |work=Encoding Standard |institution=[[WHATWG]] |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> although a [[byte order mark]] (BOM), if present, takes priority over any label.<ref>{{cite web |url=https://encoding.spec.whatwg.org/#decode |title=6. Hooks for standards (§ decode) |work=Encoding Standard |institution=[[WHATWG]] |last=van Kesteren |first=Anne |author-link=Anne van Kesteren}}</ref> Specified for decoding only; form submissions from UTF-16-coded documents are to be encoded in [[UTF-8]].<ref name="outputenc" />}}