Comparison of Unicode encodings: Difference between revisions

Bluebot (talk | contribs)
m Robot subst'ing.
m – for ranges
Line 23:
! Code range (hexadecimal) !! [[UTF-8]] !! [[UTF-16]] !! [[UTF-32]] !! [[UTF-EBCDIC]] !! [[GB18030]]
|-
|000000 &ndash; 00007F||1||2||4||1||1
|-
|000080 &ndash; 00009F||2||2||4||1||rowspan=5|2 for characters inherited from<br>[[GB2312]]/[[GBK]] (e.g. most<br>Chinese characters); 4 for<br>everything else.
|-
|0000A0 &ndash; 0003FF||2||2||4||2
|-
|000400 &ndash; 0007FF||2||2||4||3
|-
|000800 &ndash; 003FFF||3||2||4||3
|-
|004000 &ndash; 00FFFF||3||2||4||4
|-
|010000 &ndash; 03FFFF||4||4||4||4||4
|-
|040000 &ndash; 10FFFF||4||4||4||5||4
|}
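
The byte counts in the table above can be spot-checked with a short script. The following is a minimal sketch, assuming Python's built-in codecs; the function name <code>encoded_sizes</code> is hypothetical, and UTF-EBCDIC is left out because Python ships no codec for it.

```python
def encoded_sizes(code_point: int) -> dict:
    """Return the number of bytes one code point occupies in several encodings.

    Assumption: the code point is not a surrogate (D800-DFFF), since
    surrogates cannot be encoded on their own.
    """
    ch = chr(code_point)
    return {
        "UTF-8": len(ch.encode("utf-8")),
        # The "-be" variants are used so that no byte order mark is counted.
        "UTF-16": len(ch.encode("utf-16-be")),
        "UTF-32": len(ch.encode("utf-32-be")),
        "GB18030": len(ch.encode("gb18030")),
    }


if __name__ == "__main__":
    # One sample per interesting row: ASCII, a C1 control, a CJK ideograph,
    # and a supplementary-plane character.
    for cp in (0x41, 0x80, 0x4E2D, 0x1F600):
        print(f"{cp:06X}", encoded_sizes(cp))
```

The sample code points are arbitrary picks from each range; any non-surrogate code point can be substituted.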
 
Line 54:
|[[GB18030]] base64
|-
|000000 &ndash; 000020
|same as 000080&ndash;00FFFF
|3
|1&#x2153;
Line 65:
|1&#x2153;
|-
|000021 &ndash; 00003C
|rowspan=3|1 for "direct characters" and possibly "optional direct characters" (depending on the encoder setting), 2 for +, otherwise same as 000080&ndash;00FFFF
|1
|1&#x2153;
Line 86:
|1&#x2153;
|-
|00003E &ndash; 00007E
|1
|1&#x2153;
Line 107:
|1&#x2153;
|-
|000080 &ndash; 0007FF
|6
|2&#x2154;
|rowspan=2|2&ndash;6 depending on whether the byte values need to be escaped
|2⅔
|rowspan=3|8&ndash;12 depending on whether the final two byte values need to be escaped
|5⅓
|rowspan=2|4&ndash;6 for characters inherited from [[GB2312]]/[[GBK]] (e.g.<br>most Chinese characters); 6&ndash;10 for everything else.
|rowspan=2|2&#x2154; for characters inherited from [[GB2312]]/[[GBK]] (e.g.<br>most Chinese characters); 5⅓ for everything else.
|-
|000800 &ndash; 00FFFF
|9
|4
Line 123:
|5⅓
|-
|010000 &ndash; 10FFFF
|same as two characters from above
|12
|5⅓
|8&ndash;12 depending on whether the low bytes of the surrogates need to be escaped.
|5⅓
|5⅓
|6&ndash;10
|5⅓
|}
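
The quoted-printable and base64 figures in the table above follow from simple arithmetic on the underlying byte counts: an escaped byte costs three characters (<code>=XX</code>), and base64 averages four characters per three bytes. The sketch below illustrates this under simplifying assumptions: every byte outside the printable range 21&ndash;7E (hexadecimal), plus <code>=</code> itself, is treated as escaped, and base64 padding and line breaks are ignored. The helper names are hypothetical.

```python
from fractions import Fraction


def qp_length(data: bytes) -> int:
    """Characters needed if each escaped byte is written as =XX (3 chars).

    Simplifying assumption: bytes 0x21-0x7E other than '=' (0x3D) are
    written literally; everything else, including space, is escaped.
    """
    return sum(1 if (0x21 <= b <= 0x7E and b != 0x3D) else 3 for b in data)


def base64_average(data: bytes) -> Fraction:
    """Average base64 cost: 4 output characters per 3 input bytes, no padding."""
    return Fraction(4 * len(data), 3)


def seven_bit_costs(code_point: int) -> dict:
    """Per-code-point cost of UTF-8 and UTF-16 in a seven-bit environment."""
    ch = chr(code_point)
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, so no byte order mark
    return {
        "UTF-8 quoted-printable": qp_length(utf8),
        "UTF-8 base64": base64_average(utf8),
        "UTF-16 base64": base64_average(utf16),
    }


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x4E2D, 0x1F600):
        print(f"{cp:06X}", seven_bit_costs(cp))
```

Fractions are used instead of floats so that values such as 2⅔ and 5⅓ come out exactly as 8/3 and 16/3.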