Comparison of Unicode encodings: Difference between revisions

Reisio (minor edit): Seven-bit environments: cleanup markup
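This table may not cover every special case, so it should be used only for estimation and comparison. Figures are the number of bytes of seven-bit output per character. Fractional base64 values are averages over long runs, since base64 emits 4 characters for every 3 input bytes. To determine the exact size of text in an encoding, consult the relevant specifications.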
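As a rough cross-check of the quoted-printable and base64 columns, the following minimal sketch recomputes them with Python's standard codecs. It assumes the table's simplified model. In quoted-printable, a byte costs 1 character if it is printable ASCII (0x21 to 0x7E) other than "=", and 3 characters (an =XX escape) otherwise. In base64, the cost is 4 characters per 3 bytes with padding ignored. UTF-16 and UTF-32 are taken as big-endian without a byte order mark. Real quoted-printable encoders may leave spaces and tabs literal and insert soft line breaks, which this model ignores. The helper names qp_cost, b64_cost and costs are hypothetical.

```python
from fractions import Fraction

# Assumed byte orders: big-endian, no BOM (Python's plain "utf-16"/"utf-32" would add one).
ENCODINGS = ["utf-8", "utf-16-be", "utf-32-be", "gb18030"]


def qp_cost(data: bytes) -> int:
    """Quoted-printable size under the table's simplified model:
    printable ASCII other than '=' (0x3D) passes through as 1 character,
    every other byte becomes a 3-character =XX escape."""
    return sum(1 if 0x21 <= b <= 0x7E and b != 0x3D else 3 for b in data)


def b64_cost(data: bytes) -> Fraction:
    """Average base64 size of these bytes inside a long run: 4 characters per 3 bytes."""
    return Fraction(4 * len(data), 3)


def costs(ch: str) -> dict:
    """Map each encoding to (quoted-printable cost, base64 cost) for one character."""
    return {enc: (qp_cost(ch.encode(enc)), b64_cost(ch.encode(enc))) for enc in ENCODINGS}


if __name__ == "__main__":
    for ch in ["A", "=", "\u4e2d", "\U0001F600"]:
        print(f"U+{ord(ch):06X}", costs(ch))
```

For example, costs("A") gives quoted-printable costs of 1, 4, 10 and 1 for UTF-8, UTF-16, UTF-32 and GB18030, matching the 00003E - 00007E row. U+1F600 costs 8 in GB18030 because its four-byte form alternates bytes above 0x80 (escaped) with ASCII digits (literal).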
{| {{prettytable}}
!Code range (hexadecimal)!![[UTF-7]]!![[UTF-8]] [[quoted-printable]]!!UTF-8 [[base64]]!![[UTF-16]] quoted-printable!!UTF-16 base64!![[UTF-32]] quoted-printable!!UTF-32 base64!![[GB18030]] quoted-printable!!GB18030 base64
|-
|000000 - 000020||same as 000080 - 00FFFF||3||1⅓||6||2⅔||12||5⅓||3||1⅓
|-
|000021 - 00003C||rowspan=3|1 for "direct characters" and possibly "optional direct characters" (depending on the encoder setting); 2 for "+" (written as "+-"); otherwise same as 000080 - 00FFFF||1||1⅓||4||2⅔||10||5⅓||1||1⅓
|-
|00003D (equals sign)||3||1⅓||6||2⅔||12||5⅓||3||1⅓
|-
|00003E - 00007E||1||1⅓||4||2⅔||10||5⅓||1||1⅓
|-
|00007F||rowspan=3|5 for an isolated character inside a run of single-byte characters; for runs, 2⅔ per character, plus padding to a whole number of base64 characters, plus 2 for the "+" and "-" that start and finish the run||3||1⅓||6||2⅔||12||5⅓||3||1⅓
|-
|000080 - 0007FF||6||2⅔||rowspan=2|2-6 depending on whether the byte values need to be escaped||2⅔||rowspan=3|8-12 depending on whether the final two byte values need to be escaped||5⅓||rowspan=2|4-6 for characters inherited from [[GB2312]]/[[GBK]] (e.g. most Chinese characters); 8 for everything else||rowspan=2|2⅔ for characters inherited from [[GB2312]]/[[GBK]] (e.g. most Chinese characters); 5⅓ for everything else
|-
|000800 - 00FFFF||9||4||2⅔||5⅓
|-
|010000 - 10FFFF||same as two characters from above (a surrogate pair)||12||5⅓||8-12 depending on whether the low bytes of the surrogates need to be escaped||5⅓||5⅓||8||5⅓
|}
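The UTF-7 column follows from the fact that a base64 run carries 16 bits per UTF-16 code unit, i.e. 2⅔ base64 characters. The following minimal sketch compares the table's run formula with real encoder output, assuming Python's built-in utf-7 codec as the reference encoder. Both helper names, utf7_cost and utf7_run_estimate, are hypothetical.

```python
def utf7_cost(text: str) -> int:
    """Size in bytes of `text` as produced by Python's built-in UTF-7 codec."""
    return len(text.encode("utf-7"))


def utf7_run_estimate(code_units: int) -> int:
    """The table's formula for a base64 run of UTF-16 code units:
    16 bits each, padded up to a whole base64 character (6 bits),
    plus 2 for the opening '+' and closing '-'."""
    bits = 16 * code_units
    base64_chars = -(-bits // 6)  # ceiling division
    return base64_chars + 2


if __name__ == "__main__":
    # A supplementary character such as U+1F600 is a surrogate pair: 2 code units.
    for text, units in [("\u00e9", 1), ("\u00e9" * 3, 3), ("\U0001F600", 2)]:
        print(repr(text), text.encode("utf-7"), utf7_cost(text), utf7_run_estimate(units))
```

For example, "\u00e9" encodes as b"+AOk-" (5 bytes) and U+1F600 as b"+2D3eAA-" (8 bytes), matching the "5 for an isolated character" and "same as two characters" entries. The closing "-" may be omitted when the next character is neither a base64 character nor "-". Python's encoder does this, so "\u00e9 " becomes b"+AOk ", and the "+2" in the formula is an upper bound.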