Comparison of Unicode encodings: Difference between revisions

 
==Summary of size issues==
[[UTF-32]] requires four bytes to encode any character. Since characters outside the [[basic multilingual plane]] are rare, a document encoded in UTF-32 will usually be nearly twice as large as its [[UTF-16]]–encoded equivalent. [[UTF-8]], on the other hand, uses between one and four bytes per character, so it may use fewer, the same number of, or more bytes than UTF-16 to encode the same character. For printable characters, [[UTF-EBCDIC]] is never more compact than [[UTF-8]] and is often less so, because of a decision to allow the C1 control codes to be encoded as single bytes.
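
The per-character sizes described above can be checked with a minimal Python sketch. The helper name <code>encoded_sizes</code> is hypothetical and used only for illustration; the little-endian codec variants are chosen so that no byte order mark is added to the counts.

```python
def encoded_sizes(text):
    """Return the number of bytes needed to encode text in each encoding.

    The "-le" codecs are used for UTF-16 and UTF-32 so that Python does
    not prepend a byte order mark, which would inflate the counts.
    """
    codecs = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-le"), ("UTF-32", "utf-32-le"))
    return {name: len(text.encode(codec)) for name, codec in codecs}


if __name__ == "__main__":
    # One sample character from each UTF-8 length class.
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

The four sample characters (U+0041, U+00E9, U+20AC and U+1F600) cover the cases in which UTF-8 needs fewer bytes than UTF-16, the same number, more, and the same number again, while UTF-32 needs four bytes in every case.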
 
For seven-bit environments, UTF-7 is generally more space-efficient than combining another Unicode encoding with [[quoted printable]] or [[base64]]. <!--For eight-bit-clean environments things vary considerably depending on what code points are in the text.-->
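
As a rough illustration of the seven-bit comparison, the following Python sketch measures one sample string under several seven-bit-safe combinations. The helper name <code>seven_bit_sizes</code> and the sample text are assumptions made for this example, and the result for other text depends on how much of it is ASCII.

```python
import base64
import quopri


def seven_bit_sizes(text):
    """Return the size in bytes of text under several seven-bit-safe encodings."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-be")  # big-endian, no byte order mark
    return {
        "UTF-7": len(text.encode("utf-7")),
        "UTF-8 + quoted-printable": len(quopri.encodestring(utf8)),
        "UTF-8 + base64": len(base64.b64encode(utf8)),
        "UTF-16 + base64": len(base64.b64encode(utf16)),
    }


if __name__ == "__main__":
    # Mostly-ASCII sample with a single non-ASCII character (U+20AC).
    sample = "The price is 10 \u20ac today"
    for name, size in seven_bit_sizes(sample).items():
        print(f"{name}: {size} bytes")
```

For this mostly-ASCII sample, UTF-7 is the smallest of the four. The gap narrows for text that is largely non-ASCII, so the sketch is one data point rather than a general measurement.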