Revision as of 19:47, 16 January 2006 by Kbolino (→Summary of size issues: ln UTF-32, UTF-16, UTF-8)
Revision as of 19:50, 16 January 2006 by Kbolino (→Summary of size issues: correction on UTF-8)
==Summary of size issues==
[[UTF-32]] requires four bytes to encode any character. Since characters outside the [[basic multilingual plane]] are rare, a document encoded in UTF-32 will usually be nearly twice as large as its [[UTF-16]]-encoded equivalent. [[UTF-8]], on the other hand, uses between one and four bytes per character, so depending on the character it may use fewer, the same number of, or more bytes than UTF-16: one byte instead of two for U+0000 to U+007F, two bytes (the same as UTF-16) for U+0080 to U+07FF, three bytes instead of two for U+0800 to U+FFFF, and four bytes (the same as a UTF-16 surrogate pair) for characters outside the basic multilingual plane. For seven-bit environments, UTF-7 clearly wins over combining another Unicode encoding with [[quoted printable]] or [[base64]]. <!--For eight-bit-clean environments things vary considerably depending on what code points are in the text.-->
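The per-character byte counts above can be checked with a short Python sketch. It relies only on the standard `str.encode` codecs; the helper name `encoded_sizes` and the sample characters are illustrative choices, not part of the article. The little-endian codecs (`utf-16-le`, `utf-32-le`) are used so that Python does not prepend a byte-order mark, which would otherwise inflate every UTF-16 and UTF-32 count.

```python
# Hypothetical helper for illustration: byte counts per encoding.
# The "-le" codecs are used so that Python does not prepend a byte-order
# mark (BOM), which would add 2 bytes (UTF-16) or 4 bytes (UTF-32).
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text: str) -> dict:
    """Return how many bytes `text` occupies in UTF-8, UTF-16 and UTF-32."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


# One sample character from each UTF-8 length class (illustrative choices).
SAMPLES = {
    "U+0041": "A",           # ASCII: UTF-8 uses fewer bytes than UTF-16
    "U+00E9": "\u00e9",      # 2-byte UTF-8: same as UTF-16
    "U+20AC": "\u20ac",      # 3-byte UTF-8: more than UTF-16
    "U+1F600": "\U0001F600", # outside the BMP: UTF-16 needs a surrogate pair
}

if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(label, encoded_sizes(ch))
```

For text made up only of basic multilingual plane characters, the UTF-32 count is exactly twice the UTF-16 count; supplementary characters, which take four bytes in both, are what make the ratio "nearly" rather than exactly two.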

Comparison of Unicode encodings: Difference between revisions