This page compares Unicode encodings. Two situations are considered: 8-bit-clean environments, and environments such as SMTP that only support 7-bit characters. SCSU and BOCU are excluded from the comparison tables because their size is difficult to quantify simply.
In summary
For 7-bit environments, UTF-7 clearly wins over combining any of the other Unicode encodings with quoted-printable or base64. For 8-bit-clean environments, the result varies considerably depending on which code points occur in the text to be encoded.
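As a rough way to check the 7-bit claim against a particular text, the following minimal sketch measures the encoded size of a string under a few 7-bit-safe combinations. It assumes Python's standard `utf-7` codec and the `base64` and `quopri` modules. The function name `seven_bit_sizes` and the sample strings are illustrative only, and the results depend entirely on the text chosen.

```python
import base64
import quopri


def seven_bit_sizes(text: str) -> dict[str, int]:
    """Return the encoded size in bytes of `text` for several 7-bit-safe forms."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-be")  # explicit byte order, so no BOM is added
    return {
        "UTF-7": len(text.encode("utf-7")),
        "UTF-8 + base64": len(base64.b64encode(utf8)),
        "UTF-8 + quoted-printable": len(quopri.encodestring(utf8)),
        "UTF-16 + base64": len(base64.b64encode(utf16)),
    }


# Illustrative samples: plain ASCII and a short Japanese string.
for sample in ("Hello", "\u65e5\u672c\u8a9e\u306e\u30c6\u30ad\u30b9\u30c8"):
    print(ascii(sample), seven_bit_sizes(sample))
```

Line wrapping and any MIME headers are ignored here, in keeping with the assumption that overheads outside the text itself are negligible.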
In detail
The tables below list the number of bytes per code point for different Unicode ranges. Any additional comments needed are included in the table. The figures assume that overheads at the start and end of the block of text are negligible.
8 bit environments
code range (hexadecimal) | UTF-8 | UTF-16 | UTF-32 | GB18030 |
000000 - 00007F | 1 | 2 | 4 | 1 |
000080 - 0007FF | 2 | 2 | 4 | |
000800 - 00FFFF | 3 | 2 | 4 | |
010000 - 10FFFF | 4 | 4 | 4 | |
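The figures in the table can be checked for individual code points with a short script. This is a minimal sketch that assumes Python's built-in codecs, including `gb18030`. The little-endian codec variants are used so that no byte order mark is counted. The names `CODECS` and `bytes_per_code_point` and the chosen sample code points are illustrative only.

```python
# Map the table's column names to Python codec names (BOM-free variants).
CODECS = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16-le",
    "UTF-32": "utf-32-le",
    "GB18030": "gb18030",
}


def bytes_per_code_point(cp: int) -> dict[str, int]:
    """Return the number of bytes each encoding uses for code point `cp`.

    Surrogate code points (D800 - DFFF) cannot be encoded and are not handled.
    """
    ch = chr(cp)
    return {name: len(ch.encode(codec)) for name, codec in CODECS.items()}


# One sample code point from each range in the table.
for cp in (0x41, 0x3B1, 0x4E2D, 0x1F600):
    print(f"U+{cp:06X}", bytes_per_code_point(cp))
```

A single sample does not characterise a whole range for GB18030, whose length is not a simple function of the code point's range, so a fuller check would iterate over every code point in the range.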