Comparison of Unicode encodings: Difference between revisions

Line 42:
UTF-16 and UTF-32 do not define an [[endianness]], so a byte order must be selected when receiving them over a byte-oriented network or reading them from byte-oriented storage. This can be done by placing a [[byte-order mark]] at the start of the text, or by assuming big-endian when no mark is present (RFC 2781). [[UTF-8]], [[UTF-16BE]], [[UTF-32BE]], [[UTF-16LE]] and [[UTF-32LE]] each have a single fixed byte order and do not have this problem.
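
The byte-order selection described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function names <code>detect_byte_order</code> and <code>decode_utf16</code> are hypothetical, the caller is assumed to know already whether the data is UTF-16 or UTF-32, and the big-endian fallback follows the default stated in the text.

```python
import codecs

def detect_byte_order(data, width=16):
    """Return (codec name, number of BOM bytes to skip).

    `width` is 16 or 32; the caller is assumed to know which encoding
    family it is reading. The two families must not be mixed, because
    the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
    """
    if width == 32:
        if data.startswith(codecs.BOM_UTF32_BE):
            return "utf-32-be", 4
        if data.startswith(codecs.BOM_UTF32_LE):
            return "utf-32-le", 4
        return "utf-32-be", 0  # no mark: fall back to big-endian
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", 2
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", 2
    return "utf-16-be", 0      # no mark: fall back to big-endian

def decode_utf16(data):
    """Decode UTF-16 bytes, honouring a byte-order mark if one is present."""
    codec, skip = detect_byte_order(data, 16)
    return data[skip:].decode(codec)
```

For example, <code>decode_utf16(b"\xff\xfeh\x00i\x00")</code> and <code>decode_utf16(b"\x00h\x00i")</code> both yield the string <code>hi</code>: the first through its little-endian mark, the second through the big-endian fallback.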
 
If the byte stream is subject to [[data corruption|corruption]], some encodings recover better than others. UTF-8 and UTF-EBCDIC are best in this regard, as they can always resynchronize at the start of the next code point after a corrupt or missing byte; GB 18030 is unable to recover until the next ASCII non-digit. UTF-16 can handle ''altered'' bytes, but not an odd number of ''missing'' bytes, which will garble all the following text (though it will produce uncommon and/or unassigned characters).{{efn|An ''even'' number of missing bytes in UTF-16, in contrast, will garble at most one character.}} If ''bits'' can be lost, all of these encodings will garble the following text, though UTF-8 can be resynchronized, as incorrect byte boundaries will produce invalid UTF-8 in almost all text longer than a few bytes.
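
The difference in recovery behaviour can be illustrated with a short Python sketch. The helper <code>resync_utf8</code> and the sample string are assumptions made for this example; the helper relies only on the fact that UTF-8 continuation bytes have the bit pattern <code>10xxxxxx</code>, so a decoder can skip them until it reaches the start of the next code point.

```python
def resync_utf8(data, pos):
    """Return the first index >= pos that is not a UTF-8 continuation byte."""
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos

text = "naïve café"

# UTF-8: drop the lead byte of "ï" (index 2), leaving a stray continuation byte.
utf8 = text.encode("utf-8")
damaged8 = utf8[:2] + utf8[3:]
start = resync_utf8(damaged8, 2)
recovered = damaged8[start:].decode("utf-8")  # text after the damaged character

# UTF-16BE: drop a single byte, so every later code unit is misaligned.
utf16 = text.encode("utf-16-be")
damaged16 = utf16[:2] + utf16[3:]
garbled = damaged16.decode("utf-16-be", errors="replace")
```

Here <code>recovered</code> holds the intact remainder of the text after the damaged character, whereas <code>garbled</code> keeps only the first character; every code unit after the missing byte is read across the wrong byte boundary.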
 
== In detail ==