UTF-16 and UTF-32 do not define a single [[endianness]], so a byte order must be determined when receiving them over a byte-oriented network or reading them from byte-oriented storage. This can be done with a [[byte-order mark]] at the start of the text or, when no mark is present, by assuming big-endian (RFC 2781). [[UTF-8]], [[UTF-16BE]], [[UTF-32BE]], [[UTF-16LE]] and [[UTF-32LE]] are standardised on a single byte order and do not have this problem.
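The byte-order check described above can be sketched in Python as follows. The sketch looks for a byte-order mark using the BOM constants in Python's standard <code>codecs</code> module and, if none is found, decodes the data as big-endian. The function name <code>decode_with_byte_order</code> is hypothetical, and applying the big-endian default to UTF-32 as well as UTF-16 is an assumption of the sketch, since RFC 2781 covers only UTF-16.

```python
import codecs

# Byte-order marks for each encoding family, paired with the explicit codec
# to use once the byte order is known.
_BOMS = {
    "utf-16": ((codecs.BOM_UTF16_BE, "utf-16-be"),
               (codecs.BOM_UTF16_LE, "utf-16-le")),
    "utf-32": ((codecs.BOM_UTF32_BE, "utf-32-be"),
               (codecs.BOM_UTF32_LE, "utf-32-le")),
}


def decode_with_byte_order(data: bytes, family: str) -> str:
    """Decode UTF-16 or UTF-32 bytes, choosing the byte order from a BOM.

    Hypothetical helper: if no byte-order mark is present, it falls back to
    big-endian (RFC 2781 default for UTF-16; assumed here for UTF-32 too).
    """
    for bom, codec in _BOMS[family]:
        if data.startswith(bom):
            # Strip the BOM so it does not appear as U+FEFF in the text.
            return data[len(bom):].decode(codec)
    # No BOM found: assume big-endian.
    return data.decode(family + "-be")
```

The sketch selects the byte-order-specific codec explicitly rather than relying on a generic codec's default behaviour when the mark is missing.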
If the byte stream is subject to [[data corruption|corruption]], some encodings recover better than others. UTF-8 and UTF-EBCDIC are best in this regard, as after a corrupt or missing byte they can always resynchronize at the start of the next code point; GB 18030 is unable to recover
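UTF-8 can resynchronize because every continuation byte has the bit pattern <code>10xxxxxx</code> (0x80 to 0xBF), so a decoder that meets a damaged sequence only has to skip forward past continuation bytes to reach the next possible lead byte. The following Python sketch illustrates this. The helper names <code>resync_utf8</code> and <code>decode_utf8_resync</code> are hypothetical, and replacing each damaged sequence with a single U+FFFD replacement character is a choice made for the example, not a requirement of UTF-8.

```python
def resync_utf8(data: bytes, pos: int) -> int:
    """Return the first index at or after pos that is not a continuation byte."""
    # Continuation bytes are 0x80-0xBF (10xxxxxx); any other byte starts a
    # new code point or is itself invalid.
    while pos < len(data) and 0x80 <= data[pos] <= 0xBF:
        pos += 1
    return pos


def decode_utf8_resync(data: bytes) -> str:
    """Decode UTF-8, replacing each damaged sequence with U+FFFD and resyncing."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Expected sequence length from the lead byte (0xC0, 0xC1 and
        # 0xF5-0xFF never appear in valid UTF-8).
        if lead < 0x80:
            length = 1
        elif 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            out.append("\ufffd")
            i = resync_utf8(data, i + 1)
            continue
        try:
            # Strict decoding rejects truncated, overlong or otherwise
            # malformed sequences.
            out.append(data[i:i + length].decode("utf-8"))
            i += length
        except UnicodeDecodeError:
            out.append("\ufffd")
            i = resync_utf8(data, i + 1)
    return "".join(out)
```

For example, deleting the middle byte of the three-byte sequence for "€" in <code>"h€llo"</code> gives <code>b"h\xe2\xacllo"</code>; the sketch decodes this as <code>"h\ufffdllo"</code>, losing only the damaged character.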
== In detail ==