Revision as of 13:38, 8 July 2021 by Whoop whoop pull up (no edit summary); revision as of 15:02, 8 July 2021 by Whoop whoop pull up (section: For communication and storage)
Line 42:
UTF-16 and UTF-32 do not have a defined [[endianness]], so a byte order must be selected when they are received over a byte-oriented network or read from byte-oriented storage. This can be done by placing a [[byte-order mark]] at the start of the text or, in its absence, by assuming big-endian (RFC 2781). [[UTF-8]], [[UTF-16BE]], [[UTF-32BE]], [[UTF-16LE]] and [[UTF-32LE]] are standardised on a single byte order and do not have this problem.

If the byte stream is subject to [[data corruption|corruption]], some encodings recover better than others. UTF-8 and UTF-EBCDIC are best in this regard, as they can always resynchronize at the start of the next code point after a corrupt or missing byte; GB 18030 is unable to recover until the next ASCII non-number. UTF-16 can handle ''altered'' bytes, but not an odd number of ''missing'' bytes, which will garble all the following text (though it will produce uncommon and/or unassigned characters).{{efn|An ''even'' number of missing bytes in UTF-16, in contrast, will garble at most one character.}} If ''bits'' can be lost, all of these encodings will garble the following text, though UTF-8 can be resynchronized, as incorrect byte boundaries will produce invalid UTF-8 in almost all text longer than a few bytes.

== In detail ==
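The byte-order selection and UTF-8 resynchronization described under "For communication and storage" can be illustrated with a minimal Python sketch. The function names <code>decode_utf16</code> and <code>resync_utf8</code> are hypothetical and chosen only for this illustration; the sketch assumes the input is a complete byte string and does not attempt to handle every malformed sequence.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes (hypothetical helper for illustration).

    A byte-order mark, if present, selects the byte order; otherwise
    big-endian is assumed, following the default named in RFC 2781.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    return data.decode("utf-16-be")  # no BOM: assume big-endian


def resync_utf8(data: bytes, pos: int) -> int:
    """Return the first index >= pos that is not a continuation byte.

    UTF-8 continuation bytes have the bit pattern 10xxxxxx, so any other
    byte is a candidate start of the next code point.
    """
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


# Usage: drop the lead byte of the euro sign and recover at the next code point.
original = "aé€b".encode("utf-8")          # 61 C3 A9 E2 82 AC 62
damaged = original[:3] + original[4:]      # lead byte E2 is missing
restart = resync_utf8(damaged, 3)          # skips the orphaned 82 AC
recovered = damaged[restart:].decode("utf-8")
```

Only the damaged code point is lost here; decoding resumes cleanly at the next lead byte. A comparable helper for UTF-16 cannot be written as simply, because nothing in a lone byte identifies whether it is the first or second half of a code unit.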

Comparison of Unicode encodings: Difference between revisions