Comparison of Unicode encodings

 
===Processing time===
Text with variable-length encoding such as UTF-8 or UTF-16 is harder to process if there is a need to work with individual code units, as opposed to working with sequences of code units. Searching is unaffected by whether the characters are variable sized, since a search for a sequence of code units does not care about the divisions (it does require that the encoding be self-synchronizing, which both UTF-8 and UTF-16 are). A common misconception is that there is a need to "find the ''n''th character" and that this requires a fixed-length encoding; however, in real use the number ''n'' is only derived from examining the preceding {{nowrap|''n''−1}} characters, thus sequential access is needed anyway.{{Citation needed|date=October 2013}}
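The following minimal C sketch illustrates both points for UTF-8; the function names are chosen here purely for illustration and do not come from any particular library.

<syntaxhighlight lang="c">
#include <stddef.h>
#include <string.h>

/* Byte-level search: because UTF-8 is self-synchronizing, searching for
 * the encoded bytes of the needle cannot yield a match that starts in
 * the middle of another character, so a plain byte search suffices.
 * (Illustrative name, not a standard-library function.) */
const char *find_utf8(const char *haystack, const char *needle)
{
    return strstr(haystack, needle);
}

/* "Finding the nth character" still requires a sequential scan:
 * count lead bytes, i.e. any byte not of the form 10xxxxxx.
 * (Illustrative name, not a standard-library function.) */
const char *nth_utf8_char(const char *s, size_t n)
{
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {   /* lead byte */
            if (n-- == 0)
                return s;        /* the nth character starts here */
        }
    }
    return NULL;                 /* string has at most n characters */
}
</syntaxhighlight>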
 
[[UTF-16BE]] and [[UTF-32BE]] are [[endianness|big-endian]]; [[UTF-16LE]] and [[UTF-32LE]] are [[endianness|little-endian]]. When character sequences in one endian order are loaded onto a machine with a different endian order, the characters need to be converted before they can be processed efficiently (or two processors are needed). Byte-based encodings such as UTF-8 do not have this problem.
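As a sketch of the conversion step, assuming the text is already held in memory as an array of 16-bit code units (the function name is illustrative), switching UTF-16 between byte orders is a per-code-unit byte swap:

<syntaxhighlight lang="c">
#include <stddef.h>
#include <stdint.h>

/* Converting UTF-16 text from one byte order to the other is a
 * per-code-unit byte swap; no re-encoding is involved.  UTF-8 text
 * needs no such step, since it is defined in terms of bytes. */
void utf16_swap_byte_order(uint16_t *units, size_t count)
{
    for (size_t i = 0; i < count; i++)
        units[i] = (uint16_t)((units[i] >> 8) | (units[i] << 8));
}
</syntaxhighlight>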
 
== Processing issues ==