===Processing time===
Text with a variable-length encoding such as UTF-8 or UTF-16 is harder to process if there is a need to work with individual characters (code points), as opposed to working with sequences of code units. Searching is unaffected by whether the characters are variable-sized, since a search for a sequence of code units does not care about the divisions between characters (it does require that the encoding be [[self-synchronizing code|self-synchronizing]], which both UTF-8 and UTF-16 are). A common misconception is that there is a need to "find the ''n''th character" and that this requires a fixed-length encoding; however, in real use the number ''n'' is only obtained by examining the preceding {{nowrap|''n−1''}} characters, so sequential access is needed anyway.{{Citation needed|date=October 2013}}

When character sequences in one byte order are loaded onto a machine with a different byte order, the characters need to be converted before they can be processed efficiently (or two processors are needed). Byte-based encodings such as UTF-8 do not have this problem. [[UTF-16BE]] and [[UTF-32BE]] are [[endianness|big-endian]]; [[UTF-16LE]] and [[UTF-32LE]] are [[endianness|little-endian]].
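The following Python sketch illustrates the three points above. It is an illustration only, and the function names <code>utf8_find</code> and <code>utf8_offset</code> are hypothetical. A plain byte search on UTF-8 data finds matches only at character boundaries, because UTF-8 is self-synchronizing. Converting a character index to a byte offset requires scanning every preceding byte. UTF-8 bytes have no byte order, while UTF-16 and UTF-32 come in big-endian and little-endian forms. The sketch assumes well-formed UTF-8 input.

```python
def utf8_find(haystack: str, needle: str) -> int:
    """Return the code-point index of the first occurrence of needle, or -1.

    The search itself is a plain byte search (bytes.find) on the UTF-8
    encodings. UTF-8 is self-synchronizing: continuation bytes (0b10xxxxxx)
    are never lead bytes, so a match of a non-empty needle can only begin
    on a character boundary in well-formed input.
    """
    hay = haystack.encode("utf-8")
    pos = hay.find(needle.encode("utf-8"))
    if pos < 0:
        return -1
    # Map the byte offset back to a code-point index by decoding the prefix.
    return len(hay[:pos].decode("utf-8"))


def utf8_offset(data: bytes, n: int) -> int:
    """Return the byte offset of the n-th code point (0-based) in well-formed UTF-8.

    There is no shortcut: every byte before the target is examined, and
    only lead bytes (those that are not continuation bytes) are counted.
    """
    count = 0
    for i, b in enumerate(data):
        if b & 0xC0 != 0x80:  # not a continuation byte, so a new code point starts here
            if count == n:
                return i
            count += 1
    if count == n:  # n is one past the last code point
        return len(data)
    raise IndexError("code-point index out of range")


if __name__ == "__main__":
    text = "naïve café"
    print(utf8_find(text, "café"))                 # 6
    print(utf8_offset(text.encode("utf-8"), 6))    # 7 ("ï" takes two bytes)
    # UTF-8 has a single byte sequence; UTF-16 and UTF-32 depend on byte order.
    print("é".encode("utf-8"))                     # b'\xc3\xa9'
    print("é".encode("utf-16-be"))                 # b'\x00\xe9'
    print("é".encode("utf-16-le"))                 # b'\xe9\x00'
```

In this sketch the search never decodes the haystack, and only the prefix before the match is decoded to report a character index. <code>utf8_offset</code> shows why random access by character index requires a linear scan in a variable-length encoding.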
