Revision as of 13:56, 26 May 2022 by Dimitriy Ryazantcev (→Processing issues). Revision as of 12:45, 28 August 2022 by Andrius2507 (m →Storage utilization: add source of previously unsourced claim; Tag: Visual edit).
===Storage utilization===
Each format has its own advantages and disadvantages with respect to storage efficiency (and thus also transmission time) and processing efficiency. Storage efficiency depends on where in the Unicode [[code point|code space]] a given text's characters predominantly fall. Since Unicode code space blocks are organized by character set (i.e. alphabet/script), the storage efficiency of any given text effectively depends on the [[alphabet|alphabet/script]] used for that text. For example, UTF-8 needs one fewer byte per character (8 versus 16 bits) than UTF-16 for the 128 code points between U+0000 and U+007F, but one more byte per character (24 versus 16 bits) for the 63,488 code points between U+0800 and U+FFFF. Code points between U+0080 and U+07FF take two bytes in both encodings, and code points above U+FFFF take four bytes in both, so neither range affects the comparison. Therefore, if a text has more characters in the range U+0000 to U+007F than in the range U+0800 to U+FFFF, UTF-8 is more efficient; if it has fewer, UTF-16 is more efficient. If the counts are equal, the two encodings are exactly the same size. A surprising result is that real-world documents written in languages that use characters only in the high range are still often shorter in UTF-8, due to the extensive use of spaces, digits, punctuation, newlines, HTML markup, and embedded words and acronyms written with Latin letters.<ref>{{Cite web |title=UTF-8 Everywhere |url=https://utf8everywhere.org/#asian |access-date=2022-08-28 |website=utf8everywhere.org}}</ref>

===Processing time===

Comparison of Unicode encodings: Difference between revisions