Andrius2507 (talk | contribs) m →Storage utilization: add source of previously unsourced claim
Line 27:
===Storage utilization===
Each format has its own advantages and disadvantages with respect to storage efficiency (and thus also transmission time) and processing efficiency. Storage efficiency depends on where in the Unicode [[code point|code space]] the characters of a given text predominantly lie. Since Unicode code space blocks are organized by character set (i.e. alphabet/script), the storage efficiency of any given text effectively depends on the [[alphabet|alphabet/script]] used for that text. For example, UTF-8 needs one fewer byte per character than UTF-16 (8 versus 16 bits) for the 128 code points between U+0000 and U+007F, but one more byte per character (24 versus 16 bits) for the 63,488 code points between U+0800 and U+FFFF. All other code points take the same space in both formats: 16 bits between U+0080 and U+07FF, and 32 bits above U+FFFF. Therefore, if a text has more characters in the range U+0000 to U+007F than in the range U+0800 to U+FFFF, UTF-8 is more efficient; if it has fewer, UTF-16 is more efficient; and if the counts are equal, the two encodings are exactly the same size. A surprising result is that real-world documents written in languages that use characters only in the high range are still often shorter in UTF-8, due to the extensive use of spaces, digits, punctuation, newlines, HTML markup, and embedded words and acronyms written in Latin letters.<ref>{{
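The size comparison above can be checked with a short Python sketch. It is illustrative only: the function names <code>encoded_sizes</code> and <code>predicted_difference</code> are hypothetical, the sample strings are made up, and the sizes are measured without a byte order mark (hence the <code>utf-16-le</code> codec).

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text, without a BOM."""
    utf8 = len(text.encode("utf-8"))
    utf16 = len(text.encode("utf-16-le"))  # "-le" variant emits no BOM
    return utf8, utf16


def predicted_difference(text: str) -> int:
    """Predict (UTF-16 size - UTF-8 size) in bytes from the range counts.

    Each code point in U+0000..U+007F saves one byte in UTF-8; each code
    point in U+0800..U+FFFF costs one extra byte. All other code points
    take the same space in both encodings.
    """
    low = sum(1 for ch in text if ord(ch) <= 0x7F)
    high = sum(1 for ch in text if 0x0800 <= ord(ch) <= 0xFFFF)
    return low - high


if __name__ == "__main__":
    # Hypothetical samples: pure ASCII, pure kana, and kanji inside HTML markup.
    for sample in ["hello", "こんにちは", "<p>日本語</p>"]:
        utf8, utf16 = encoded_sizes(sample)
        print(sample, utf8, utf16, predicted_difference(sample))
```

For the third sample the seven ASCII markup characters outnumber the three kanji, so UTF-8 comes out smaller (16 versus 20 bytes) even though all of the actual text lies in the high range.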
===Processing time===