Content deleted Content added
→top: add short description |
According to the UTF-8 page, characters are encoded in as many as 6 bytes, not the 4 bytes listed in this article. Tag: Reverted |
||
Line 20:
== Efficiency ==
[[UTF-8]] requires 8, 16, 24, 32, 40, or
The next 1,920 characters, U+0080 to U+07FF (encompassing the remainder of almost all [[Latin-script alphabet]]s, and also [[Greek alphabet|Greek]], [[Cyrillic script|Cyrillic]], [[Coptic alphabet|Coptic]], [[Armenian alphabet|Armenian]], [[Hebrew alphabet|Hebrew]], [[Arabic alphabet|Arabic]], [[Syriac alphabet|Syriac]], [[Tāna]] and [[N'Ko alphabet|N'Ko]]), require 16 bits to encode in both UTF-8 and UTF-16, and 32 bits in UTF-32. For U+0800 to U+FFFF, i.e. the remainder of the characters in the [[Basic Multilingual Plane]] (BMP, plane 0, U+0000 to U+FFFF), which encompasses the rest of the characters of most of the world's living languages, UTF-8 needs 24 bits to encode a character, while UTF-16 needs 16 bits and UTF-32 needs 32. Code points U+010000 to U+10FFFF, which represent characters in the [[Plane (Unicode)|supplementary planes]] (planes 1–16), require 32 bits in UTF-8, UTF-16 and UTF-32.
|