Comparison of Unicode encodings

== Efficiency ==
[[UTF-8]] requires 8, 16, 24, or 32 bits (one to four [[Octet (computing)|bytes]]) to encode a Unicode character, [[UTF-16]] requires either 16 or 32 bits to encode a character, and [[UTF-32]] always requires 32 bits to encode a character. The first 128 Unicode [[code point]]s, U+0000 to U+007F, used for the [[C0 Controls and Basic Latin]] characters and corresponding one-to-one to their ASCII-code equivalents, are encoded using 8 bits in UTF-8, 16 bits in UTF-16, and 32 bits in UTF-32.
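The sizes for the ASCII range can be checked directly; the following is a minimal Python sketch (the little-endian codec names "utf-16-le" and "utf-32-le" are used because Python's plain "utf-16"/"utf-32" codecs prepend a byte-order mark, which would inflate the count):

```python
# Byte lengths of an ASCII-range character (U+0000 to U+007F)
# under each Unicode encoding form.
ch = "A"  # U+0041, within the ASCII range

print(len(ch.encode("utf-8")))      # 1 byte  = 8 bits
print(len(ch.encode("utf-16-le")))  # 2 bytes = 16 bits
print(len(ch.encode("utf-32-le")))  # 4 bytes = 32 bits
```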
 
The next 1,920 characters, U+0080 to U+07FF (encompassing the remainder of almost all [[Latin-script alphabet]]s, and also [[Greek alphabet|Greek]], [[Cyrillic script|Cyrillic]], [[Coptic alphabet|Coptic]], [[Armenian alphabet|Armenian]], [[Hebrew alphabet|Hebrew]], [[Arabic alphabet|Arabic]], [[Syriac alphabet|Syriac]], [[Tāna]] and [[N'Ko alphabet|N'Ko]]), require 16 bits to encode in both UTF-8 and UTF-16, and 32 bits in UTF-32. For U+0800 to U+FFFF, i.e. the remainder of the characters in the [[Basic Multilingual Plane]] (BMP, plane 0, U+0000 to U+FFFF), which encompasses the rest of the characters of most of the world's living languages, UTF-8 needs 24 bits to encode a character, while UTF-16 needs 16 bits and UTF-32 needs 32. Code points U+010000 to U+10FFFF, which represent characters in the [[Plane (Unicode)|supplementary planes]] (planes 1–16), require 32 bits in UTF-8, UTF-16 and UTF-32.
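The three ranges above can likewise be illustrated with one sample character each; a sketch in Python (again using the BOM-free "-le" codecs):

```python
# One sample character from each code-point range discussed above:
#   U+00E9 (é)     : U+0080 to U+07FF
#   U+20AC (€)     : rest of the Basic Multilingual Plane
#   U+1F600 (😀)   : supplementary planes, U+010000 to U+10FFFF
for ch in ("\u00e9", "\u20ac", "\U0001f600"):
    print(f"U+{ord(ch):04X}:",
          len(ch.encode("utf-8")),     "bytes in UTF-8,",
          len(ch.encode("utf-16-le")), "bytes in UTF-16,",
          len(ch.encode("utf-32-le")), "bytes in UTF-32")
# U+00E9:  2 bytes in UTF-8, 2 bytes in UTF-16, 4 bytes in UTF-32
# U+20AC:  3 bytes in UTF-8, 2 bytes in UTF-16, 4 bytes in UTF-32
# U+1F600: 4 bytes in UTF-8, 4 bytes in UTF-16, 4 bytes in UTF-32
```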