Revision as of 15:02, 8 July 2021 by Whoop whoop pull up (→For communication and storage); revision as of 22:38, 19 November 2021 by Dewritech (m →Efficiency: clean up, typo(s) fixed: 1-16 → 1–16; Tag: AWB)
== Efficiency ==
[[UTF-8]] requires 8, 16, 24 or 32 bits (one to four [[Octet (computing)|bytes]]) to encode a Unicode character, [[UTF-16]] requires either 16 or 32 bits, and [[UTF-32]] always requires 32 bits.

The first 128 Unicode [[code point]]s, U+0000 to U+007F, are used for the [[C0 Controls and Basic Latin]] characters and correspond one-to-one to their ASCII-code equivalents. They are encoded using 8 bits in UTF-8, 16 bits in UTF-16, and 32 bits in UTF-32. The next 1,920 characters, U+0080 to U+07FF (encompassing the remainder of almost all [[Latin-script alphabet]]s, and also [[Greek alphabet|Greek]], [[Cyrillic script|Cyrillic]], [[Coptic alphabet|Coptic]], [[Armenian alphabet|Armenian]], [[Hebrew alphabet|Hebrew]], [[Arabic alphabet|Arabic]], [[Syriac alphabet|Syriac]], [[Tāna]] and [[N'Ko alphabet|N'Ko]]), require 16 bits in both UTF-8 and UTF-16, and 32 bits in UTF-32. For U+0800 to U+FFFF, i.e. the remainder of the [[Basic Multilingual Plane]] (BMP, plane 0, U+0000 to U+FFFF), which encompasses the rest of the characters of most of the world's living languages, UTF-8 needs 24 bits to encode a character, while UTF-16 needs 16 bits and UTF-32 needs 32. Code points U+010000 to U+10FFFF, which represent characters in the [[Plane (Unicode)|supplementary planes]] (planes 1–16), require 32 bits in UTF-8, UTF-16 and UTF-32.

All printable characters in [[UTF-EBCDIC]] use at least as many bytes as in UTF-8, and most use more, because UTF-EBCDIC was designed to allow the C1 control codes to be encoded as single bytes.

For seven-bit environments, [[UTF-7]] is more space-efficient than combining other Unicode encodings with [[quoted-printable]] or [[base64]] for almost all types of text (see "[[#Seven-bit environments|Seven-bit environments]]" below).
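The bit counts given above for the four code point ranges can be checked with a short script. The following is a minimal sketch in Python, not part of the article's sources. The helper name <code>encoded_bits</code> and the sample characters are chosen here purely for illustration. It assumes the big-endian codecs <code>utf-16-be</code> and <code>utf-32-be</code>, so that no byte order mark is added to the measured length.

```python
# Illustrative sketch: measure how many bits one character occupies
# in UTF-8, UTF-16 and UTF-32. The helper name and samples are hypothetical.

# Big-endian codec variants are used so that no byte order mark (BOM)
# is prepended and counted.
CODECS = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-be"), ("UTF-32", "utf-32-be"))


def encoded_bits(ch):
    """Return a dict mapping encoding name to the bit length of one character."""
    return {name: 8 * len(ch.encode(codec)) for name, codec in CODECS}


# One sample character from each range discussed in the text.
SAMPLES = [
    "A",           # U+0041, within U+0000 to U+007F
    "\u00e9",      # U+00E9, within U+0080 to U+07FF
    "\u20ac",      # U+20AC, within U+0800 to U+FFFF
    "\U0001F600",  # U+1F600, in a supplementary plane
]

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_bits(ch))
```

Under these assumptions, the samples should report 8/16/32, 16/16/32, 24/16/32 and 32/32/32 bits for UTF-8/UTF-16/UTF-32 respectively, matching the ranges described in the text.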

Comparison of Unicode encodings: Difference between revisions