Comparison of Unicode encodings: Difference between revisions

Reisio (talk | contribs)
m Seven-bit environments: cleanup markup
m Spelling, punctuation, capitals
Line 1:
This page compares Unicode encodings. Two situations are considered: eight-bit-clean environments, and environments such as [[Simple Mail Transfer Protocol]] that forbid byte values with the high bit set. Such prohibitions originally allowed for links that carried only seven data bits, but they remain in the standards, so software must generate messages that comply with them. [[Standard Compression Scheme for Unicode]] and [[Binary Ordered Compression for Unicode]] are excluded from the comparison tables because their size is difficult to quantify simply.
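
The difference between the two environments can be illustrated with a short Python sketch. The helper names `encoded_sizes` and `is_seven_bit_safe` and the sample string are assumptions made for this illustration only; the codecs used (`utf-8`, `utf-16-le`, `utf-32-le`, `utf-7`) are the ones shipped with the Python standard library.

```python
def encoded_sizes(text):
    """Return the byte length of text in three common Unicode encodings."""
    # The "-le" variants are used so that no byte order mark is counted.
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}


def is_seven_bit_safe(data):
    """Return True when no byte in data has its high bit set."""
    return all(byte < 0x80 for byte in data)


sample = "naïve €"  # hypothetical sample text, chosen only for illustration

sizes = encoded_sizes(sample)
utf8_is_safe = is_seven_bit_safe(sample.encode("utf-8"))
utf7_is_safe = is_seven_bit_safe(sample.encode("utf-7"))
```

For this sample, UTF-8 produces bytes with the high bit set and so would need a further transfer encoding in a seven-bit environment, whereas the UTF-7 output uses only seven-bit byte values.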
 
==Summary of size issues==
Line 6:
==Considerations other than size==
===For processing===
A format used for processing should be easy to search, truncate, and generally process safely. Fixed-size characters can be helpful, but even where there is a fixed width per code point (as in UTF-32), there is not a fixed width per displayed character, because of [[combining character]]s. If you work heavily with a particular [[application programming interface]] (API) that has standardised on a particular Unicode encoding, it is generally a good idea to use the encoding that the API does. UTF-16 is popular because many APIs date to the time when Unicode was 16-bit fixed width. Unfortunately, using UTF-16 encourages code that does not properly handle code points outside the BMP.
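
Both pitfalls mentioned above can be shown in a few lines of Python. This is a minimal sketch: the helper `utf16_code_unit_count` and the example strings are assumptions chosen for illustration, and Python's `len` on a `str` counts code points, not displayed characters.

```python
import unicodedata


def utf16_code_unit_count(text):
    """Return the number of 16-bit code units text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# One displayed character built from two code points:
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# NFC normalisation composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# U+10348 GOTHIC LETTER HWAIR lies outside the BMP, so UTF-16 needs a
# surrogate pair (two code units) for this single code point.
gothic = "\U00010348"

decomposed_code_points = len(decomposed)
composed_code_points = len(composed)
gothic_code_points = len(gothic)
gothic_code_units = utf16_code_unit_count(gothic)
```

Code that treats each UTF-16 code unit as a character would split `gothic` in the middle of its surrogate pair, and code that treats each code point as a displayed character would split `decomposed` between the letter and its accent.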
 
===For communication===
Some protocols may be limited to a specific set of encodings, but even when they are not, some encodings may offer better compatibility with existing implementations than others. The cost of converting between your processing format and your communication format should also be considered, both in terms of program size (e.g. GB18030 requires a huge mapping table) and run-time requirements. It may simplify matters to use the same format for processing that you are communicating in, especially for servers.
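
As a rough illustration of the conversion step, the following Python sketch transcodes data at the communication boundary. The function name `transcode` and the sample text are hypothetical; the sketch relies only on the `gb18030` and `utf-8` codecs in the Python standard library, and it says nothing about how large the mapping tables behind those codecs are.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert encoded bytes from one encoding to another via a str."""
    # Decoding and re-encoding both cost run time; a program that processes
    # data in its wire format can skip this step entirely.
    return data.decode(source_encoding).encode(target_encoding)


wire_bytes = "中文".encode("gb18030")  # hypothetical incoming message
processing_bytes = transcode(wire_bytes, "gb18030", "utf-8")
round_tripped = transcode(processing_bytes, "utf-8", "gb18030")
```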
 
==In detail==