==Considerations other than size==
===For processing===
For processing, a format should be easy to search and truncate, and generally safe to process.
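As an illustration of safe truncation, UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so a cut point can be moved back to the start of a character without decoding the whole string. The following is a minimal sketch in Python, assuming the input is valid UTF-8; `truncate_utf8` is a hypothetical helper name, not a standard library function.

```python
def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut UTF-8 data to at most max_bytes without splitting a character.

    Assumes `data` is valid UTF-8 (hypothetical helper, for illustration).
    """
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), cutting here would split a multi-byte
    # sequence, so step back to the lead byte of that sequence.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


encoded = "naïve".encode("utf-8")       # 'ï' takes two bytes: C3 AF
print(truncate_utf8(encoded, 3))          # b'na' (not b'na\xc3')
print(truncate_utf8(encoded, 4).decode("utf-8"))  # naï
```

The same trick does not work as simply in legacy multi-byte encodings such as Shift JIS, where a trail byte can also be a valid single-byte character.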
Fixed-size characters can be helpful, but even with a fixed width per code point (as in UTF-32), there is no fixed width per displayed character, because of [[combining character]]s. If you are working heavily with a particular [[application programming interface|API]] that has standardised on one Unicode encoding, it is generally a good idea to use that same encoding internally, to avoid converting before every call to the API. Similarly, if you are writing a network daemon, it may simplify matters to use the same encoding for processing as the one you communicate in.
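A short Python sketch of the combining-character point: the same visible "é" can be one code point (precomposed U+00E9) or two (U+0065 followed by U+0301 COMBINING ACUTE ACCENT), so UTF-32 gives a fixed width per code point but not per displayed character. The `utf-32-le` codec is used here only to avoid the byte order mark that plain `utf-32` prepends.

```python
import unicodedata

decomposed = "e\u0301"   # 'e' + COMBINING ACUTE ACCENT: displays as one character
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

# UTF-32 spends exactly 4 bytes per code point...
print(len(decomposed.encode("utf-32-le")))   # 8
print(len(precomposed.encode("utf-32-le")))  # 4

# ...yet both strings render as the same single character.
print(len(decomposed), len(precomposed))     # 2 1

# Normalization (NFC) can merge this pair, but only because a precomposed
# form exists; many combining sequences have no single-code-point equivalent.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```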
UTF-16 is popular because many APIs date from the time when Unicode was a 16-bit fixed-width encoding. Unfortunately, using UTF-16 makes characters outside the Basic Multilingual Plane (BMP), which are encoded as surrogate pairs, a special case, and this increases the risk of oversights in their handling.
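The following Python sketch shows why characters outside the BMP are a special case in UTF-16: U+1F600 is stored as two 16-bit code units (the surrogate pair D83D DE00), so code that treats one code unit as one character miscounts and can cut the pair in half. The sample string is arbitrary.

```python
# U+1F600 lies outside the BMP, so UTF-16 encodes it as a surrogate pair.
s = "a\U0001F600b"

utf16 = s.encode("utf-16-le")
# Split the byte string into 16-bit little-endian code units.
code_units = [int.from_bytes(utf16[i:i + 2], "little") for i in range(0, len(utf16), 2)]
print([hex(u) for u in code_units])  # ['0x61', '0xd83d', '0xde00', '0x62']

print(len(s))           # 3 code points
print(len(code_units))  # 4 UTF-16 code units

# Keeping "the first two characters" by counting code units keeps 'a'
# plus only the high surrogate, which is not valid UTF-16 on its own.
truncated = utf16[:4]
try:
    truncated.decode("utf-16-le")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False
print(split_ok)  # False
```

Because BMP text never contains surrogates, this kind of bug can go unnoticed until supplementary characters (such as emoji or rarer CJK ideographs) appear in the data.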
===For communication===
Some protocols may be limited to a specific set of encodings, but even when they are not, some encodings may offer better compatibility than others with existing implementations. The cost of converting between your processing format and your communication format should also be considered, both in program size (e.g. GB18030 requires a large mapping table) and in run-time requirements.
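One common compatibility consideration is ASCII transparency. A minimal Python sketch, using a hypothetical ASCII protocol line in the style of an SMTP command: UTF-8 leaves ASCII bytes unchanged, so software that only understands ASCII framing keeps working, whereas UTF-16 doubles the size and inserts zero bytes that NUL-terminated string handling would treat as the end of the string.

```python
command = "HELO example.org\r\n"  # hypothetical ASCII-only protocol line

ascii_bytes = command.encode("ascii")
utf8_bytes = command.encode("utf-8")
utf16_bytes = command.encode("utf-16-le")

# UTF-8 is byte-for-byte identical to ASCII for ASCII-only text.
print(utf8_bytes == ascii_bytes)            # True

# UTF-16 doubles the size and contains zero bytes.
print(len(utf16_bytes), len(ascii_bytes))   # 36 18
print(b"\x00" in utf16_bytes)               # True
```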
==In detail==